String encoding

Hi,
I'm sending a few strings to the DB through a JDBC PreparedStatement, and it looks like they are being sent using the wrong character encoding. It's connecting to an Oracle 9i DB.
Is there any way to configure which encoding the driver uses when communicating with the DB? Is there anywhere I could do some more research on character encoding and JDBC drivers in general? I'm totally new to the subject, so excuse the ignorance.
thanks,
J

> Is there anywhere I could do some more research on
> character encoding and JDBC drivers in general?
Here.
Oracle - OTN site.
Just in general (using Oracle and character encoding).

Similar Messages

JPA string encoding

How can I tune the String encoding in JPA?
In some cases I need UTF-8 encoding, in some cases I need UTF-16.
Which annotations or attributes do we assign values to?
How do we do this? Thanks.

javaUserMuser wrote:
> String encoding in the database.
> Why - because it is the right thing to do: I have decided that.
Still not clear: what does that have to do with JPA? That's something you would configure in your database, using database-specific tools. The Java layer doesn't really care how the database stores the strings.

Problems with string encoding - need the text content in char* format.

The problem is non-ASCII characters, which come out as some sort of Unicode I need to decipher.
Here's what I got:
A text frame object with the TextString "Agnartjørna"
I get the text content of this object into an ai::UnicodeString the following way:
AIErr
VMGetTextOfTextArt( AIArtHandle textArt, ai::UnicodeString &ucStr)
{
    ASUnicode *textBuffer = NULL;
    AITRY {
        TextFrameRef ateTextRef;
        AIX( sAITextFrame->GetATETextFrame( textArt, &ateTextRef));
        ATE::ITextFrame ateText( ateTextRef);
        ATE::ITextRange ateRange = ateText.GetTextRange( true);
        ASInt32 textLen = ateRange.GetSize();
        AIX( sSPBlocks->AllocateBlock( (textLen+2) * sizeof( ASUnicode), nil, (void**) &textBuffer));
        ateRange.GetContents( textBuffer, (ASInt32) textLen+1);
        /* trim off trailing newlines */
        if ((textBuffer[textLen] == '\n') || (textBuffer[textLen] == '\r'))
            textBuffer[textLen] = 0;
        ucStr.clear();
        ucStr.append( ai::UnicodeString( textBuffer, textLen));
        sSPBlocks->FreeBlock( textBuffer);
        textBuffer = NULL;
        AIRETURN;
    }
    AICATCH {
        if (textBuffer) sSPBlocks->FreeBlock( textBuffer);
        AIPROPAGATE;
    }
}
Now, the next step is to convert it into a form that I can use to call regexp.
Basically, I want to detect the ending "tjørna" (meaning small lake) on a map label, and apply a standard abbreviation "tj^a" (with "a" superscripted).
So the problem is to obtain the regexp pattern and the text content in the same encoding. And since the regexp library is old and char*-based, I would like to convert the text content into a plain old char*.
Hence the following code:
static AIErr
VMAbbreviateTextArt( AIArtHandle textArt,
                         vmTextAbbrevEffectParams *params)
    AITRY {
    /* first obtain the text contents of the textArt */
       ai::UnicodeString ucText;
      const int kTextLen = 256;
      char textContent[kTextLen];
      AIX( VMGetTextOfTextArt( textArt, ucText));
      ucText.as_Roman( textContent, kTextLen);
But textContent now has the value "Agnartj\xbfnna" (according to Xcode),
which will not match the pattern "tj([øe][rn])na\\" (with backslash matching the end of the string).
Are there any other ways to convert the text content to a plain char* string?

> Thank you very much, your method will work fine. With
> the "UTF-8" parameter the byte[].length is double,
> because every valid byte is preceded by a -62, but I
> will just filter the valid bytes into a new array.
> Thanks again,
> Stefan
Actually, what you need to do is to find the character encoding that your device expects, and then you can code your strings in Arabic.
That's the way Java does things: Strings and char values are always in Unicode (see www.unicode.org), which means \u0600 to \u06ff for Arabic, and Java uses a specified character encoding when translating these to and from a byte stream.
Each national character encoding has a name. Most of them are identical to ASCII for 0-127 and code their national characters in 128-255.
Find the encoding name for your display and, odds are, the JRE has it in the library.
BTW the character encoding ISO-8859-1 simply maps UNICODE characters 0-255 on to bytes.
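
To make the last few points concrete, here is a small sketch (the class and method names are made up for illustration) that encodes the same two characters with UTF-8 and with ISO-8859-1. It suggests where the extra -62 bytes mentioned above come from: UTF-8 needs two bytes for each code point from U+0080 to U+00BF, and the first of the two is 0xC2, which shows up as -62 in a signed Java byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of any library.
final class EncodingDemo {

    // Encodes the text with the given charset and returns the raw bytes.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String text = "\u00a3\u00a9"; // pound sign and copyright sign

        // UTF-8: two bytes per character here, each pair starting with 0xC2 (-62).
        System.out.println(Arrays.toString(encode(text, StandardCharsets.UTF_8)));
        // prints [-62, -93, -62, -87]

        // ISO-8859-1: code points 0-255 map directly onto one byte each.
        System.out.println(Arrays.toString(encode(text, StandardCharsets.ISO_8859_1)));
        // prints [-93, -87]
    }
}
```

If the receiving device expects a single-byte encoding, encoding with that charset directly avoids having to filter bytes out afterwards.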

String encoding problem - pls help.

I need to read in a string, let's say aString (in ASCII).
Inside, my program will produce another string, bString.
I want bString to be in Unicode and to insert it into aString,
but aString needs to stay ASCII.
The structure is like this, for example:
aString = "<datetime id number $insertHere>"
bString = someCharatersInUnicodeFormat
finalString = "<datetime id number someCharatersInUnicodeFormat>"
but finalString is in ASCII.
How do I do this in Java?
Do I need to append it at the bit level?
How do I work at the bit level in Java?
If I want to take a look at the bytes or bits, how do I write them out in Java?
Thanks.

> Well, of course it will be fine if I use UTF-8 encoding
> for my file.
> Then it will be much easier to write my Chinese words.
> But actually, the file is produced for another
> system,
> and that system only accepts ASCII files.
If the other system only accepts ASCII files, you can't write Chinese characters.
DrClap already tried to make this clear to you.
What exactly do you mean by ASCII? You possibly mean something different from what DrClap, all the others here, and I mean.
> So I need to make up an ASCII file with the Chinese
> words in Unicode.
What do you mean by Unicode? Unicode is not an encoding.
Often people say Unicode but actually they mean UTF-16, which is "the native internal representation of text in the Microsoft Windows NT, Windows CE, Qualcomm BREW, and Symbian operating systems; the Java and .NET bytecode environments".
http://en.wikipedia.org/wiki/Comparison_of_Unicode_encodings
> The example I posted up there is the correct output.
> Does anyone know how to make up a file like
> that?
Independent of what I think of the idea of mixing up encodings within one file, you can write the first bytes ASCII-encoded and append a "Unicode"-encoded part.
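// Needs java.io.FileOutputStream; run inside a method that throws IOException.
try (FileOutputStream fileOut = new FileOutputStream("file")) {
    // whatever this should be
    String unicodeEncoding = "???";
    byte[] bytes;
    // First part: plain ASCII bytes.
    bytes = "ascii part".getBytes("US-ASCII");
    fileOut.write(bytes, 0, bytes.length);
    // Second part: the same file, but bytes in the other encoding.
    bytes = "unicode part".getBytes(unicodeEncoding);
    fileOut.write(bytes, 0, bytes.length);
}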

Oracle 9i +Java: Change string encoding from UTF-16 to Windows-1251

Dear colleagues,
I have a very urgent case: I need to change the encoding of a string read from a file (encoded as UTF-16) to Windows-1251 and put it into a DB table, in a CLOB field.
Code of the Java function
public static void file2table(String sql, String fileName, String characterSet, int asByteArray) throws SQLException, IOException {
    Connection con = null;
    Writer writer = null;
    Reader reader = null;
    try {
        con = getConnection();
        PreparedStatement ps = con.prepareStatement(sql);
        reader = new InputStreamReader(new BufferedInputStream(new FileInputStream(new File(fileName))), characterSet);
        BufferedReader br = new BufferedReader(reader);
        String s;
        while ((s = br.readLine()) != null) {
            byte[] defaultBytes = s.getBytes(characterSet);
            String win1251str = new String(defaultBytes, "windows-1251");
            if (asByteArray > 0) {
                ps.setBytes(1, defaultBytes);
                //ps.setBytes(1, win1251str.getBytes("windows-1251"));
            } else {
                ps.setString(1, s);
            }
            ps.executeUpdate();
        }
        con.commit();
    } finally {
        if (reader != null) { reader.close(); }
        if (con != null) { con.close(); }
    }
}
I checked: all bytes from the file are received correctly. But when I put the bytes I read into the database table, the resulting text in the table is broken.

>
Yes, currently I already have filled table with all file lines in result table but with incorrect encoding
>
No you haven't - not using the code you posted. You can't save LOB data using only the BLOB or CLOB.
That isn't data that you stored - it is garbage that is being stored as the LOB locator.
I asked you why you were trying to store the data that way instead of the way the doc shows you, and you said
>
Because var. s is type of Java String.
For method setClob must be use type of CLOB
>
You are terribly confused about LOBs. A BLOB or CLOB Java datatype is the LOB LOCATOR and doesn't contain any data.
Yes - it is true that method setClob must be of type CLOB but that CLOB instance HAS TO BE THE LOB LOCATOR - not the data.
You access LOB data using streams. To store LOB data you have to RETRIEVE (not send) a LOB locator from the database and then use the locator's stream to send the actual data.
So if you are creating a new record in the table you typically do an INSERT that includes an EMPTY_LOB() and have the newly created LOB locator returned to you. Then you use that locator's stream to send the actual data.
Since you are not doing that your approach will not work.
Here is a link to the 9i JDBC Dev Guide
http://docs.oracle.com/cd/B10501_01/java.920/a96654.pdf
See page 8-2 to start with
>
BLOB and CLOB data is
accessed and referenced by using a locator, which is stored in the database table and
points to the BLOB or CLOB data, which is outside the table.
To work with LOB data, you must first obtain a LOB locator. Then you can read or
write LOB data and perform data manipulation. The following sections also
describe how to create and populate a LOB column in a table.
The oracle.sql.BLOB and CLOB classes implement the java.sql.Blob and
Clob interfaces, respectively (oracle.jdbc2.Blob and Clob interfaces under
JDK 1.1.x). By contrast, BFILE is an Oracle extension, without a corresponding
java.sql (or oracle.jdbc2) interface.
Instances of these classes contain only the locators for these datatypes, not the data.
After accessing the locators, you must perform some additional steps to access the
data. These steps are described in "Reading and Writing BLOB and CLOB Data" on
page 8-6 and "Reading BFILE Data" on page 8-22.
Note: You cannot construct BLOB, CLOB, or BFILE objects in your
JDBC application—you can only retrieve existing BLOBs, CLOBs,
or BFILEs from the database or create them using the
createTemporary() and empty_lob() methods.
>
Read the above quotes several times until you understand what they are telling you. These are the two main concepts you need to accept:
>
To work with LOB data, you must first obtain a LOB locator.
You cannot construct BLOB, CLOB, or BFILE objects in your JDBC application
>
See the example code and description starting on page 8-11 for how to populate a LOB column in a table
>
Create a BLOB or CLOB column in a table with the SQL CREATE TABLE statement,
then populate the LOB. This includes creating the LOB entry in the table, obtaining
the LOB locator, creating a file handler for the data (if you are reading the data from
a file), and then copying the data into the LOB.
>
Until you start using the proper methodology you are just wasting your time and will not be successful.

Need information on string encoding

Between NSStrings.h and CFStringEncodingExt.h there are scads of encodings listed but I haven't been able to find any detailed information on them. Does anyone know where I can get such information?
A secondary question: is there a simple way of including 8-bit ASCII into a string in my code? For instance, if I want to say NSString *x; x = @"ß=√π"; how can I do it?
Pete

Pete C wrote:
> Between NSStrings.h and CFStringEncodingExt.h there are scads of encodings listed but I haven't been able to find any detailed information on them. Does anyone know where I can get such information?
What sort of information do you want? That isn't a small topic.
Unless you need to read any of those wacky string encodings (such as if you were writing a web browser with compatibility with web sites from 1993) you don't need to worry about any of them except for UTF-8. UTF-8 will handle 95% of your needs. Mac OS X resource files and Java text use UTF-16 for another 4.8% of your needs.
> A secondary question: is there a simple way of including 8-bit ASCII into a string in my code? For instance, if I want to say NSString *x; x = @"ß=√π"; how can I do it?
You put that string into a resource file (which is encoded using UTF-16) and then load it using NSLocalizedStringFromTable.
You cannot use 8-bit data in a source file. This is a limitation of the GCC compiler and has nothing to do with a Mac.

XML converted to string - encoding lost?

Hi,
I am using a socket to obtain XML data. Because it is a continuous stream, I need to check for <?xml version...> in order to split the data into parsable chunks, because the parser can only parse one XML file at a time.
In order to do this I use a BufferedReader and readLine() until I reach the appropriate place. I save the read data in a string and then pass it to the parser.
Do I lose the UTF-8 encoding in this process? I end up receiving the following error:
"Illegal XML character: &#x13" or "Illegal XML character: &#x1e". Also, some other characters seem to be displayed incorrectly in my applet.
How can I solve this problem?

> In order to do this I use a BufferedReader and
> readLine() until I reach the appropriate place. I
> save the read data in a string and then pass it to
> the parser.
> Do I lose the UTF-8 encoding in this process?
Most likely, because unless you say otherwise the BufferedReader ends up using your system's default encoding. This is commonly ISO-8859-1 or Windows-something but almost never UTF-8.
But if you know the XML is encoded in UTF-8, the simplest thing to do is to read it as such:
BufferedReader reader = new BufferedReader(new InputStreamReader(yourXML, "UTF-8"));
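
A minimal, self-contained sketch of the same idea follows. It assumes the socket stream can be stood in for by a ByteArrayInputStream, and the class and method names are invented for the example. The only point it makes is that the charset is named explicitly when the bytes are turned into characters.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; a real program would pass in socket.getInputStream().
final class Utf8LineReader {

    // Reads the first line of the stream, decoding the bytes as UTF-8
    // instead of relying on the platform default encoding.
    static String firstLine(InputStream in) {
        try {
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Simulated wire data: a line containing a non-ASCII character, encoded as UTF-8.
        byte[] wire = "<name>tj\u00f8rna</name>\n".getBytes(StandardCharsets.UTF_8);
        String line = firstLine(new ByteArrayInputStream(wire));
        System.out.println(line.length()); // character count, not byte count
    }
}
```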

XML String encoding - anyone have the the code?

I need to encode strings for use in XML (node values) and replace
items like ampersands, < and > symbols, etc with the proper escaped strings.
My code will be installed on systems where I CANNOT add additional libraries to whatever they may already have.
So, I cannot use JAXP, for example.
Does anyone have the actual Java code for making strings XML compatible ?
I am particularly concerned that if the string already contains a valid encoding, it is NOT 're-processed', so that this (excuse the extra -'s):
'Hello &-amp-;'
does not become this:
'Hello &-amp-;-amp-;'
Thanks.

It isn't especially difficult code. Here's what you have to do:
1. Replace & by &amp;
2. Replace < by &lt;
3. Replace > by &gt;
4. Replace " by &quot;
5. Replace ' by &apos;
Note that it's important that #1 come first, otherwise you will be incorrectly processing things twice. The order of the rest doesn't matter. (Technically you don't have to do all of these things in all situations -- for example attribute values have different escaping rules than text nodes do -- but it isn't wrong to do them all.)
And note that this is called "escaping", not "encoding".
> I am particularly concerned that if the string already contains a valid encoding, it is NOT 're-processed'
This isn't a valid design criterion. You have to set up your design so that you have unescaped strings and you are creating escaped strings, or vice versa. If you have a string and you don't know if it has already been escaped or not, then that's a design failure.
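
The five replacements listed in this reply could be sketched in plain Java, with no extra libraries, roughly as follows. The class and method names are invented for the example. The ampersand is handled first, as the reply requires, and the last line of main shows that an already escaped string does get escaped again, which is why the caller has to know which kind of string it holds.

```java
// Hypothetical helper class; uses only java.lang.String.
final class XmlEscaper {

    // Escapes the five reserved XML characters. '&' must be replaced first,
    // otherwise the ampersands produced by the later steps would be escaped again.
    static String escape(String raw) {
        return raw.replace("&", "&amp;")
                  .replace("<", "&lt;")
                  .replace(">", "&gt;")
                  .replace("\"", "&quot;")
                  .replace("'", "&apos;");
    }

    public static void main(String[] args) {
        System.out.println(escape("Tom & Jerry <\"cat\">"));
        // prints Tom &amp; Jerry &lt;&quot;cat&quot;&gt;

        // Input that is already escaped is escaped a second time.
        System.out.println(escape("Hello &amp;"));
        // prints Hello &amp;amp;
    }
}
```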

Determine String encoding

Hello folks,
I have a problem with converting data to UTF-8.
My task involves an Oracle database table with a LONG field.
First of all, the data in this table can be Chinese, Japanese, English or Korean encoded.
So when I retrieve the data, I will need to invoke:
String localStr = new String(rs.getBytes(CONTENT_INDEX),"SJIS");
String utf8Str = new String (localStr.getBytes("SJIS"), "UTF8");
where rs is java.sql.ResultSet
Then I will need to store the UTF-8 String in a new database table.
Since the default encoding is "ISO8859_1" and I have no idea whether the data is
Chinese/Japanese/English/Korean encoded, how can I make the proper conversion?
Since the data is in a LONG field, I have to use getBytes() to get the data and
convert it from the local encoding.
So I am asking whether there is any way to determine what these bytes'
original encoding was.
Is there anything in the Character class that I can make use of?
Please help.

Thank you for your replies.
My problem is that I have no idea whether the row of data I am getting is Big5, SJIS, KSC5601 or ISO8859_1 encoded; everything is stored in one table, and no column indicates the encoding used.
My client needs to upgrade their content management system, which is only able to interpret UTF-8 data. So the old data needs to be converted to UTF-8 and stored in a new table.
E.g. SJIS (Japanese) to UTF-8 to SJIS (again when shown in the browser).
Now the problem is how to find out that these bytes were originally SJIS encoded.
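
There is no reliable way to recover the original encoding from the bytes alone, but one heuristic is to try each candidate charset with a strict decoder and rule out those that report errors. The sketch below is only an illustration of that heuristic, not a detection algorithm: the class and method names are made up, and whether names such as "Big5" or "SJIS" are accepted depends on the charsets installed in the JRE. A charset like ISO-8859-1 accepts every byte sequence, so a clean decode never proves the guess is right.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper for ruling out candidate encodings.
final class CharsetProbe {

    // Returns true if the bytes decode without any malformed or unmappable
    // input in the named charset. False means the charset can be ruled out.
    static boolean decodesCleanly(byte[] data, String charsetName) {
        try {
            Charset.forName(charsetName)
                   .newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Two bytes that are not a valid UTF-8 sequence.
        byte[] data = {(byte) 0x82, (byte) 0xA0};
        System.out.println(decodesCleanly(data, "UTF-8"));      // prints false
        System.out.println(decodesCleanly(data, "ISO-8859-1")); // prints true
    }
}
```

In practice the surviving candidates would still need a second check, for example a human looking at a sample of the decoded rows.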

Problem with string encoding

   Hello,
I have an application in Flex 3, and when I send a SOAP query I get a correct envelope,
but when I try the same code in Flex 4, the string is encoded.
Example:
Flex 3:
<ns1:orderLine>
      <ns1:line_nr>2</ns1:line_nr>
      <ns1:productCode>7443</ns1:productCode>
      <ns1:quantity>20</ns1:quantity>
     </ns1:orderLine>
Flex 4 :
<ns1:orderLine>
      <ns1:line_nr>1</ns1:line_nr>
      <ns1:productCode>&lt;productCode xmlns="http://emt.netsoa.netinfluence.com/types/order"&gt;7505&lt;/productCode&gt;</ns1:productCode>
      <ns1:quantity>20</ns1:quantity>
     </ns1:orderLine>
The integer is correct, but the string fails.
How can I correct this?
Jean

Your Flex 4 response translates to the following after replacing the reserved XML characters:
<ns1:orderLine>
      <ns1:line_nr>1</ns1:line_nr>
      <ns1:productCode><productCode xmlns="http://emt.netsoa.netinfluence.com/types/order">7505</productCode></ns1:productCode>
      <ns1:quantity>20</ns1:quantity>
     </ns1:orderLine>
It's strange that a productCode element ended up inside the productCode element itself. Can you check whether this was possibly due to the server returning a malformed response? There is help at this link:
http://anirudhs.chaosnet.org/blog/2009.06.01.html
I can analyze this if you provide the SOAP packet dumps.

Getting String encoding

Hello ,
I need to get the encoding of a String object which has already been created, and I don't know which encoding it has. How can I get this encoding? If there is an easy solution, please give me a piece of code.
thanks

To clarify: In Java character and String types are stored in UNICODE, so the actual codes should always be consistent whatever languages you're using and you shouldn't need to know what coding is used. Indeed I'd regard it as bad practice to write code which depends on the specific codes, there are plenty of classification tests in the Character class.
When text is changed to or from a sequence of bytes, that's when encoding becomes an issue. Of course a file is a sequence of bytes, so encoding also applies when text data is read from or written to a file.
So whenever you read or write text to or from a file or a byte array a specific encoding will be used (even if you allow it to default to the standard encoding of the system you're running on).
Most encodings can only cope with a subset of the UNICODE characters. The exceptions are UTF-8 and UTF-16. If a character can't be converted Java normally substitutes a "?".
Character encodings tend to come with nice, readily memorable names like ISO-8859-1.
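
The substitution behaviour described above can be seen in a short sketch (class and method names are invented for the example). The same two-character String is encoded once with a charset that cannot represent the second character and once with UTF-8, which can.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of any library.
final class UnmappableDemo {

    // Encodes with US-ASCII; characters outside ASCII become '?' (byte 63).
    static byte[] toAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    // Encodes with UTF-8, which can represent every Unicode character.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "A\u65e5"; // 'A' followed by a CJK character

        System.out.println(Arrays.toString(toAscii(text)));
        // prints [65, 63]

        System.out.println(Arrays.toString(toUtf8(text)));
        // prints [65, -26, -105, -91]
    }
}
```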

Database String & encoding

I am able to read national characters from an Oracle database using rs.getString() and store them in a String (perhaps it is UTF). They are correctly printed using System.out.println to the Tomcat console, but how do I convert them into cp1250, which the JSP generates?
I tried to use
new String(rs.getString(1).getBytes("iso-8859-1"),"Cp1250") but it changes everything to ? (question marks).
Does anybody have some advice on how to solve this problem?
Thanks Tomas

How to convert the String to cp1250?
byte[] bytesInCp1250 = rs.getString(1).getBytes("Cp1250");
Don't make the mistake of thinking you can have a cp1250 String. You can't. Strings don't have an encoding, only byte arrays can.
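
As a rough illustration of that last point, the sketch below (invented class and method names, and it assumes the JRE ships the windows-1250 charset) encodes a String to cp1250 bytes and decodes those bytes back with the same charset. The String itself carries no encoding; only the byte array does.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; assumes "windows-1250" is available in this JRE.
final class Cp1250RoundTrip {

    private static final Charset CP1250 = Charset.forName("windows-1250");

    // String -> bytes in cp1250.
    static byte[] toCp1250(String text) {
        return text.getBytes(CP1250);
    }

    // Bytes in cp1250 -> String (which is Unicode again).
    static String fromCp1250(byte[] bytes) {
        return new String(bytes, CP1250);
    }

    public static void main(String[] args) {
        String text = "\u010d"; // Latin small letter c with caron

        byte[] bytes = toCp1250(text);
        System.out.println(bytes.length); // prints 1

        // Decoding with the same charset gives back the original String.
        System.out.println(fromCp1250(bytes).equals(text)); // prints true
    }
}
```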

Please Help - Very strange problem with internal String encoding

I created a file in the one-byte Russian encoding cp1251 and declared a String literal with 2 letters:
String str = "ab"; //attention! ab - two russian characters
When I got the bytes from it with str.getBytes("cp1251"), it returned a 2-byte array.
Then I created a file in UTF-8 encoding with the same content, and suddenly
str.getBytes("cp1251") returned a 4-byte array! Why?
I need to get one-byte encoded arrays (cp1251 or koi8-r), but getBytes ALWAYS returns two-byte encoded arrays.
It is very strange: cp1251 is always one byte per character, but .getBytes("cp1251") returns two bytes for each letter. Why? Please help.

I did not read a string from a file. I created 2 .java files with different encodings (cp1251 and UTF-8) and compiled them, telling the compiler with -Dfile.encoding=*** to read them correctly. During execution Java interprets two strings that look equal in the editor as different objects with different .intern() representations.
Why does Java consider the source .java file's encoding when creating the internal representation of a String object, and create two DIFFERENT Unicode representations from strings that look equal in the editor? And it is impossible to convert one representation to the other - impossible to get two equal byte[] arrays.

Java Thread Dump and String encoding

Hi
My application is a server application, running on a Linux box with JDK 1.4.2_04.
Recently my application hung and did no processing. When I took a stack trace, I found:
"Thread-6204" prio=1 tid=0x0x8e3a400 nid=0x2621 waiting for monitor entry [b40b1000..b40b186c]
at java.nio.ByteBuffer.wrap(ByteBuffer.java:342)
at java.lang.StringCoding$CharsetSD.decode(StringCoding.java:179)
at java.lang.StringCoding.decode(StringCoding.java:220)
at java.lang.StringCoding.decode(StringCoding.java:226)
at java.lang.String.<init>(String.java:380)
The thread is waiting for monitor entry in ByteBuffer.wrap,
but ByteBuffer.wrap is a static method and it is not synchronized.
I want to know why it is waiting for a monitor when the method is not synchronized.
Thanks in advance

Hi,
If you post this in the Java forums, it is more likely that you will get a relevant answer.
By the way, you can actually take a look at the thread which is holding the lock. Looking at the code, a wild guess would be that your code is blocked waiting for the class to be loaded (the Heap*Buffer class).
Regards,
Mridul

String to XML Safe String encoding?

I have a string that I'm 99.99999% sure will never have come from anything but a BigDecimal.toString, but I still want to make sure I don't break my XML if something changes later down the line... so:
Are there any premade utilities out there that will take in a normal old string and make it XML safe (i.e. escape all the (< > /) stuff)?
Is there something in Xerces?

Normally, you'd build up your XML document as a DOM, and let the serializer do all the escaping (classes in javax.xml.transform). The only problem is that the standard serializer wants to output entire XML subtrees. You can do a single element, which may be sufficient.
The JDK 1.5 XML support is Xerces, but as far as I know there's no (easy) way to get inside the implementation even if you explicitly use Xerces. DOM4J does provide an XMLWriter class that does appear to let you output just text.
I'm not sure if that's what you really want, though. Are you concerned that someone will give you XML that has "foobar" where you expect a decimal value? Or that you'll write that XML? In that case, you probably want to pass the XML through a Schema validator before processing it / handing it out.
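
The DOM-plus-serializer approach from the first paragraph of this reply might look roughly like the sketch below, using only the JAXP classes that ship with the JDK. The class name, method name and element name are invented for the example; the serializer, not the calling code, takes care of the escaping.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper that serializes a single element with a text value.
final class DomEscapeDemo {

    // Builds <name>text</name> as a DOM and lets the serializer escape the text.
    static String elementWithText(String name, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                                                 .newDocumentBuilder()
                                                 .newDocument();
            Element element = doc.createElement(name);
            element.setTextContent(text);
            doc.appendChild(element);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Leave out the <?xml ...?> header so only the element is written.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize element", e);
        }
    }

    public static void main(String[] args) {
        // The '<' and '&' in the value come out escaped in the printed XML.
        System.out.println(elementWithText("amount", "1 < 2 & 3"));
    }
}
```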
