Strings, byte[]s, and encoding

I'm realising that I really don't know anything about Java's encoding functionality.
For example: I've written a method String encode(String) which transforms one byte sequence (the parameter's, I hope) into another byte sequence (the result's, I hope). This transformation is done with a special Charset that I've built, and I use Charset.encode(String).
I've tried calling this method with two String parameters. These two Strings give the same result when calling System.out.println(....) and String.getBytes(). But the two results of my method's call with these two parameters are different! How is that possible?
There are a lot of things I don't know. For example:
- How many bytes are needed to encode an ISO-8859-1 (default Java encoding) or UTF-8 character?
- How can I get the real bytes of a String, I mean the bytes of all its characters? (For example, if a String contains 4 characters, its real byte sequence should contain 8 bytes.)
Thanks for any help.
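
The byte counts asked about in the list above can be checked directly. What follows is only a minimal sketch, not the poster's encode method: the class name ByteCounts and the helper byteLength are made up for illustration, and only the JDK's StandardCharsets is assumed. Keep in mind that Java's default charset depends on the platform (and is UTF-8 from JDK 18 on), so ISO-8859-1 is not necessarily the default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteCounts {
    /** Number of bytes needed to store the text in the given charset (hypothetical helper). */
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // e with acute accent
        String euro = "\u20ac";   // euro sign
        System.out.println(byteLength(eAcute, StandardCharsets.ISO_8859_1)); // 1
        System.out.println(byteLength(eAcute, StandardCharsets.UTF_8));      // 2
        System.out.println(byteLength(euro, StandardCharsets.UTF_8));        // 3
        // UTF-16BE matches the 16-bit chars a String is made of: 4 chars, 8 bytes.
        System.out.println(byteLength("abcd", StandardCharsets.UTF_16BE));   // 8
    }
}
```

In ISO-8859-1 every character is exactly one byte, in UTF-8 a character takes between one and four bytes, and the chars of a String are 16 bits each, which is where the 8 bytes for 4 characters come from.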

I've tried calling this method with two String
parameters. These two Strings give the same result when
calling System.out.println(....) and
String.getBytes(). But the two results of my method's
call with these two parameters are different!
How is that possible?

Did you use getBytes(String charsetName) with a set of characters that included both overlapping and non-overlapping characters?
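
One way this can happen, sketched here under the assumption that the two Strings differ only in characters the charset cannot represent: getBytes replaces every unmappable character with the charset's replacement byte (usually '?'), so two different Strings can produce identical bytes and identical console output. LossyBytes and sameBytes are hypothetical names, not anything from the original post.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LossyBytes {
    /** True if both strings encode to exactly the same bytes in the given charset. */
    static boolean sameBytes(String a, String b, Charset charset) {
        return Arrays.equals(a.getBytes(charset), b.getBytes(charset));
    }

    public static void main(String[] args) {
        String euro = "\u20ac";    // not representable in ISO-8859-1
        String question = "?";
        // The euro sign is replaced by '?' when encoding to ISO-8859-1.
        System.out.println(sameBytes(euro, question, StandardCharsets.ISO_8859_1)); // true
        // UTF-8 can represent both, so the difference becomes visible again.
        System.out.println(sameBytes(euro, question, StandardCharsets.UTF_8));      // false
        System.out.println(euro.equals(question));                                  // false
    }
}
```

A custom Charset that does map such characters would then see two genuinely different inputs, which would explain different results.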

Similar Messages

  • SJIS- Japan Encoding Issues(*Unable to handle Double Byte Numeric and Spec)

    Hi All,
    Problem:
    Unable to handle double-byte numeric and special characters (hyphen).
    Input:
    区中央京勝乞田1944-2
    Output:
    区中央京勝乞田1944?2
    We have a write service based on the JCA (Write File Adapter), with the native schema defined with SJIS encoding as below.
    <?xml version="1.0" encoding="SJIS"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
         xmlns:nxsd="http://xmlns.oracle.com/pcbpel/nxsd" xmlns:tns="http://nike.com/***/***********"
         targetNamespace="http://nike.com/***/*************"
         elementFormDefault="unqualified" attributeFormDefault="unqualified"
         nxsd:version="NXSD" nxsd:stream="chars" nxsd:encoding="SJIS">
    Does anyone have a similar issue? How can we handle double-byte characters while using SJIS encoding? At the very least, how can we handle the double-byte hyphen?
    Thanks in advance

    I have modified my schema as shown below and it worked for me, so I am partially successful. However, I am not sure the workaround will resolve the issue at the final loading.
    <?xml version="1.0" encoding="UTF-8"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:nxsd="http://xmlns.oracle.com/pcbpel/nxsd" xmlns:tns="http://nike.com/***/***********"
    targetNamespace="http://nike.com/***/*************"
    elementFormDefault="unqualified" attributeFormDefault="unqualified"
    nxsd:version="NXSD" nxsd:stream="chars" nxsd:encoding="UTF-16">
    If anyone has a resolution or has seen this kind of issue, let me know.
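
A '?' in the output usually means the target charset has no mapping for that character. The following is a rough sketch for locating such characters from plain Java, outside the adapter; UnmappableFinder and firstUnmappable are hypothetical names, and the sample string with a fullwidth hyphen (U+FF0D) is an assumption about what the input might contain. The charset name is taken from the command line because "Shift_JIS" is not guaranteed to be installed in every JRE.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class UnmappableFinder {
    /** Index of the first character the charset cannot encode, or -1 if everything is encodable. */
    static int firstUnmappable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        for (int i = 0; i < text.length(); ) {
            int length = Character.charCount(text.codePointAt(i));
            if (!encoder.canEncode(text.subSequence(i, i + length))) {
                return i;
            }
            i += length;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed sample: digits followed by a fullwidth hyphen (U+FF0D).
        String sample = "1944\uFF0D2";
        String charsetName = args.length > 0 ? args[0] : "US-ASCII";
        if (!Charset.isSupported(charsetName)) {
            System.out.println(charsetName + " is not available in this JRE");
            return;
        }
        System.out.println(firstUnmappable(sample, Charset.forName(charsetName)));
    }
}
```

Running it with Shift_JIS and then with windows-31j as the argument would show whether the two charsets treat the hyphen differently on a given JRE.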

  • File I/O and encoding (J2SDK 1.4.2 on Windows)

    I encountered strange behavior using the FileReader / FileWriter classes for serializing the contents of a Java string. What I did was basically this:
    String string = "some text";
    FileWriter out = new FileWriter(new File("C:/foo.txt"));
    out.write(string);
    out.flush();
    out.close();
    In a different method, I read the contents of the file back:
    FileReader in = new FileReader(new File("C:/foo.txt"));
    StringWriter out = new StringWriter();
    char[] buf = new char[128];
    for (int len=in.read(buf); len>0; len=in.read(buf)) {
        out.write(buf, 0, len);  // copy only the chars actually read
    }
    out.flush(); out.close(); in.close();
    return out.toString();
    Problems arise as soon as the string contains non-ASCII characters. After writing and reading, the value of the string differs from the original. It seems that different character encodings are used when reading and writing, although the doc states that, if no explicit encoding is specified, the platform's default encoding (in my case CP1252) will be used.
    If I use streams directly instead of writers, it does not work, either, as long as I do not specify the encoding when converting bytes to strings and vice versa.
    When I specify the encoding (no matter which one, as long as I specify the same for reading as for writing), the resulting string is equal to the original one.
    If I replace the FileReader and Writer by StringReader and StringWriter (bypassing the serialization), it works, too (without specifying the encoding).
    Is this a bug in the file i/o classes or did I miss something?
    Thanks for your help
    Ralph

    First, if you are writing String objects via serialization, encoding doesn't matter at all. I'm not sure whether you were saying you tried that, but just for future reference.
    For String.getBytes() and String(byte[]) or InputStreamReader and OutputStreamWriter: if you don't specify an encoding, the system default (or the default specified on the command line or set in some other way) will be used in all cases.
    For byte streams: if you are reading/writing bytes through streams, then the character conversion is up to you. You call getBytes on a string or create a string with the byte[] constructor.
    For readers/writers: if you are reading/writing characters through readers/writers, then the character conversion is done by that class.
    However, StringReader and StringWriter just read from and write to String objects, and they work on Unicode chars, so that's really a special case.
    Okay...
    So if you have a string which has characters outside the range of the encoding being used (default or explicitly specified), then those characters are messed up when it's written to the file, typically replaced by '?'. And say you have a Chinese character which needs 2 bytes: if the 2 bytes are written but read back with a single-byte encoding, that one character shows up as 2. Whether 2 bytes are written or 1 depends on the encoding, but the result is the same: you get a munged-up string.
    Generally speaking, you are going to get better results on most text when using UTF-8 as your encoding. You need to specify it for every read and write, or set it as the default. The reason is that each char is written in as many bytes as it needs, and UTF-8 supports anything Unicode supports, thus anything String supports.
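
As a rough illustration of "specify it for every read and write", here is a small sketch that names UTF-8 on both sides. Utf8RoundTrip, write, and read are hypothetical names chosen for this example; the stream wrappers themselves are the standard InputStreamReader and OutputStreamWriter mentioned above.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8RoundTrip {
    /** Writes the text as UTF-8, independent of the platform default. */
    static void write(Path file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(Files.newOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    /** Reads the whole file back, decoding it as UTF-8. */
    static String read(Path file) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buf = new char[128];
        try (Reader in = new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8)) {
            for (int len = in.read(buf); len != -1; len = in.read(buf)) {
                text.append(buf, 0, len); // append only the chars actually read
            }
        }
        return text.toString();
    }
}
```

Because both directions use the same explicitly named charset, the result no longer depends on what the platform default happens to be.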

  • String.getBytes() & String(byte[]) - java.nio.BufferOverflowException

    The application in question uses JNI for legacy integration and I suspect the legacy code is corrupting the stack causing the above error. However, the error does not occur in Java 1.3, only Java 1.4.
    Is there some way to suppress 1.4's use of the native IO API when encoding and decoding byte streams? This would at least provide a workaround in the meantime.
    Thanks.

    This is beginning to make a little sense. The problem is that you got a String and you don't want one. A String wraps an array of chars, which is what your app needs, right? Specifically, they're chars because you need 16-bit char sets.
    Presumably the getBytes() method call is used to get an array of bytes for some data transfer operation. The java.nio-based implementation was probably introduced in 1.4 because nio has some very efficient ways of viewing a buffer as two or more types at once. It's trying to use the underlying char array as a byte array, and there's a straight-up bug someplace.
    The workaround is strange to contemplate, but I'm pretty sure it will work: use String.getChars() to get an array of chars, and then use java.nio yourself to create your byte array! If you've never been there, it's not very hard. I use nio all the time and it's never been a problem.
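
The workaround described above might look roughly like this. It is only a sketch: CharsToBytes and toUtf16Bytes are made-up names, the output is raw big-endian UTF-16 (two bytes per char) rather than whatever encoding the legacy side expects, and whether it actually avoids the 1.4 exception is untested.

```java
import java.nio.ByteBuffer;

class CharsToBytes {
    /** Copies the chars of a String into a byte[] as big-endian UTF-16, two bytes per char. */
    static byte[] toUtf16Bytes(String s) {
        char[] chars = new char[s.length()];
        s.getChars(0, s.length(), chars, 0);
        // A heap ByteBuffer is big-endian by default; the char view writes into the same storage.
        ByteBuffer bytes = ByteBuffer.allocate(chars.length * 2);
        bytes.asCharBuffer().put(chars);
        return bytes.array();
    }

    public static void main(String[] args) {
        System.out.println(toUtf16Bytes("abcd").length); // 8
    }
}
```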

  • Embedding HTML in XML CDATA and encoding issues

    Hi all,
    I'm embedding HTML code in a CDATA section. My problem is that, depending on the document, the HTML can be encoded in many formats. I borrowed a piece of code that sniffs that format so I can create a String in the "right" encoding (or at least the one that was guessed).
    - If I directly injected those in the CDATA section, I guess they'd be encoded in UTF-8 and some characters would be misinterpreted?
    - What if I transcoded the HTML from the sniffed format to UTF-8?
    - Are there any issues with doing this?
    Sorry if this is a dumb question, but I'm quite new to this kind of encoding issue.
    BTW I'm using DOM.
    Thanks
    lexo

    I don't know if it's a dumb question. I just don't understand it at all. Encoding issues only arise when you write data from a Java program to an external location, or when you read data from an external location into a Java program. And none of the activities you mentioned there have anything to do with that.
    When you write your XML to an external file, or wherever you write it to, it gets encoded at that moment. The whole thing. Elements, attributes, CDATA sections, the whole thing. Doesn't matter what's in it, the whole thing gets encoded in whatever charset was chosen.
    Does that help?
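
To make the point concrete, here is a small sketch of that flow: decode the HTML bytes once with the sniffed charset, put the resulting String into a CDATA node, and let the serializer encode the whole document at write time. CdataEmbed, wrapHtml, and the element name "page" are hypothetical; the sniffed charset is simply passed in, since the sniffing code isn't shown in the thread.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CdataEmbed {
    /** Wraps HTML bytes (in the sniffed charset) in a CDATA section and returns UTF-8 XML bytes. */
    static byte[] wrapHtml(byte[] htmlBytes, Charset sniffed) throws Exception {
        // Decoding happens here, once; after this the String has no "encoding" any more.
        String html = new String(htmlBytes, sniffed);

        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("page"); // element name is an assumption
        root.appendChild(doc.createCDATASection(html));
        doc.appendChild(root);

        // Encoding happens here, once, for the whole document.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }
}
```

The only place the sniffed encoding matters is the new String(...) call; if the guess is wrong there, the damage is already done before the DOM is involved.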

  • Converting from spreadsheet string to array and then back to spreadsheet string

    My question is: why does the Spreadsheet String to Array function create more data than the original string had when you change the array back into a spreadsheet string? I'm trying to analyze a comma-delimited file using array functions, since my column and row size is constant but my data varies. That is my reason for not using string parsing functions, which would be more involved and difficult. However, after I convert the comma-delimited file I read into a 2D array of data, and then convert back to a string using Array to Spreadsheet String, I get added columns in the file, which prevents another program from receiving these files. Also, the data I am reading is not all contiguous; it has gaps in some places for empty data. Comparing the file to the original after it has gone from string to array and back to string again, it looks almost identical, except that the file size grew by 400 bytes and, where the original file has empty spaces, the new file has a lot of commas added. Any idea?
    Charles

    The result you get is normal when the spreadsheet string contains rows of uneven length. Since all rows of an array have the same number of elements, nil values are added during the conversion. And of course, the conversion back to a string keeps those added values in the string, with the associated commas.
    example : 3 x 3 array
    1,2,3
    4
    5,6,7
    is converted into
    1 2 3
    4 0 0
    5 6 7
    then back to
    1,2,3
    4,0,0
    5,6,7
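
The same padding effect can be reproduced outside LabVIEW. The sketch below is a Java analogue, not LabVIEW's implementation; RaggedCsv, toArray, and toText are hypothetical names, and integer cells are assumed for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

class RaggedCsv {
    /** Parses comma-delimited lines into a rectangular array; missing cells become 0. */
    static int[][] toArray(String text) {
        List<String[]> rows = new ArrayList<>();
        int width = 0;
        for (String line : text.split("\n")) {
            String[] cells = line.split(",");
            rows.add(cells);
            width = Math.max(width, cells.length);
        }
        int[][] grid = new int[rows.size()][width]; // Java zero-fills new int arrays
        for (int r = 0; r < rows.size(); r++) {
            String[] cells = rows.get(r);
            for (int c = 0; c < cells.length; c++) {
                String cell = cells[c].trim();
                grid[r][c] = cell.isEmpty() ? 0 : Integer.parseInt(cell);
            }
        }
        return grid;
    }

    /** Writes the array back as comma-delimited lines; the padding cells are written too. */
    static String toText(int[][] grid) {
        StringBuilder text = new StringBuilder();
        for (int r = 0; r < grid.length; r++) {
            if (r > 0) text.append('\n');
            for (int c = 0; c < grid[r].length; c++) {
                if (c > 0) text.append(',');
                text.append(grid[r][c]);
            }
        }
        return text.toString();
    }
}
```

Once the short row has been padded to the array width, nothing in the array records that the extra cells were never in the file, so they come back out as real values.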
    Chilly Charly (aka CC)

  • Memory leak in String(byte[] bytes, int offset, int length)

    Has anyone run into a memory leak problem using the String(byte[] bytes, int offset, int length) constructor? I am using it to convert a byte array to a string, and I am seeing a memory leak when using it. Any idea what is going on?

    Hi,
    Since you posted in the Native Methods forum, I assume you are using this constructor on the native side.
    Be aware that getting a char * from a jstring eats memory that you must free with env->ReleaseStringUTFChars() before returning from native code.
    --Marc (http://jnative.sf.net)

  • How to retrieve version and encoding info from SAX parser

    I want to have access to the version and encoding info in a parsed XML document (when using SAX2). Who knows how this works?
    If not possible with SAX2, what other solutions are there?
    Second question: how to retrieve comments in XML files?
    Thanks in advance!

    thanks for answering, but ...
    LexicalHandler is fine for retrieving the comments,
    but does not give me the encoding (specified by
    encoding="UTF-8") and version (specified by
    version="1.0").
    I am feeding my parser with a byte stream, the parser
    resolves the correct encoding and I want to have
    access to this value!!

    Hi, it is unlikely that you will get the encoding and version. This is being addressed by DOM Level 3; see
    http://www-106.ibm.com/developerworks/xml/library/x-dom3.html?dwzone=xml .
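
As for "other solutions": this is not SAX2, but the StAX API (javax.xml.stream, part of the JDK since Java 6) exposes both values and reports comments as events. A minimal sketch, with XmlDeclInfo and its fields being made-up names:

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class XmlDeclInfo {
    String version;           // from version="..." in the XML declaration
    String declaredEncoding;  // from encoding="..." in the XML declaration, null if absent
    final List<String> comments = new ArrayList<>();

    /** Reads the declaration info and all comments from a byte stream. */
    static XmlDeclInfo read(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        XmlDeclInfo info = new XmlDeclInfo();
        // A new reader is positioned at START_DOCUMENT, where the declaration is available.
        info.version = reader.getVersion();
        info.declaredEncoding = reader.getCharacterEncodingScheme();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.COMMENT) {
                info.comments.add(reader.getText());
            }
        }
        reader.close();
        return info;
    }
}
```

getCharacterEncodingScheme() returns what the declaration says; reader.getEncoding() is the separate call for the encoding the parser detected from the input, if known.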

  • How to split the blob byte array and insert in oracle

    How do I split a BLOB byte array and insert it in Oracle?
    I have a string which is more than 4000 in length, so I am using the BLOB datatype as input to receive the string.
    I need to split the BLOB in Oracle and store the pieces in different columns.
    The string is a byte array; I need to store it in each column based on the byte positions.

    This will be my input. I need to split the above and store it in different columns in a table.
    For splitting, say:
    Column1 -1 byte
    Column2-3 byte
    Column3-5 byte
    ColumnN-5 byte
    Table will have corresponding data type
    Column1 - number(10,2)
    Column2 - Float
    Column3 - Float
    ColumnN-Float
    Here the Column2 datatype is Float, but it will always hold 3 bytes of information,
    whereas the Column3 datatype is also Float, but it will always hold 5 bytes of information.
    Say N is column 120.

  • New String(byte[]) taking up huge space?

    Ok. I have a byte[] of approx. 5.000.000 bytes. Whenever, at any point, I try to create a String from this array with new String(array), my memory usage increases by about 60-65 megabytes. Can anyone please explain to me why on earth it's doing this?
    Note also that this space is not freed up by the GC afterwards.
    Thanks in advance

    Actually, I've discovered some details about the swing textcomponents. They're very bad for 'simply' showing 2mb+ files.
    Check out these bugs; original and "more recent".
    It doesn't quite explain why putting a byte array to a string occupies twice the space, but it does seem to account for the huge memory increase when putting it to a JTextArea.
    String is immutable so it has to copy your char[] to
    a new char[] backing the String. If it used the

    Sure thing, but I'm passing in a byte[] array, and afaik, it's not using this array and should only copy it. Once.
    Oh hey, could it be that the bytes are saved in unicode? That would double the size... and make sense...
    Oh and PS; Maybe you should read this if you really think strings are immutable... (although you are generally correct :))

  • String & byte

    Hello everybody,
    I'm in this situation: I have a byte array initialized with a list of integers. For some reason, my goal is to convert it to a String and then get those integers back.
    I wrote this simple code:
    byte b[] = { -38, -127, -53, -52, 87, 40, -50, -44, -53, 87, -56, -52, 63, 81, 0, -56, 76, -84, 2, 0, 43, -120, 5, -5 };
    String s = new String(b);
    byte b2[] = s.getBytes();
    for(int i=0; i<b2.length; i++)
    System.out.println((int)(b2[i]));
    It works with all integers except -127; in fact it displays -127 as 63!
    How can I solve this problem? Thanks!

    BalusC wrote:
    I tried ISO-8859-1 here and it worked.
    byte[] bytes = { -38, -127, -53, -52, 87, 40, -50, -44, -53, 87, -56, -52, 63, 81, 0, -56, 76, -84, 2, 0, 43, -120, 5, -5 };
    String string = new String(bytes, "ISO-8859-1");
    for (byte b : string.getBytes("ISO-8859-1")) {
    System.out.print(b + " ");
    }
    All the ISOs 'work' but the point is - what is the OP trying to do?
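
Why -127 turns into 63 can be shown with a short sketch. The assumption here is that byte -127 (0x81) has no meaning in the poster's default charset; since that charset isn't stated, US-ASCII stands in for it below. ByteRoundTrip and roundTrip are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteRoundTrip {
    /** Decodes the bytes to a String and encodes them again with the same charset. */
    static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        byte[] data = { -38, -127, 87 };
        // ISO-8859-1 maps every byte value to a character, so nothing is lost.
        System.out.println(Arrays.toString(roundTrip(data, StandardCharsets.ISO_8859_1))); // [-38, -127, 87]
        // In US-ASCII, -38 and -127 are invalid: they decode to U+FFFD and re-encode as '?' (63).
        System.out.println(Arrays.toString(roundTrip(data, StandardCharsets.US_ASCII)));   // [63, 63, 87]
    }
}
```

A charset where only 0x81 is undefined would change just that one value, which matches what the original poster saw.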

  • What is the difference between string != null and null !=string ?

    Hi,
    what is the difference between string != null and null != string ?
    which is the best option ?
    Thanks
    user8729783

    Like you've presented it, nothing.  There is no difference and neither is the "better option".

  • How to decoding and encoding PNG and GIF images?

    I can decode and encode JPEG images using the following create functions, which are in the com.sun.image.codec.jpeg package.
    JPEGImageDecoder decoder = JPEGCodec.createJPEGDecoder(inputStream);
    JPEGImageEncoder encoder = JPEGCodec.createJPEGEncoder(outputStream);
    But I don't know the required package and functions to decode and encode PNG and GIF images. Please help me.

    Is the API that hard to follow?
    ImageIO.read( file/stream/url)
    ImageIO.write( image, format (e.g. PNG, GIF(1), JPEG), file/stream what have you)
    1) Not sure if Java supports GIF saving, it might if you install JAI, or Java 6.
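
Spelled out, the two ImageIO calls above could be combined as in the sketch below. ImageConvert and convert are hypothetical names; ImageIO.read and ImageIO.write are the real javax.imageio methods, and which formats can be written depends on the writers installed in the running JRE.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

class ImageConvert {
    /** Decodes an image from bytes and re-encodes it in the given format name, e.g. "png". */
    static byte[] convert(byte[] source, String format) throws IOException {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(source));
        if (image == null) {
            throw new IOException("No installed reader understands the input");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // write returns false when no writer for the format is available.
        if (!ImageIO.write(image, format, out)) {
            throw new IOException("No installed writer for format " + format);
        }
        return out.toByteArray();
    }
}
```

ImageIO.getWriterFormatNames() lists the format names a given installation can actually write.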

  • String to Int and Int to String

    How can I convert a String to an int and an int to a String?
    Say I have
    String abc="234";
    and I want the int value of abc. How do I do it?
    Please help.

    String.valueOf(int) takes an int and returns a string.
    Integer.parseInt(str) takes a string, returns an int.
    For all the others long, double, hex etc. RTFM :)
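
For completeness, the two calls above in a runnable form, plus the hex and double variants the reply alludes to. The class name Conversions is made up; all the methods called are standard java.lang ones.

```java
class Conversions {
    public static void main(String[] args) {
        String abc = "234";
        int value = Integer.parseInt(abc);        // String to int
        String text = String.valueOf(value + 1);  // int to String
        System.out.println(value);                // 234
        System.out.println(text);                 // 235

        System.out.println(Integer.parseInt("ff", 16));    // 255 (hex string to int)
        System.out.println(Integer.toHexString(255));      // ff
        System.out.println(Long.parseLong("9000000000"));  // 9000000000
        System.out.println(Double.parseDouble("2.5"));     // 2.5

        try {
            Integer.parseInt("12x"); // not a number
        } catch (NumberFormatException e) {
            System.out.println("not an int: 12x");
        }
    }
}
```

The parse methods throw NumberFormatException on bad input, so user-supplied strings are worth wrapping in a try/catch as in the last lines.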

  • [svn:osmf:] 11205: Fix bug FM-169: Trait support for data transfer sample doesn' t display bytes loaded and bytes total for SWF element

    Revision: 11205
    Author:   [email protected]
    Date:     2009-10-27 15:04:26 -0700 (Tue, 27 Oct 2009)
    Log Message:
    Fix bug FM-169: Trait support for data transfer sample doesn't display bytes loaded and bytes total for SWF element
    Ticket Links:
        http://bugs.adobe.com/jira/browse/FM-169
    Modified Paths:
        osmf/trunk/apps/samples/framework/PluginSample/src/PluginSample.mxml
        osmf/trunk/apps/samples/framework/PluginSample/src/org/osmf/model/Model.as

    The bug is known, and a patch has been submitted: https://bugs.freedesktop.org/show_bug.cgi?id=80151. There's been no update since Friday, so I wonder what the current status is, or whether it's up for review at all.
    Does anyone know how we can be notified when this patch hits the kernel?
