Strings, byte[]s, and encoding
I'm realising that I really don't know anything about Java's encoding features.
For example: I've written a method String encode(String) which transforms one byte sequence (I hope, the parameter's bytes) into another byte sequence (I hope, the result's bytes). This transformation is done with a special Charset that I've built, and I use Charset.encode(String).
I've tried calling this method with two String parameters. These two Strings give the same result when calling System.out.println(....) and String.getBytes(). But the two results of my method's calls with these two parameters are different! How is that possible?
There are a lot of things I don't know. For example:
- How many bytes are needed to encode an ISO-8859-1 (default Java encoding) or UTF-8 character?
- How can I get the real bytes of a String, I mean the bytes of all its characters? (For example, if a String contains 4 characters, its real byte array should contain 8 bytes.)
Thanks for any help.
I've tried calling this method with two String
parameters. These two Strings give the same result when
calling System.out.println(....) and
String.getBytes(). But the two results of my method's
calls with these two parameters are different!
How is that possible?
You used getBytes(String charsetName) with a set of characters that included both overlapping and non-overlapping characters?
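To make the two bulleted questions above concrete, here is a minimal sketch (the class and method names are made up for illustration). It counts the bytes a string needs in a few standard charsets and shows how two different strings can produce identical bytes when the charset cannot represent their characters. ISO-8859-1 always uses one byte per character, UTF-8 uses one to four, and UTF-16 without a byte-order mark uses two bytes per char, which gives the "4 characters, 8 bytes" mentioned in the question.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodingSizes {
    // Number of bytes the string occupies in each charset.
    static int latin1Length(String s) { return s.getBytes(StandardCharsets.ISO_8859_1).length; }
    static int utf8Length(String s)   { return s.getBytes(StandardCharsets.UTF_8).length; }
    static int utf16Length(String s)  { return s.getBytes(StandardCharsets.UTF_16BE).length; }

    // True when both strings encode to the same US-ASCII bytes.
    // Characters US-ASCII cannot represent are all replaced by '?',
    // so two different strings can look identical after encoding.
    static boolean sameAsciiBytes(String a, String b) {
        return Arrays.equals(a.getBytes(StandardCharsets.US_ASCII),
                             b.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // e with acute accent
        System.out.println(latin1Length(eAcute));  // 1
        System.out.println(utf8Length(eAcute));    // 2
        System.out.println(utf16Length("abcd"));   // 8
        System.out.println(sameAsciiBytes("\u4e2d", "\u6587")); // true
    }
}
```

If the two Strings in the question contain characters that the default charset cannot represent, getBytes() and println would hide the difference in the same way, while a custom Charset could still tell them apart. That is only a guess about the cause.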
Similar Messages
-
SJIS- Japan Encoding Issues(*Unable to handle Double Byte Numeric and Spec)
Hi All,
Problem:
Unable to handle double-byte numerics and special characters (hyphen)
The input
区中央京勝乞田1944-2
Output
区中央京勝乞田1944?2
We have a write service created based on the JCA (Write File Adapter) with the native schema defined with SJIS Encoding as below.
<?xml version="1.0" encoding="SJIS"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:nxsd="http://xmlns.oracle.com/pcbpel/nxsd" xmlns:tns="http://nike.com/***/***********"
targetNamespace="http://nike.com/***/*************"
elementFormDefault="unqualified" attributeFormDefault="unqualified"
nxsd:version="NXSD" nxsd:stream="chars" nxsd:encoding="SJIS">
Does anyone have a similar issue? How can we handle double-byte characters while using SJIS encoding? At the least, how can we handle the double-byte hyphen?
Thanks in advance
I modified my schema as shown below and it worked for me, at least partially. However, I'm not sure the workaround will resolve the issue at the final loading...
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:nxsd="http://xmlns.oracle.com/pcbpel/nxsd" xmlns:tns="http://nike.com/***/***********"
targetNamespace="http://nike.com/***/*************"
elementFormDefault="unqualified" attributeFormDefault="unqualified"
nxsd:version="NXSD" nxsd:stream="chars" nxsd:encoding="UTF-16">
If anyone has a resolution or has run into this kind of issue, let me know. -
File I/O and encoding (J2SDK 1.4.2 on Windows)
I encountered strange behavior using the FileReader / FileWriter classes for serializing the contents of a Java string. What I did was basically this:
String string = "some text";
FileWriter out = new FileWriter(new File("C:/foo.txt"));
out.write(string);
out.flush();
out.close();
In a different method, I read the contents of the file back:
FileReader in = new FileReader(new File("C:/foo.txt"));
StringWriter out = new StringWriter();
char[] buf = new char[128];
for (int len=in.read(buf); len>0; len=in.read(buf)) {
    out.write(buf, 0, len); // write only the chars actually read, not the whole buffer
}
out.flush(); out.close(); in.close();
return out.toString();
Problems arise as soon as the string contains non-ASCII characters. After writing and reading, the value of the string differs from the original. It seems that different character encodings are used when reading and writing, although the doc states that, if no explicit encoding is specified, the platform's default encoding (in my case CP1252) will be used.
If I use streams directly instead of writers, it does not work, either, as long as I do not specify the encoding when converting bytes to strings and vice versa.
When I specify the encoding (no matter which one, as long as I specify the same for reading as for writing), the resulting string is equal to the original one.
If I replace the FileReader and Writer by StringReader and StringWriter (bypassing the serialization), it works, too (without specifying the encoding).
Is this a bug in the file i/o classes or did I miss something?
Thanks for your help
Ralph
First: if you are writing String objects via serialization, encoding doesn't matter whatsoever. Not sure you were saying you tried that, but just for future reference.
For String.getBytes() and String(byte[]) or InputStreamReader and OutputStreamWriter: If you don't specify an encoding, the system default (or default specified on the command-line or set in some other way) will be used in all cases.
For byte streams: If you are reading/writing bytes thru streams, then the character conversion is up to you. You call getBytes on a string or create a string with the byte[] constructor.
For readers/writers: If you are reading/writing characters thru readers/writers, then the character conversion is done by that class.
However, StringReader and StringWriter are just writing to/from String objects and they are writing Unicode char's, so it's really a special case.
Okay...
So if you have a string containing characters outside the range of the encoding being used (default or explicitly specified), those characters get mangled when the string is written to the file; an unmappable character is typically replaced with '?'. Likewise, say you have a Chinese character which needs 2 bytes in the encoding used for writing. The 2 bytes are written, but if they are read back with a single-byte encoding, that one character shows up as 2. Whether 2 bytes are written or 1 depends on the encoding, but the result is the same: you get a munged-up string.
Generally speaking, you are going to get better storage on most text when using UTF-8 as your encoding. You need to specify it for every read and write, or set it as the default. The reason is that each char is written in as many bytes as it needs. And it will support anything Unicode supports, thus anything String supports. -
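As a sketch of the advice above about naming the encoding on both sides, the following uses OutputStreamWriter and InputStreamReader with an explicit UTF-8 charset instead of FileWriter / FileReader. The class and method names are invented for illustration, and the code uses try-with-resources and StandardCharsets, which are newer than the 1.4.2 SDK mentioned in the question; on 1.4 the same idea would be written with the "UTF-8" charset name and explicit close() calls.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {
    // Write the text as UTF-8, regardless of the platform default.
    static void write(File file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    // Read the file back with the same charset that was used for writing.
    static String read(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)) {
            char[] buf = new char[128];
            int len;
            while ((len = in.read(buf)) != -1) {
                sb.append(buf, 0, len); // append only the chars actually read
            }
        }
        return sb.toString();
    }

    // Convenience wrapper: write to a temp file, read it back, clean up.
    static String roundTrip(String text) {
        try {
            File tmp = File.createTempFile("foo", ".txt");
            try {
                write(tmp, text);
                return read(tmp);
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "h\u00e9llo \u4e2d\u6587";
        System.out.println(original.equals(roundTrip(original)));
    }
}
```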
The application in question uses JNI for legacy integration and I suspect the legacy code is corrupting the stack causing the above error. However, the error does not occur in Java 1.3, only Java 1.4.
Is there some way to suppress 1.4's use of the native IO API when encoding and decoding byte streams? This would at least provide a workaround in the meantime.
Thanks.
This is beginning to make a little sense. The problem is that you got a String and you don't want one. A String wraps an array of chars, which your app needs, right? Specifically, they're chars because you need 16-bit character sets.
Presumably the getBytes() method call is used to get an array of bytes for some data transfer operation. java.nio was probably brought into this code path in 1.4, as it has some very efficient ways of handling a buffer as two or more types simultaneously. It's trying to use the underlying char array as a byte array, and there's a straight-up bug someplace.
Workaround is strange to contemplate, but I'm pretty sure it will work: use String.getChars() to get an array of chars, and then use java.nio yourself to create your byte array! If you've never been there, it's not very hard. I use nio all the time and it's never been a problem. -
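The getChars() plus java.nio workaround described above might look roughly like this. It is a sketch only: the class and method names are made up, and it produces raw big-endian 16-bit values for each char, which may or may not be the byte layout the legacy code expects.

```java
import java.nio.ByteBuffer;

class CharsToBytes {
    // Copy the String's chars out and view a ByteBuffer as a CharBuffer
    // to obtain two bytes per char, without calling String.getBytes().
    static byte[] toBytes(String s) {
        char[] chars = new char[s.length()];
        s.getChars(0, s.length(), chars, 0);

        // ByteBuffer is big-endian by default; the char view writes
        // straight into the same backing array.
        ByteBuffer bytes = ByteBuffer.allocate(chars.length * 2);
        bytes.asCharBuffer().put(chars);
        return bytes.array();
    }

    public static void main(String[] args) {
        for (byte b : toBytes("AB")) {
            System.out.print(b + " "); // 0 65 0 66
        }
        System.out.println();
    }
}
```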
Embedding HTML in XML CDATA and encoding issues
Hi all,
I'm embedding HTML code in a CDATA section. My problem is that, depending on the document, the HTML can be encoded in many formats. I borrowed a piece of code that sniffs the format so I can create a String in the "right" encoding (or at least the one that was guessed).
- If I directly injected those Strings into the CDATA section, I guess they'd be encoded in UTF-8 and some characters would be misinterpreted?
- What if I transcoded the HTML from the sniffed format to UTF-8?
- Are there any issues with doing this?
Sorry if this is a dumb question, but I'm quite new to these kinds of encoding issues.
BTW I'm using DOM.
Thanks
lexo
I don't know if it's a dumb question. I just don't understand it at all. Encoding issues only arise when you write data from a Java program to an external location, or when you read data from an external location into a Java program. And none of the activities you mentioned there have anything to do with that.
When you write your XML to an external file, or wherever you write it to, it gets encoded at that moment. The whole thing. Elements, attributes, CDATA sections, the whole thing. Doesn't matter what's in it, the whole thing gets encoded in whatever charset was chosen.
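To illustrate the point that the charset is applied once, when the whole document is written out, here is a small sketch using the standard DOM and JAXP transformer APIs. The class name, the "page" element name, and the sample HTML are assumptions made for the example. Inside the program the CDATA content is just a Java String; the encoding is only chosen at serialization time.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CdataEncoding {
    // Build a one-element document holding the HTML in a CDATA section,
    // then serialize the whole document in the requested encoding.
    static byte[] serialize(String html, String encoding) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("page"); // hypothetical element name
            root.appendChild(doc.createCDATASection(html));
            doc.appendChild(root);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, encoding); // applies to everything written
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<p>caf\u00e9</p>";
        System.out.println(serialize(html, "UTF-8").length);
        System.out.println(serialize(html, "UTF-16").length);
    }
}
```

So the part that needs care is turning the sniffed bytes into a correct String in the first place; after that, the serializer handles the output encoding.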
Does that help? -
Converting from spreadsheet string to array and then back to spreadsheet string
My question is: why does the Spreadsheet String to Array function create more data than the original string had when you change the array back into a spreadsheet string? I'm trying to analyze a comma-delimited file using array functions, since my column and row sizes are constant but my data varies. That is my reason for not using string parsing functions, which would be more involved and difficult. However, after I convert the comma-delimited file I read into a 2D array of data and then convert back to a string using Array to Spreadsheet String, I get added columns in the file, which prevents another program from receiving these files. Also, the data I am reading is not all contiguous; it has gaps in some places for empty data. After going from string to array and back to string again, the file looks almost identical to the original, except that the file size is larger by 400 bytes, and where the original file has empty spaces, the new file has a lot of commas added. Any idea?
Charles
The result you get is normal when the spreadsheet string contains rows of uneven length. Since the array rows all have the same number of elements, nil values are added during the conversion. And of course, the conversion back to a string keeps those added values in the string, with the associated commas.
example : 3 x 3 array
1,2,3
4
5,6,7
is converted into
1 2 3
4 0 0
5 6 7
then back to
1,2,3
4,0,0
5,6,7
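The padding effect in the example above can be imitated outside LabVIEW. The following Java sketch is not the LabVIEW implementation; it only mimics the described behavior (class and method names are invented): short rows are padded with zeros to the width of the longest row, and the padding survives the conversion back to text.

```java
class RaggedCsv {
    // Parse comma-delimited lines into a rectangular grid (padding short
    // rows with "0"), then join the grid back into a delimited string.
    static String roundTrip(String csv) {
        String[] lines = csv.split("\n");
        String[][] cells = new String[lines.length][];
        int width = 0;
        for (int r = 0; r < lines.length; r++) {
            cells[r] = lines[r].split(",");
            width = Math.max(width, cells[r].length);
        }

        StringBuilder out = new StringBuilder();
        for (String[] row : cells) {
            for (int c = 0; c < width; c++) {
                if (c > 0) out.append(',');
                out.append(c < row.length ? row[c] : "0"); // padding added here
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(roundTrip("1,2,3\n4\n5,6,7\n"));
    }
}
```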
Chilly Charly (aka CC)
E-List Master - Kudos glutton - Press the yellow button on the left... -
Memory leak in String(byte[] bytes, int offset, int length)
Has anyone run into a memory leak problem using the String(byte[] bytes, int offset, int length) constructor? I am using it to convert a byte array to a string, and I am seeing a memory leak. Any idea what is going on?
Hi,
If you post in the Native Methods forum, I assume you are using this constructor on the native side.
Be aware that getting a char * from a jstring eats memory that you must free with env->ReleaseStringUTFChars() before returning from native code.
--Marc (http://jnative.sf.net) -
How to retrieve version and encoding info from SAX parser
I want to have access to the version and encoding info in a parsed XML document (when using SAX2). Who knows how this works?
If not possible with SAX2, what other solutions are there?
Second question: how to retrieve comments in XML files?
Thanks in advance!
Thanks for answering, but ...
LexicalHandler is fine for retrieving the comments,
but does not give me the encoding (specified by
encoding="UTF-8") and version (specified by
version="1.0").
I am feeding my parser with a byte stream, the parser
resolves the correct encoding and I want to have
access to this value!
Hi, it is unlikely that you will get the encoding and version. This is being addressed by DOM Level 3; see
http://www-106.ibm.com/developerworks/xml/library/x-dom3.html?dwzone=xml . -
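For readers on a JDK whose DOM implementation supports Level 3 (org.w3c.dom.Document has carried these methods since Java 5), the declared version and encoding can be read roughly as sketched below. The class and method names are made up for illustration. For SAX, parsers that implement org.xml.sax.ext.Locator2 expose similar information through getXMLVersion() and getEncoding(), but whether a given parser provides it is not guaranteed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XmlDeclInfo {
    // Parse a byte stream and return { version, encoding } as declared
    // in the XML declaration (encoding is null if none was declared).
    static String[] versionAndEncoding(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return new String[] { doc.getXmlVersion(), doc.getXmlEncoding() };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a/>"
                .getBytes(StandardCharsets.UTF_8);
        String[] info = versionAndEncoding(xml);
        System.out.println(info[0] + " " + info[1]);
    }
}
```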
How to split the blob byte array and insert in oracle
How do I split a BLOB byte array and insert it in Oracle?
I have a string longer than 4000 characters, so I am using the BLOB datatype as input to receive the string.
I need to split the BLOB in Oracle and store the pieces in different columns.
The string is a byte array, and I need to store it column by column based on byte offsets. This will be my input; I need to split it and store it in different columns of a table.
For splitting, say:
Column1 -1 byte
Column2-3 byte
Column3-5 byte
ColumnN-5 byte
Table will have corresponding data type
Column1 - number(10,2)
Column2 - Float
Column3 - Float
ColumnN-Float
Here the Column2 datatype is Float, but it will always hold 3 bytes of information,
whereas the Column3 datatype is also Float, but it will always hold 5 bytes of information.
Say N is Column 120 -
New String(byte[]) taking up huge space?
OK. I have a byte[] of approx. 5,000,000 bytes. Whenever, at any point, I try to create a String from this array with new String(array), my memory usage increases by about 60 to 65 megabytes. Can anyone please explain why on earth it's doing this?
Note also that this space is not freed up by the GC afterwards.
Thanks in advance
Actually, I've discovered some details about the Swing text components. They're very bad for 'simply' showing 2 MB+ files.
Check out these bugs; original and "more recent".
It doesn't quite explain why putting a byte array to a string occupies twice the space, but it does seem to account for the huge memory increase when putting it to a JTextArea.
String is immutable so it has to copy your char[] to
a new char[] backing the String. If it used the
Sure thing, but I'm passing in a byte[] array, and afaik, it's not using this array and should only copy it. Once.
Oh hey, could it be that the bytes are saved in unicode? That would double the size... and make sense...
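The doubling guessed at above can be estimated with a little arithmetic, sketched here with invented names. It assumes a String backed by a char[] with two bytes per char, which matches the JVMs of that era; newer JVMs may store Latin-1-only strings more compactly, so treat the number as an upper-bound estimate and not a measurement.

```java
import java.nio.charset.StandardCharsets;

class StringFootprint {
    // Estimated size of the character data when each char takes 2 bytes.
    // Object headers and any temporary decoding buffers are ignored.
    static long estimatedCharBytes(byte[] data) {
        String s = new String(data, StandardCharsets.ISO_8859_1); // one char per byte
        return (long) s.length() * Character.BYTES;
    }

    public static void main(String[] args) {
        byte[] data = new byte[5_000_000];
        System.out.println(estimatedCharBytes(data)); // 10000000
    }
}
```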
Oh and PS; Maybe you should read this if you really think strings are immutable... (although you are generally correct :)) -
Hello everybody,
I'm in this situation: I have a byte array initialized from a list of integers. For some reason, my goal is to convert it to a String and then get those integers back.
I wrote this simple code:
byte b[] = { -38, -127, -53, -52, 87, 40, -50, -44, -53, 87, -56, -52, 63, 81, 0, -56, 76, -84, 2, 0, 43, -120, 5, -5 };
String s = new String(b);
byte b2[] = s.getBytes();
for(int i=0; i<b2.length; i++)
    System.out.println((int)(b2[i]));
but it works with all integers except -127; in fact, it displays -127 as 63!
How can I solve this problem? Thanks!
BalusC wrote:
I tried ISO-8859-1 here and it worked.
byte[] bytes = { -38, -127, -53, -52, 87, 40, -50, -44, -53, 87, -56, -52, 63, 81, 0, -56, 76, -84, 2, 0, 43, -120, 5, -5 };
String string = new String(bytes, "ISO-8859-1");
for (byte b : string.getBytes("ISO-8859-1")) {
    System.out.print(b + " ");
}
All the ISOs 'work' but the point is: what is the OP trying to do? -
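A sketch of why the charset matters for the byte[] round trip discussed above (class and method names are invented). ISO-8859-1 maps every byte value to a char and back, so it is lossless. A charset with no mapping for a byte decodes it to the replacement character, which is then encoded as '?', i.e. 63. US-ASCII is used below to show that effect; the original poster's default charset is unknown, so it is only an assumption that it behaved the same way for -127.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteRoundTrip {
    // Lossless: every byte value 0..255 has an ISO-8859-1 character.
    static byte[] viaLatin1(byte[] in) {
        return new String(in, StandardCharsets.ISO_8859_1).getBytes(StandardCharsets.ISO_8859_1);
    }

    // Lossy: bytes without a US-ASCII mapping come back as '?' (63).
    static byte[] viaAscii(byte[] in) {
        return new String(in, StandardCharsets.US_ASCII).getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] data = { -38, -127, 87, 40 };
        System.out.println(Arrays.toString(viaLatin1(data))); // [-38, -127, 87, 40]
        System.out.println(Arrays.toString(viaAscii(data)));  // [63, 63, 87, 40]
    }
}
```

If the bytes are arbitrary binary data and not text, keeping them as a byte[] avoids the problem entirely.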
What is the difference between string != null and null !=string ?
Hi,
What is the difference between string != null and null != string?
Which is the best option?
Thanks
user8729783
As you've presented it, nothing. There is no difference, and neither is the "better option".
-
How to decode and encode PNG and GIF images?
I can decode and encode JPEG images using the following create functions, which are in the com.sun.image.codec.jpeg package.
JPEGImageDecoder decoder = JPEGCodec.createJPEGDecoder(inputStream);
JPEGImageEncoder encoder = JPEGCodec.createJPEGEncoder(outputStream);
But I don't know the required package and functions to decode and encode PNG and GIF images. Please help me.
Is the API that hard to follow?
ImageIO.read( file/stream/url)
ImageIO.write( image, format (e.g. PNG, GIF(1), JPEG), file/stream what have you)
1) Not sure if Java supports GIF saving, it might if you install JAI, or Java 6. -
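The ImageIO.read / ImageIO.write calls mentioned above could be used along these lines. This is a minimal sketch with made-up class and method names, working on in-memory streams so it needs no files; ImageIO.write returns false when no writer is installed for the requested format, which is how missing GIF support would show up.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

class ImageCodec {
    // Encode an image in the given format name, e.g. "png" or "gif".
    static byte[] encode(BufferedImage image, String format) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            if (!ImageIO.write(image, format, out)) {
                throw new IllegalArgumentException("No writer available for " + format);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decode any format for which a reader is installed.
    static BufferedImage decode(byte[] data) {
        try {
            return ImageIO.read(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        BufferedImage copy = decode(encode(image, "png"));
        System.out.println(copy.getWidth() + "x" + copy.getHeight()); // 4x3
    }
}
```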
String to Int and Int to String
How can I convert a String to an int and an int to a String?
Say I have
String abc="234";
and I want the "int" value of "abc", how do I do it?
Please help.
String.valueOf(int) takes an int and returns a string.
Integer.parseInt(str) takes a string, returns an int.
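A short sketch of both conversions (the class and method names are invented for the example). Note that Integer.parseInt throws NumberFormatException when the text is not a valid integer.

```java
class IntConversions {
    // String -> int; throws NumberFormatException for input like "abc".
    static int toInt(String s) {
        return Integer.parseInt(s);
    }

    // int -> String
    static String toText(int i) {
        return String.valueOf(i);
    }

    public static void main(String[] args) {
        String abc = "234";
        int value = toInt(abc);
        System.out.println(value + 1);      // 235
        System.out.println(toText(value));  // 234
    }
}
```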
For all the others long, double, hex etc. RTFM :) -
Revision: 11205
Author: [email protected]
Date: 2009-10-27 15:04:26 -0700 (Tue, 27 Oct 2009)
Log Message:
Fix bug FM-169: Trait support for data transfer sample doesn't display bytes loaded and bytes total for SWF element
Ticket Links:
http://bugs.adobe.com/jira/browse/FM-169
Modified Paths:
osmf/trunk/apps/samples/framework/PluginSample/src/PluginSample.mxml
osmf/trunk/apps/samples/framework/PluginSample/src/org/osmf/model/Model.as
The bug is known, and a patch has been submitted: https://bugs.freedesktop.org/show_bug.cgi?id=80151. There's been no update since Friday, so I wonder what the current status is, or if it's up for review at all.
Does anyone know how we can be notified when this patch hits the kernel?