Escaped data to UTF-8, i.e. 「

Hi everyone!
I have a form that needs to be in ISO-8859-1 format, because the form data is inserted into a database that expects ISO-8859-1.
At one point, I need to get the data out of the database and insert it into another database that accepts only UTF-8 data.
When I insert Asian characters into the first database, the Web server automatically converts them into escaped characters, i.e. 「 (but the & # version - the forum is converting the symbol here :-(). That's the way it's inserted into the database. That is okay. I want it that way.
When I read the data back from that database, I get the escaped characters. Now I need to convert the data to UTF-8 so I can put it into the other database.
Is there a way to convert the escaped data into UTF-8 data?
dailysun
Message was edited by:
dailysun

First off, you don't need to convert the data to UTF-8. When you insert it into the other database, you should be able to specify "UTF-8" as the encoding, and the conversion will be done for you.
So your real problem is merely to convert the HTML escapes back to the characters they represent. That should be a simple matter of extracting the numerical portion of the escape, parsing it with Integer.parseInt(), and casting the resulting int to a char. The regex package makes the process relatively painless:
// requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
private static Pattern numRefPattern =
      Pattern.compile("&#(?:(\\d++)|[Xx](\\p{XDigit}++));");
public static String unescapeHTML(String source) {
    Matcher m = numRefPattern.matcher(source);
    StringBuffer sb = new StringBuffer(source.length());
    while (m.find()) {
      // append any text between the last match and the current one
      m.appendReplacement(sb, "");
      // find out whether it was a decimal or hex escape and parse accordingly
      int value = m.start(1) != -1
                ? Integer.parseInt(m.group(1))
                : Integer.parseInt(m.group(2), 16);
      if (!Character.isDefined(value))
        throw new IllegalArgumentException("No such character: " + m.group());
      // the cast is only safe for values that fit in a 16-bit char
      sb.append((char)value);
    }
    // append whatever text followed the final match
    m.appendTail(sb);
    return sb.toString();
}
This code is untested, but the technique is one I use often.
If your data might contain characters whose Unicode values are too large to fit in a 16-bit char, the process becomes a bit more complicated, but (as of JDK 1.5) the Character class has methods to handle them, too. If you're only dealing with the major Asian character sets (i.e., CJK), I don't think you have to worry about that.
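For the case mentioned above, where a reference names a code point above U+FFFF, a minimal sketch could look like the following. The class and method names (NumericRefs, unescape) are made up for illustration, and leaving unparseable references untouched instead of throwing is a design choice of this sketch, not part of the original answer. It relies on StringBuilder.appendCodePoint, which writes a surrogate pair when one is needed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class NumericRefs {
    // Matches decimal and hexadecimal numeric character references.
    private static final Pattern NUM_REF =
            Pattern.compile("&#(?:(\\d+)|[Xx](\\p{XDigit}+));");

    static String unescape(String source) {
        Matcher m = NUM_REF.matcher(source);
        StringBuilder sb = new StringBuilder(source.length());
        int last = 0;
        while (m.find()) {
            // copy the literal text between the previous match and this one
            sb.append(source, last, m.start());
            last = m.end();
            int value;
            try {
                value = m.group(1) != null
                        ? Integer.parseInt(m.group(1))
                        : Integer.parseInt(m.group(2), 16);
            } catch (NumberFormatException e) {
                value = -1; // too many digits to be a code point
            }
            if (Character.isValidCodePoint(value)) {
                // emits two chars (a surrogate pair) for values above 0xFFFF
                sb.appendCodePoint(value);
            } else {
                // assumption of this sketch: keep unparseable references as-is
                sb.append(m.group());
            }
        }
        sb.append(source, last, source.length());
        return sb.toString();
    }
}
```

The resulting String can then be handed to the second database through a connection configured for UTF-8; no separate byte-level conversion step should be needed.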

Similar Messages

Escaping data ? Method or class available ?

Hi,
Does any know of a class or method to escape a given String of data ?
Thanks,
Rob

Thanks, I'm passing the Str to a SQL method:
stat = connection.createStatement();
    public void insertTableData(String cmd) {
        connectDB();
        try {
            stmt = connection.prepareStatement(cmd);
            stmt.execute( cmd );
        } catch (SQLException ex) {
            reportSQLException( ex );
        }
    }
I get the following error, caused by the { chars?
2006-02-02 12:33:48 : Exception: SQL Exception: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'OSR script osr.bt1.042801 starting... please wait" ??{??sudo -u bt1mq run_OSR_mq' at line 1
2006-02-02 12:33:48 : Exception: SQL State: 42000
2006-02-02 12:33:48 : Exception: Vendor Error: 1064

11g Client installation errror - Unable to construct admin data&Invalid UTF

I am trying to install Oracle 11g Client on Windows 7 (version is Oracle Database 11g Release 2 Client (11.2.0.1.0) for Microsoft Windows (32-bit)). I already have installed Oracle 10g Client and it is working without problem. I read few articles that say it can be used simultaneously on single machine.
I get following error:
"ERROR!! Unable to construct admin data segment. java.io.UTFDataFormatException:Invalid UTF8 encoding."
I have full admin rights on machine, Windows is updated and latest Java is installed. Searched all over the net but can't find what could be the problem.
Thank you in advance.

This may be helpful:
http://download.oracle.com/javase/1.4.2/docs/api/java/lang/class-use/Exception.html
It looks like you have a problem with Java.
Edited by: Fran on 12-jul-2011 5:02

Content-Type=application/x-www-form-urlencoded with UTF-8 data which servlet fails to interpret correctly .

My Environment is Weblogic 6.1 SP2 on WIN2K.
I have an HTTP client sending an HTTP request with
Content-Type=application/x-www-form-urlencoded
the data in HTTP request is in UTF-8. Note that HTTP client does not specify charset=UTF-8
as part of the Content-Type.
I have a servlet "VosXmlEvents" processing this HTTP request.
The HTTP request contains data like “Zoë” which the servlet below
interprets as “ZoÃ«” causing me to think that my deployment settings
might be wrong.
My Deployment settings:
In weblogic.xml deployment descriptor I have specified following:
<charset-params>

<input-charset>
<resource-path>/vos/events/xml/*</resource-path>

<java-charset-name>UTF8</java-charset-name>
</input-charset>

<charset-mapping>
<iana-charset-name>ISO_8859-1:1987</iana-charset-name>
<java-charset-name>ISO-8859-1</java-charset-name>
</charset-mapping>
<charset-mapping>
<iana-charset-name>ISO_8859-1:1987</iana-charset-name>
<java-charset-name>ISO8859_1</java-charset-name>
</charset-mapping>
<charset-mapping>
<iana-charset-name>UTF-8</iana-charset-name>
<java-charset-name>UTF-8</java-charset-name>
</charset-mapping>
<charset-mapping>
<iana-charset-name>UTF-8</iana-charset-name>
<java-charset-name>UTF8</java-charset-name>
</charset-mapping>
</charset-params>
My web.xml has:
<servlet>
<servlet-name>VosXmlEvents</servlet-name>
<servlet-class>some class name</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>VosXmlEvents</servlet-name>
<url-pattern>/vos/events/xml/*</url-pattern>
</servlet-mapping>
According to Servlet spec. 2.3 sec. SRV.4.9 Request data encoding, cut pasting
from spec below:
"Currently, many browsers do not send a char encoding qualifier with the Content-Type
header, leaving open the determination of the character encoding for reading HTTP
requests. The default encoding of a request the container uses to create the request
reader and parse POST data must be “ISO-8859-1”, if none has been
specified by the client request. However, in order to indicate to the developer
in this case the failure of the client to send a character encoding, the container
returns null from the getCharacterEncoding method.
If the client hasn’t set character encoding and the request data is encoded
with a different encoding than the default as described above, breakage can occur.
To remedy this situation, a new method setCharacterEncoding(String enc) has been
added to the ServletRequest interface. Developers can override the
character encoding supplied by the container by calling this method. It must be
called prior to parsing any post data or reading any input from the request. Calling
this method once data has been read will not affect the encoding."
Q1. Shouldn't the <input-charset> settings in weblogic.xml for a servlet
make the container do something like setCharacterEncoding(String enc) on
the HTTP request before it calls the servlet?
Q2. If not, does that mean I have to programmatically call setCharacterEncoding(String
enc) to correctly interpret my HTTP request?
Q3. If the answer to Q1 is "yes", then I will assume that the getInputStream() or getReader()
methods on the HttpServletRequest will give the characters in the client's
encoding, which in this case is UTF-8, i.e. I will get "Zoë" as in this example. Am
I correct here?
Reference:
1. http://edocs.bea.com/wls/docs70/webapp/components.html#139932
2. http://java.sun.com/j2se/1.3/docs/guide/intl/encoding.doc.html
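The symptom described in this post can be reproduced outside any container. The sketch below is an illustration only (the class and method names are hypothetical): it decodes UTF-8 bytes with ISO-8859-1, which is the default the quoted spec text prescribes when the client sends no charset, and then shows that the damage is reversible as long as no bytes were lost.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of WebLogic or the servlet API.
class CharsetMismatch {
    // Decodes a request body with the wrong charset, the way a container
    // would when it falls back to ISO-8859-1 for data that is really UTF-8.
    static String decodeAsLatin1(byte[] body) {
        return new String(body, StandardCharsets.ISO_8859_1);
    }

    // Reverses the wrong decoding: ISO-8859-1 maps every byte to one char,
    // so the original bytes can be recovered and decoded again as UTF-8.
    static String repair(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }
}
```

In a servlet, the route named in the spec excerpt above is to call request.setCharacterEncoding("UTF-8") before the first parameter or reader access; the repair step here is only a fallback for data that has already been decoded incorrectly.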

Stefan,
Thanks for the information. I have the following question then.
The Webserver I am interacting with does not recognize the user and password provided in the channel. It accepts the user and password only in the payload, and only with the application/x-www-form-urlencoded content type.
A sample raw POST that the server requires looks like this:
User=yourname&Password=yourpassword&INPUT_XML=%3C%3Fxmlversion%3D%221.0%22standalone%3D%22no%22%3F%3E%3CDELIVERY%3E%3CMESSAGE%3E%0D%0A++++%3CDESTINATION_ADDR%3E%2B447900570205%3C%2FDESTINATION_ADDR%3E%%3C%2FMESSAGE%3E%0D%0A%0D%0A%3C%2FDELIVERY%3E%
What I have done is construct the POST manually in Java code in unencoded form, expecting the HTTP Adapter to do the encoding. With this, the HTTP server is able to successfully parse the XML except for the &, < and > characters, which error out as invalid XML.
Is there an elegant way of handling the above scenario using an XML POST and the standard HTTP Adapter?
Best Regards,
Sudharshan N A
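One way to avoid the problem with &, < and > is to encode each field value in the Java code itself, so that only the separators between fields stay literal. The sketch below assumes the body is built by hand as described above; the class name FormBody is hypothetical, and the field names User, Password and INPUT_XML are taken from the sample POST. If the adapter also encodes the body, the values would be encoded twice, so this only fits when the adapter passes the payload through unchanged.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for building an application/x-www-form-urlencoded body.
class FormBody {
    // Percent-encodes one value; '&', '<', '>' and '+' inside it become %xx.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform
            throw new IllegalStateException(e);
        }
    }

    // Joins the encoded values with literal '&' and '=' separators.
    static String build(String user, String password, String xml) {
        return "User=" + encode(user)
                + "&Password=" + encode(password)
                + "&INPUT_XML=" + encode(xml);
    }
}
```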

Utf-8 encoding problem on solaris

Hello all.
I am using WebLogic 9.2 and I am facing a very weird problem regarding encoding. I fetch data from the DB (Informix, by the way) and forward it as UTF-8 to JSPs. I have set up everything successfully in my web.xml and weblogic.xml, and all JSPs include the page directive for UTF-8. When I deploy my application on a Windows 2K machine everything goes smoothly. But when the deployment happens on a Solaris machine my JSPs show "?" instead of letters. Has anyone faced this problem before? Could you please direct me towards a solution? This has taken me days and I still haven't managed to find one.
Thanx in advance
axel

Hi,
Start the app, and hook an Eclipse debug project to it. Check whether the encoding problem occurs while retrieving from the DB or while generating the response. If the issue is on the DB side, you may need to define the encoding on the connection (I am not sure what driver you are using, but you should be able to check this). If the issue is while generating the response, just XML escape every character.
Regards,
LG

Quiz App - What is the Best Way to Represent Data?

I'm creating a quiz app, kind of like flash cards.
The text for the questions and answers might contain HTML code, double and single quotes, etc..
The text for a question or answer might also extend across multiple lines, with some lines indented.
Should I store the question and answer text in an XML file, or in text files?
I ask because I know sometimes when you store data in XML files you have to jump through hoops to have Flex not mess up the data if it contains <, >, ", ', and lots of whitespace.
If I store the data in an XML file, I guess I need to escape < > " ' and unescape after reading in the data?

If I enter this text and send it using JSON to PHP:
What does this code do?
<?xml version="1.0" encoding="utf-8"?>
<mx:Application xmlns:mx="http://www.adobe.com/2006/mxml">
<mx:Script>
    <![CDATA[
      private function testFunc(testArg:String):int{
        var myVar:String = "#000000";
        var myVar2:int = 0xFFFFFF;
        return 1;
      }
    ]]>
</mx:Script>
</mx:Application>
This is the raw, escaped data received by my PHP script:
%7B%22questionData%22%3A%7B%22answerDescription%22%3A%22test%20question%20description%22%2 C%22questionAnswers%22%3A%5B%7B%22answer%22%3A%22test%20answer1%22%2C%22correct%22%3A1%7D% 2C%7B%22answer%22%3A%22test%20answer2%22%2C%22correct%22%3A0%7D%5D%2C%22questionMetadata%2 2%3A%5B%7B%22metadata%22%3A%22test%22%7D%5D%2C%22questionText%22%3A%22What%20does%20this%2 0code%20do%3F%5Cr%5Cr%3C%3Fxml%20version%3D%5C%221.0%5C%22%20encoding%3D%5C%22utf-8%5C%22% 3F%3E%5Cr%3Cmx%3AApplication%20xmlns%3Amx%3D%5C%22http%3A//www.adobe.com/2006/mxml%5C%22%3 E%5Cr%20%20%3Cmx%3AScript%3E%5Cr%20%20%20%20%3C%21%5BCDATA%5B%5Cr%20%20%20%20%20%20private %20function%20testFunc%28testArg%3AString%29%3Aint%7B%5Cr%20%20%20%20%20%20%20%20var%20myV ar%3AString%20%3D%20%5C%22%23000000%5C%22%3B%5Cr%20%20%20%20%20%20%20%20var%20myVar2%3Aint %20%3D%200xFFFFFF%3B%5Cr%20%20%20%20%20%20%20%20return%201%3B%5Cr%20%20%20%20%20%20%7D%5Cr %20%20%20%20%5D%5D%3E%5Cr%20%20%3C/mx%3AScript%3E%5Cr%3C/mx%3AApplication%3E%22%2C%22quest ionType%22%3A%22Multiple%20Choice%22%7D%7D
And this is the data after urldecode(), str_replace(), and utf8_encode():
{"questionData":{"answerDescription":"test question description","questionAnswers":[{"answer":"test answer1","correct":1},{"answer":"test answer2","correct":0}],"questionMetadata":[{"metadata":"test"}],"questionText":"What does this code do?rr<?xml version="1.0" encoding="utf-8"?>r<mx:Application xmlns:mx="http://www.adobe.com/2006/mxml">r <mx:Script>r    <![CDATA[r      private function testFunc(testArg:String):int{r        var myVar:String = "#000000";r        var myVar2:int = 0xFFFFFF;r        return 1;r      }r    ]]>r </mx:Script>r</mx:Application>","questionType":"Multiple Choice"}}
My PHP script is not able to parse this and get data from these objects:
$questionData = $dataRaw["questionData"];
$questionType = $dataRaw["questionData"]["questionType"];
$questionText = $dataRaw["questionData"]["questionText"];
The actual problem is that $questionData does not contain anything.
However, if I enter this text and send it using JSON to PHP:
What does this code do?
<?xml version="1.0" encoding="utf-8"?><mx:Application xmlns:mx="http://www.adobe.com/2006/mxml"><mx:Script><![CDATA[private function testFunc(testArg:String):int{var myVar:String = "#000000";var myVar2:int = 0xFFFFFF;return 1;}]]></mx:Script></mx:Application>
This is the raw, escaped data received by my PHP script:
%7B%22questionData%22%3A%7B%22answerDescription%22%3A%22test%20question%20description%22%2 C%22questionAnswers%22%3A%5B%7B%22answer%22%3A%22test%20answer1%22%2C%22correct%22%3A1%7D% 2C%7B%22answer%22%3A%22test%20answer2%22%2C%22correct%22%3A0%7D%5D%2C%22questionMetadata%2 2%3A%5B%7B%22metadata%22%3A%22test%22%7D%5D%2C%22questionText%22%3A%22What%20does%20this%2 0code%20do%3F%5Cr%5Cr%26lt%3B%3Fxml%20version%3D%26quot%3B1.0%26quot%3B%20encoding%3D%26qu ot%3Butf-8%26quot%3B%3F%26gt%3B%26lt%3Bmx%3AApplication%20xmlns%3Amx%3D%26quot%3Bhttp%3A// www.adobe.com/2006/mxml%26quot%3B%26gt%3B%26lt%3Bmx%3AScript%26gt%3B%26lt%3B%21%5BCDATA%5B private%20function%20testFunc%28testArg%3AString%29%3Aint%7Bvar%20myVar%3AString%20%3D%20% 26quot%3B%23000000%26quot%3B%3Bvar%20myVar2%3Aint%20%3D%200xFFFFFF%3Breturn%201%3B%7D%5D%5 D%26gt%3B%26lt%3B/mx%3AScript%26gt%3B%26lt%3B/mx%3AApplication%26gt%3B%22%2C%22questionTyp e%22%3A%22Multiple%20Choice%22%7D%7D
And this is the data after urldecode(), str_replace(), and utf8_encode():
{"questionData":{"answerDescription":"test question description","questionAnswers":[{"answer":"test answer1","correct":1},{"answer":"test answer2","correct":0}],"questionMetadata":[{"metadata":"test"}],"questionText":"What does this code do?rr<?xml version="1.0" encoding="utf-8"?><mx:Application xmlns:mx="http://www.adobe.com/2006/mxml"><mx:Script><![CDATA[private function testFunc(testArg:String):int{var myVar:String = "#000000";var myVar2:int = 0xFFFFFF;return 1;}]]></mx:Script></mx:Application>","questionType":"Multiple Choice"}}
And my PHP script processes the data successfully. In this case $questionData correctly contains the data.
I've diffed the raw data and JSON data received in my PHP script, and the " and < and > are there, but if I do not escape them manually as we have seen it does not work.
On the Flex side this is how I process the data:
private function saveQuestion(saveOrUpdate:String):void{
  var questionObj:Object = {
    questionData: {
      questionType: quesType.selectedItem,
      questionText: questionTxt.text,
      answerDescription: answerDescTxt.text,
      questionAnswers: getAnswers(),
      questionMetadata: getMetadata()
    }
  };
  var requestSend:Object = new Object();
  var dataString:String = JSON.encode(questionObj);
  dataString = escape(dataString);
  requestSend.questionDataString = dataString;
  requestSend.addQuestion = "true";
  addQuestionRequest.send(requestSend);
}

BIP report Output has issue due to & in the data

Hi All,
We have created a BIP report in R12. Some columns have '&' in the value. Because of this, the concurrent program is ending with a warning.
Please give me some ideas to rectify this issue.
Thanks in advance.
Regards,
P.Kalidoss

One way is to replace '&' with '&amp;' in the data.
Another way is to escape the data with utl_i18n (my preference).
So:
SQL> with t as
2   (select 'some &text' some_text from dual)
3 select replace(some_text, '&', '&amp;'),
4         utl_i18n.escape_reference(some_text)
5    from t
6 /
REPLACE(SOME_TEXT,'&','&AMP;') UTL_I18N.ESCAPE_REFERENCE(SOME
some &amp;text                 some &amp;text
SQL>

Romaji yen sign in Terminal in the UTF-8 encoding

Hello all,
I have a MacBook Pro with a Japanese keyboard running Mac OS X 10.6.2. In Romaji mode, the Japanese keyboard has a dedicated yen sign (¥) key, and Option-¥ produces a backslash (\). In Terminal, for some reason, the ¥ key produces \ without the Option modifier. (Option-¥ also produces \ in Terminal, which is normal behavior.)
A similar situation was discussed in an older topic, http://discussions.apple.com/thread.jspa?messageID=10665836 , where the problem was diagnosed as having the Shift JIS encoding enabled in Terminal. However, this doesn't reflect my situation, since the only encoding that is enabled in my Terminal is UTF-8 – and there's certainly a yen sign available in UTF-8.
I am able to type other UTF-8 characters in Terminal in Romaji mode; for example, I can type Option-e e to produce é, and entering the command *echo é | od -x* within Terminal shows that the correct UTF-8 byte sequence is generated for é. Since the command *echo -e '\0302\0245'* within Terminal will produce a yen sign there, the problem seems to be connected to the key mapping rather than to a stty interface problem.
Is there anyone running 10.6.2 with a Japanese keyboard who can type the ¥ key in Romaji mode in Terminal with the UTF-8 encoding enabled, and have a yen sign appear rather than a backslash?
(This topic was initially posted in the +Installation and Setup+ forum, and I've taken the advice of a kind soul there to repost the topic in this forum.)

I don't know the exact reason why ¥ is forcefully converted to \ in Terminal (even in UTF-8 encoding), and anyway it would be better to add an option to turn off this conversion (or there may already be a hidden option which I can't find).
But the conversion may be helpful for many users, as expected from the following reasons:
I guess there is no key for backslash on the Japanese keyboard of the MacBook Pro. If this is the case, then being able to input \ by just hitting the ¥ key (instead of typing Option-¥) may be "useful" for many Terminal users (because \ is used much more frequently than ¥ in programs). Kotoeri has an option to swap the ¥ and Option-¥ keys (so hitting the ¥ key inputs \ and Option-¥ inputs ¥), but this setting is global (i.e., not restricted to Terminal.app), so making it the default setting might confuse most Japanese users (they don't use Terminal.app at all, but use ¥ as the currency symbol in other apps). Even Terminal users would use ¥ more frequently than \ in apps other than Terminal, so they don't want to modify the global setting.
Another reason may be that there are still many Japanese programming textbooks which use ¥ as the escape character (I guess you know why). For example, the first C program looks like: printf("Hello World!¥n"); So many beginners would try to input ¥ as written in the textbook, without knowing that the escape character in UTF-8 should be \, not ¥. Converting ¥ to \ may be helpful for these users (of course they would be surprised to see \ rather than ¥ appear on the screen, but the program would work anyway).
You can send a bug report or feature request at:
http://www.apple.com/feedback/macosx.html

UTF-16 parsing problem in XI

Hi! we are having a problem with XML encoding. An external system A sends us a XML message through HTTP. An example of the XML is:
<?xml version="1.0" encoding="UTF-16"?>
<Envelope version="01.00">
</Envelope>
System A uses UTF-16 for the encoding. We wrote a Java mapping that manipulates the data, writes it out in UTF-8, and uses UTF-8 in the header. Then we do a message mapping. The trace shows that the Java mapping gets executed successfully. However, a ParserException is thrown in the message mapping. This is a copy of the trace log:
<Trace level="1" type="T">RuntimeException during appliction Java mapping com/sap/xi/tf/_Can_HRXML_to_RFC_Req_</Trace>
<Trace level="1" type="T">com.sap.aii.utilxi.misc.api.BaseRuntimeException: Fatal Error: com.sap.engine.lib.xml.parser.ParserException: Invalid char #0x0(:main:, row:3, col:0) at com.sap.aii.mappingtool.tf3.Transformer.checkParserException(Transformer.java:41) at com.sap.aii.mappingtool.tf3.Transformer.start(Transformer.java:79) at com.sap.aii.mappingtool.tf3.AMappingProgram.execute(AMappingProgram.java:232) at com.sap.aii.ibrun.server.mapping.JavaMapping.executeStep(JavaMapping.java:63) at com.sap.aii.ibrun.server.mapping.Mapping.execute(Mapping.java:91) at com.sap.aii.ibrun.server.mapping.SequenceMapping.executeStep(SequenceMapping.java:55) at com.sap.aii.ibrun.server.mapping.Mapping.execute(Mapping.java:91) at com.sap.aii.ibrun.server.mapping.MappingHandler.run(MappingHandler.java:78) at com.sap.aii.ibrun.sbeans.mapping.MappingRequestHandler.handleMappingRequest(MappingRequestHandler.java:88) at com.sap.aii.ibrun.sbeans.mapping.MappingRequestHandler.handleRequest(MappingRequestHandler.java:63) at com.sap.aii.ibrun.sbeans.mapping.MappingServiceImpl.processFunction(MappingServiceImpl.java:79) at com.sap.aii.ibrun.sbeans.mapping.MappingServiceObjectImpl0.processFunction(MappingServiceObjectImpl0.java:131) at sun.reflect.GeneratedMethodAccessor294.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:324) at com.sap.engine.services.ejb.session.stateless_sp5.ObjectStubProxyImpl.invoke(ObjectStubProxyImpl.java:187) at $Proxy42.processFunction(Unknown Source) at sun.reflect.GeneratedMethodAccessor568.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:324) at com.sap.engine.services.rfcengine.RFCDefaultRequestHandler.handleRequest(RFCDefaultRequestHandler.java:95) at com.sap.engine.services.rfcengine.RFCJCOServer.handleRequestInternal(RFCJCOServer.java:113) at 
com.sap.engine.services.rfcengine.RFCJCOServer$ApplicationRunnable.run(RFCJCOServer.java:171) at com.sap.engine.core.thread.impl3.ActionObject.run(ActionObject.java:37) at java.security.AccessController.doPrivileged(Native Method) at com.sap.engine.core.thread.impl3.SingleThread.execute(SingleThread.java:95) at com.sap.engine.core.thread.impl3.SingleThread.run(SingleThread.java:159)</Trace>
When I run SXMB_MONI to show the processed XML message, the message from system A isn't shown correctly. This is what it says:
The XML page cannot be displayed
Switch from current encoding to specified encoding not supported. Error processing resource 'file:///C:/Documents and Setti...
<?xml version="1.0" encoding="UTF-16"?>
I tried to use a HTTP post tool to post the XML message, if I changed UTF-16 to UTF-8, the message could be processed successfully.
How do we resolve this encoding problem? The external system A could change the encoding scheme to UTF-8, but the header is hard-coded and will remain UTF-16 whatever scheme it uses. Any input will be appreciated.
Julie

Hi,
I have a similar problem.
Input is xml with UTF8 encoding, output is xml with UTF8 encoding.
In spite of that simple situation, Transformer returns bad output:
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(document);
StreamResult result = new StreamResult(outputStream);
/*Properties properties = transformer.getOutputProperties();
properties.setProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperties(properties);*/ this didn't help
transformer.transform(source, result);
//source --> correct in UTF8
//result --> after transformation is incorrect in UTF8
And the result is:
An invalid character was found in text content. Error processing resource 'file:///C:/Documents and Settings/rlatta/Local S...
<?xml version="1.0" encoding="utf-8" ?><SDS_XSD_ZPPM_POB><pob>070</pob><skratkaPobocky>SE</...
Does anybody know some answer?
P.S. We recently installed support packages SAP_BASIS 13 and SAP_ABA 13. Before that it worked, as far as I know.

Triple byte unicode to utf-16

I need to convert a triple byte unicode value (UTF-8) to UTF-16.
Does anyone have any code to do this. I have tried some code like:
String original = new String("\ue9a1b5");
byte[] utf8Bytes = original.getBytes("UTF8");
byte[] defaultBytes = original.getBytes();
but this does not seem to process the last byte (b5). Also, when I try to convert the hex values to utf-16, it is vastly off.
-Lou

Good question. Answer is, it does.
Oops, sorry, I think I left my brain in the kitchen :)
I was somehow thinking that "hmmm, e is not a hexadecimal digit so that must result in an error"... but of course it is...
Am I representing the triple byte unicode character
wrong? How do I get a 3 byte unicode character into
Java (for example, the utf-16 9875)?
It's simply "\u9875".
If you have byte data in UTF-8 encoding, this is what you can do:
try {
    byte[] utf8 = {(byte) 0xE9, (byte) 0xA1, (byte) 0xB5};
    String stringFromUTF8 = new String(utf8, "UTF-8");
} catch (UnsupportedEncodingException uee) {
    // UTF-8 is guaranteed to be supported everywhere
}
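To go all the way from the three UTF-8 bytes in the question to UTF-16 bytes, the String acts as the intermediate form. This is a small sketch with a hypothetical class name; it uses the StandardCharsets constants (Java 7 and later), which avoid the checked UnsupportedEncodingException, and it picks big-endian UTF-16 without a byte order mark as an assumption about the desired output.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decode UTF-8 bytes, then encode the text as UTF-16BE.
class Utf8ToUtf16 {
    static byte[] transcode(byte[] utf8) {
        // bytes -> Unicode text
        String text = new String(utf8, StandardCharsets.UTF_8);
        // Unicode text -> big-endian UTF-16 bytes, no byte order mark
        return text.getBytes(StandardCharsets.UTF_16BE);
    }
}
```

For the bytes E9 A1 B5 this yields the two bytes 98 75, which matches the "\u9875" literal given above.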

Convert Chinese signs to UTF-8

Hi there
I am trying to find a tool, or some code, that can help me convert Chinese signs to UTF-8 format.
The reason is that I am working on a project that has to develop a web site containing both English and Chinese text.
So first of all, to test our setup, we need some Chinese signs in UTF-8 format. Later on we will develop a workflow that will convert those files/Chinese signs.
But right now it would be a great help to get some input on how to get this working.
Perhaps someone has written a small piece of code that takes a string of Chinese signs as input and creates a string of UTF-8 as output?
Regards
Tolstrup

To read the file:
Reader in = new InputStreamReader(
new FileInputStream(yourChineseFile), "yourChineseFile'sEncoding");
I don't know what encoding your file is in, but it's your file not mine.
To write the data in UTF-8:
Writer out = new OutputStreamWriter(new FileOutputStream(yourUTF8File), "UTF-8");
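Putting the reader and the writer together, a copy loop is all that is missing. The sketch below works on streams so it can be tried without files; the class name StreamTranscoder is hypothetical, and the source charset is a parameter because, as noted above, only the owner of the file knows what it is. For real files, FileInputStream and FileOutputStream can be passed in.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: re-encodes text from a known source charset to UTF-8.
class StreamTranscoder {
    static void toUtf8(InputStream in, Charset source, OutputStream out) {
        try {
            Reader reader = new InputStreamReader(in, source);
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            char[] buffer = new char[4096];
            int n;
            // read decoded characters and write them back out as UTF-8
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
            writer.flush(); // the caller stays responsible for closing the streams
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```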

Write file in UTF-16BE Format

I am programming a web page that runs a SWF file (Flash movie) and a servlet.
The SWF file sends data to the server with a URL command:
send_lv.sendAndLoad("http://…", result_lv, "POST");
The data is in UTF-8 (UTF-8 is the standard encoding for exchanging text, such as online mail systems. UTF is an 8-bit system.).
In the servlet's doPost method I get the data with
String temp = request.getParameter("smstext");
I need to write the parameter "temp" into a file in UTF-16BE format. How do I do it?
I have tried almost everything.

Once you've got a String, you can get bytes corresponding to a specific encoding with the getBytes(String charset) method.
In your case:
byte[] bytes = temp.getBytes("UTF-16BE");
Then you may write those bytes to a file using a FileOutputStream.
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#getBytes(java.lang.String)
http://java.sun.com/j2se/1.4.2/docs/api/java/io/FileOutputStream.html
You should obtain the same result with an OutputStreamWriter, for which you can specify the encoding in the constructor, and then write the String directly:
OutputStreamWriter writer = new OutputStreamWriter(new FileOutputStream(fileName), "UTF-16BE");
http://java.sun.com/j2se/1.4.2/docs/api/java/io/OutputStreamWriter.html

Change character encoding from UTF-8 to EUC-KR

We are receiving data in UTF-8 in the querystring from a partner formatted as:
%EA%B3%A0%EB%AF%BC%ED%95%98%EC%9E%90%21
Our site uses EUC-KR so using this text for search/display/etc is not possible. Does anyone know how we can convert this to the proper Korean EUC encoding so it can be displayed properly using JSP? Basically it should be:
%B0%ED%B9%CE%C7%CF%C0%DA%21
Thanks in advance.

I'm not sure where you are getting %xx encoded UTF-8.... Is it because you have it in a GET method form and that's what you are seeing in the browser's location bar? ...
Let's assume you have a form on a page, and the page's charset is set to UTF-8, and you want to generate a URL encoded string (%xx format, although URLEncoder will not encode ASCII chars that way...).
In the page processing the form, you need to do this:
request.setCharacterEncoding("UTF-8"); // makes bytes read as UTF-8 strings(assumes that the form page was properly set to the UTF-8 charset)
String fieldValue = request.getParameter("fieldName"); // get value
// the value is now a Unicode String in Java, generated from reading the bytes submitted from the form as UTF-8 encoded text...
String utf8EncString = URLEncoder.encode(fieldValue, "UTF-8");
// now utf8EncString is a URL encoded (%xx) string of UTF-8 values
String euckrEncString = URLEncoder.encode(fieldValue, "EUC-KR");
// now euckrEncString is a URL encoded (%xx) string of EUC-KR values
What is probably screwing things up for you mostly is this:
euckrValue = new String(utf8Value.getBytes(), "EUC-KR");
What this does is take the bytes of the string utf8Value (which is not really UTF-8... see below) in the local encoding (possibly Cp1252 (Windows) or ISO8859-1 (Linux), or EUC-KR if it's Korean Windows), and then read them as if they were EUC-KR... which they aren't.
The key here is that Strings in Java are not of any encoding. They are pure Unicode values. Encodings only matter when converting to or from bytes. The strings stored in a file or sent over the net have to convert to bytes since that's what is stored/sent, just bytes. The encoding defines how the characters can be encoded into 1 or more bytes, and thus reconstructed.
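If the partner's value arrives as a raw %xx string rather than through a form, the same idea applies: decode the bytes to a String using the charset they were written in, then encode that String with the target charset. The sketch below is one way to do that; the class name QueryRecoder is hypothetical, and it assumes the running JRE ships the EUC-KR charset, which is not among the charsets every Java platform must support.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper: re-encodes a %xx string from one charset to another.
class QueryRecoder {
    static String recode(String encoded, String fromCharset, String toCharset) {
        try {
            // %xx bytes -> Unicode String, interpreting the bytes as fromCharset
            String text = URLDecoder.decode(encoded, fromCharset);
            // Unicode String -> %xx bytes in toCharset
            return URLEncoder.encode(text, toCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset", e);
        }
    }
}
```

Called as QueryRecoder.recode("%EA%B3%A0%EB%AF%BC%ED%95%98%EC%9E%90%21", "UTF-8", "EUC-KR"), this should give the %B0%ED%B9%CE%C7%CF%C0%DA%21 form quoted in the question. For display in a JSP, the decoded String on its own is usually enough, since the page's charset setting takes care of the output encoding.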

BUG?? UTF-8 non-Latin database chars in IR csv export file not export right

Hello,
I have this issue: my database character set is UTF-8 (AL32UTF8) and a table used in an IR contains Greek (non-Latin) data. While I can see the data displayed correctly in the IR, and also via select in the Object Browser in SQL Workshop, when I try Download as CSV the produced CSV does not have the Greek characters exported correctly, while the Latin ones are OK.
The problem is the same whether I try IE or Firefox. Also, the export to HTML works and I see the Greek characters there correctly!
Is there any issue with UTF-8 and non-Latin characters in export to CSV from IRs? Can someone confirm this, or does anyone have a similar export problem with a UTF-8 DB and non-Latin characters?
How could I solve this issue?
TIA

Hello Joel,
Thanks for taking the time to answer my issue. This does not work for my case, as the source of the data (the database character set) is UTF-8. The data inside the database that is shown in the IR on the screen is UTF-8, and this is done correctly. You can see this in my example. The actual data in the database is in multiple languages (English, Greek, German, Bulgarian, etc.), which is why I selected the UTF-8 character set when implementing the database; this requirement was for all character data. Also, the character set Oracle suggests when you create a database that has to support data from multiple languages is Unicode.
The requirement is that what I see in the IR (I mean on display) must be exported to the CSV file correctly, and this is what I expect the Download as CSV feature to achieve. I understand that you had Excel in mind when implementing this feature, but a CSV is just an easy way to export the data, a Comma Separated Values file, not necessarily something to open directly in Excel. I also want to add that in Excel you can import the data in UTF-8 encoding when importing from CSV, which is fine for my customer. Also, Excel 2008 and later understands a UTF-8 CSV file if you have placed the UTF-8 BOM character at the start of the file (well, it drops you into the wizard, but it's almost the same as importing).
Since the feature you describe, if I understood correctly, always creates an ANSI encoded file, even when the database character set is UTF-8, it is impossible to export correctly if I have data that is neither Latin nor among the other 128 country-specific characters I choose in the Globalization attributes, and this data is what I see on display and need to export to CSV. I believe that when the database character set is UTF-8 this feature should create a CSV file that is UTF-8 encoded and export correctly what I see on the screen, and I suspect that others would also expect this behaviour. Or at least you could allow/implement(?) this behaviour when Automatic CSV encoding is set to No. But I strongly believe, especially from the point of view of a user, that having different things on screen and in the exported CSV file is a bug, not a feature.
I would like to have comments on this from other people here too.
Dionyssis

Displaying unicode or HTML escaped characters from HTTPService in Flex components.

Here is a solution on the Flex Cookbook I developed for
displaying data in Flex components when the data comes back from
HTTPService as unicode or HTML escaped data:
Displaying unicode or HTML escaped characters from HTTPService in Flex components.

Hi again Greg,
I have just been adapting your idea for encountering
occasional escaped characters within a body of "normal" text, eg
something like
hell&ocirc; sun&scaron;ine
Now, the handy String.fromCharCode(charCode) call works a
dream if instead of the above I have
hell&#244; sun&#353;ine
Do you know if there is an equivalent call that takes the
named entities rather than the numeric ones? Clearly I can just do
some text substitution to get the mapping, but this means rather
more by-hand work than I had hoped. However, this is definitely a
step in a useful direction for me.
Thanks,
Richard
PS hoping that the web page won't simply outguess me and
replace all the above! Basically, the first line uses named
entities and the second the equivalent numbers...
