Convert Asian characters to EBCDIC double byte

How can I convert Asian characters, like Chinese or Japanese, into an EBCDIC double-byte array in Java? Is there any code available to do this?
Thanks

If I have an Asian character in test (a String), I need to convert it to a double-byte EBCDIC array. Can you please let me know how to convert it?

That is somewhat like asking how you convert it to ASCII; neither is possible as stated.
There are a number of character sets in the world that use the EBCDIC encoding for the single-byte range and support a multibyte format for another language.
First you need to figure out what character set(s) you can use.
Second, you see whether Java supports an encoding for that.
If it does, then you use String.getBytes(String charsetName).
If it doesn't, you will have to write your own mapping function.
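For example, here is a minimal sketch assuming your JRE ships the extended IBM charsets and that IBM939 (one of the Japanese EBCDIC code pages with double-byte support) matches your host system; the code page name is an assumption you must verify against the target mainframe:

import java.io.UnsupportedEncodingException;

public class EbcdicDbcs {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "日本語"; // Japanese sample text
        byte[] ebcdic = text.getBytes("IBM939"); // throws if this charset is absent from the JRE
        for (byte b : ebcdic) {
            System.out.printf("%02X ", b & 0xFF); // dump the double-byte EBCDIC bytes
        }
        System.out.println();
        System.out.println(new String(ebcdic, "IBM939")); // round-trip to verify the mapping
    }
}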

Similar Messages

  • Algorithm for converting Unicode characters to EBCDIC

    I would like to know if there is any algorithm for converting Unicode Characters to EBCDIC.
    Awaiting your replies.
    Thanks in advance,
    Ravi

    I would like to know if there is any algorithm for converting Unicode Characters to EBCDIC.

    Isn't EBCDIC a 7-bit code like ASCII? Unicode is 16-bit. This means there is no way Unicode can be mapped onto EBCDIC without loss of information.

    No. That is like saying that since UTF-8 is 8-bit based it can't be mapped to UTF-16, yet it can.
    EBCDIC either directly supports, or has versions which support, multibyte character sets. A multibyte character set can encode any fixed-size character set; the basic idea is the same way UTF-8 works.
    Multibyte character sets have the added benefit that most of the data in the world is from the ASCII character set, and the encodings always support that using only 8 bits. Thus the memory savings over UTF-16 (or UTF-32) are significant.
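    To see the memory-savings point concretely (my illustration, not from the thread), compare the byte counts for plain ASCII text:

    import java.nio.charset.StandardCharsets;

    public class EncodingSizes {
        public static void main(String[] args) {
            String ascii = "Hello, world!";
            // UTF-8 stores ASCII in one byte per character; UTF-16 always uses two.
            System.out.println("UTF-8:  " + ascii.getBytes(StandardCharsets.UTF_8).length + " bytes");    // 13
            System.out.println("UTF-16: " + ascii.getBytes(StandardCharsets.UTF_16LE).length + " bytes"); // 26
        }
    }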

  • SDK double bytes

    Hi All
    I'd like to ask you something. I updated a business partner's data by using the SDK, but the Japanese characters, which are double byte, are broken and can't be seen on my PC.
    What kind of setting do I need to use double byte characters like Japanese?
    Thanks,
    Satoru

    > Does your answer suggest that data-loading is enabled
    > only if the source data is in Unicode?
    Yes and no. I do not know the topic well enough to give you a precise answer. Your post was about "double-byte characters", so my question was whether you meant the UTF-16 character encoding. I don't know whether there are other double-byte character encoding schemes in the Asian sphere (assuming you work there).
    At any rate, it is obvious that the encoding of the source data and the one the DTW/API expects ought to match. Let's hope neither of them expects LATIN-1 in one of its more obscure parts.
    Have you tried importing data with XML? You can specify the encoding explicitly with that.
    No idea about the JCO, I'm afraid.
    I'm afraid I can't tell you much more. I tried to narrow down your problem, and do hope there'll be someone around these parts who can solve it.
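    For reference (my addition, not from the thread), specifying the encoding explicitly in XML means declaring it in the prolog and saving the file in that same encoding; the element names below are only illustrative:

    <?xml version="1.0" encoding="UTF-8"?>
    <BusinessPartner>
        <Name>山田太郎</Name> <!-- double byte text survives when the declared and actual encodings match -->
    </BusinessPartner>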

  • Asian double-byte characters

    Hello,
    When importing English data into the CRM, field lengths are defined, for example "Name" = 100 characters. When I'm importing some Asian languages, these are defined as double byte. Can anyone confirm whether this has any effect on the field lengths I can import, or should I be able to expect to import the same number of characters as English?

    Jono, I would recommend trying it and see what happens. I have a document called "Steps to Import Asian Characters in a Data File". If you want it let me know your email address.
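    As background (my illustration, not from the thread): the crux is whether the limit counts characters or bytes. If the field limit is enforced in bytes under a multibyte encoding such as UTF-8, the same 100-byte field holds fewer CJK characters:

    public class FieldLength {
        public static void main(String[] args) throws Exception {
            String english = "Smith";
            String japanese = "山田太郎";
            System.out.println(english.length() + " chars, " + english.getBytes("UTF-8").length + " bytes");   // 5 chars, 5 bytes
            System.out.println(japanese.length() + " chars, " + japanese.getBytes("UTF-8").length + " bytes"); // 4 chars, 12 bytes
        }
    }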

  • Text strings from VISA read don't match identical-looking text constants - could it be double byte characters?

    Our RS232-enabled instrument sends ASCII strings to COM 1 and I read strings in. For example I get the string "TPM", or at least it looks like "TPM" if I display it. However, if I send that to the selector input of a Case structure, and create a case for "TPM", whether the two appear to match varies. Sometimes it matches, and measuring its length returns 3. Sometimes it measures 7 or 11 or 12 characters long, and it doesn't match. I can reproduce a match or a mismatch by my choice of the command that went to the instrument prior to the command that causes the TPM response, but have made no sense of this clue. I have run it through Trim Whitespace, with Both Ends (the default) explicitly selected. I have also turned the string into a byte array, autoindexed a For loop on that, and only passed the bytes if they don't equal 32, or if they don't equal 0, thinking spaces or nulls might be in there, but no better.
    The Trim Whitespace function's Help remarks that it does not remove "double byte characters". But I can't find anything else about "double byte characters". Could this be the problem? Are there functions that can tell whether there are "double byte characters", or convert into or out of them? By "double byte characters", do they just mean Unicode?

    Cebailey,
    Double byte characters are generally used for characters specific to languages other than English. If you display your message in '\' Codes Display in a string indicator, do you see any other characters? You could also use Hex Display to count the number of bytes in the message. You are probably getting messages with non-printable characters that need to be trimmed before your application uses them. If you want more information about the '\' Codes Display, there's a detailed description in the LabVIEW Help, also available on our website: Backslash ('\') Codes Display.
    Caleb W, National Instruments
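    The same diagnostic works outside LabVIEW too; a sketch in Java (this digest's usual language), using a hypothetical instrument reply, shows how hidden control bytes become visible in hex:

    public class HexDump {
        public static void main(String[] args) {
            String response = "TPM\r\n\u0000"; // hypothetical reply with invisible trailing bytes
            for (byte b : response.getBytes()) {
                System.out.printf("%02X ", b & 0xFF); // prints 54 50 4D 0D 0A 00
            }
            System.out.println();
        }
    }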

  • How do I convert a double-byte encoded file to single-byte ASCII?

    Hello,
    I am working with XML files (apparently coded in UTF-8) which turn out to be encoded with double-byte characters.
    The problem is the characters for end of line: 00 0D 00 0A
    This double-byte end of line is causing a problem with a legacy conversion tool (which deals with 0D 0A). The file itself contains no accented/international characters, so in principle converting to single-byte should not cause any problems.
    I have tried to convert this file with tools like native2ascii and the conversion tools that are part of Notepad++, but without any luck: the "00 0D 00 0A" bytes are still present in the output.
    Can anyone point me to a tool or some code that can convert this file into single-byte?
    Thank you.

    Amiens wrote:
    native2ascii.exe -encoding UTF-16 -reverse INPUT.xml OUTPUT.xml
    gives 00 00 00 0D 00 00 00 0A
    so clearly that is not the required output.

    What you've got there is UTF-16 encoded text that's been converted to UTF-16 again. Get rid of the "-reverse" option and you should see the result you expect.
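    In other words, the corrected invocation reads the file as UTF-16 and writes plain ASCII:

    native2ascii -encoding UTF-16 INPUT.xml OUTPUT.xml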

  • ASCII representations of double-byte characters

    My file contains ASCII representations of double-byte CJK characters (output of native2ascii). How do I restore them back to the original native characters?
    I mean, when I load the file with FileInputStream, what I get are all strings like \uabcd. How do I get the characters represented by these strings?

    My file contains ASCII representations of double-byte CJK characters (output of native2ascii). How do I restore them back to the original native characters?

    I am no expert in Unicode so I don't know if this is correct, but I assume that if a String starts with "\u" then there will be four more characters that are a hexadecimal representation of the char value. If that's right, then you should be able to parse out the "\uxxxx" and convert it to a char by parsing the hex. For example:

    // the variable unicode is a String like "\uabcd"
    String hex = unicode.substring(2);              // strip the leading "\u"
    char result = (char) Integer.parseInt(hex, 16); // parse the hex digits into a char
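    Two ready-made alternatives worth noting (my addition, not from the thread): native2ascii itself can undo the conversion with its -reverse option, and if the file is a properties file, java.util.Properties decodes \uXXXX escapes automatically when loading:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Properties;

    public class LoadEscaped {
        public static void main(String[] args) throws IOException {
            Properties props = new Properties();
            props.load(new FileInputStream("escaped.properties")); // \uXXXX escapes are decoded here
            System.out.println(props.getProperty("test"));         // prints the native characters
        }
    }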

  • How best to send double byte characters as http params

    Hi all
    I have a web app that accepts text that can be in many languages.
    I build up an HTTP string and send the text as parameters to another web server. Hence, whatever text I receive, I need to be able to represent on an HTTP query string.
    The parameters are sent as URL-encoded UTF-8. They are decoded by the second web server back into Unicode and saved to the db.
    Occasionally I find a character that I am unable to convert to a UTF-8 string and send as a parameter (usually an SJIS character). When this occurs, the character is encoded as '3F', a question mark.
    What is the best way to send double byte characters as HTTP parameters so they are always sent faithfully and not as question marks? Is my only option to use UTF-16?
    example code
    <code>
    public class UTF8Test {
        public static void main(String[] args) {
            encodeString("\u7740", "%E7%9D%80"); // encoded UTF8 string contains question mark (3F)
            encodeString("\u65E5", "%E6%97%A5"); // this other japanese character converts fine
        }

        private static void encodeString(String unicode, String expectedResult) {
            try {
                String utf8 = new String(unicode.getBytes("UTF8"));
                String utf16 = new String(unicode.getBytes("UTF16"));
                String encoded = java.net.URLEncoder.encode(utf8);
                String encoded2 = java.net.URLEncoder.encode(utf16);
                System.out.println();
                System.out.println("encoded string is: " + encoded);
                System.out.println("expected encoding result was: " + expectedResult);
                System.out.println();
                System.out.println("encoded string16 is: " + encoded2);
                System.out.println();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    </code>
    Any help would be greatly appreciated. I have been struggling with this for quite some time, and I can hear the deadline approaching all too quickly.
    Thanks
    Matt

    Hi Matt,
    one last visit to the round trip issue:
    in the Sun example, note that UTF8 encoding is used in the method that produces the byte array as well as in the method that creates the second string. This is equivalent to calling:

    String roundTrip = new String(original.getBytes("UTF8"), "UTF8"); // Sun example

    Whereas, in your code you were calling:

    String utf8 = new String(unicode.getBytes("UTF8")); // Matt's code

    The difference is crucial. When you call the String constructor without a second (encoding) argument, the default encoding (usually Cp1252) is used. Therefore your code is equivalent to:

    String utf8 = new String(unicode.getBytes("UTF8"), "Cp1252"); // Matt's code

    i.e. you are encoding with one transformation format and decoding back with a different transformation format, so in general you won't get your original string back.
    Regarding safely sending multi-byte characters across the Internet, I'm not completely sure what the situation is because I don't do it myself. (When our program is run as an applet, the only interaction it has with the web server is to download various files). I've seen lots of people on this forum describing problems sending multi-byte characters and I can't tell whether the problem is with the software or with the programming. Two possible methods come to mind (of course you need to find out what your third party software is doing):
    1) use the DataOutput/InputStreams writeUTF/readUTF methods
    2) use the InputStreamReader/OutputStreamWriter pair with UTF8 encoding
    See this thread:
    http://forum.java.sun.com/thread.jsp?forum=16&thread=168630
    You should stick to UTF8. It is designed so that the bytes generated by encoding non-ASCII characters can be safely transmitted across the Internet. Bytes generated by UTF16 can be just about anything.
    Here's what I suggest:
    I am running a version of the Sun tutorial that has a program running on a server to which I can send a string and the program sends back the string reversed.
    http://java.sun.com/docs/books/tutorial/networking/urls/readingWriting.html
    I haven't tried sending multi-byte characters but I will do so and test whether there are any transmission problems. (Assuming that the Sun cgi program itself correctly handles characters).
    More later,
    regards,
    Joe
    P.S.
    I thought one of the reasons for the existence of UTF8 was to represent things like multi-byte characters in an ASCII format?

    Not exactly. UTF8 encodes ASCII characters into single bytes with the same byte values as ASCII encoding. This means that a document consisting entirely of ASCII characters is the same whether it was encoded as UTF8 or ASCII and can consequently be read in any ASCII document reader (e.g. Notepad).
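    A minimal sketch of Joe's advice (keep UTF-8 on both ends and pass the charset to the URL encoder explicitly), using the problem character from Matt's post:

    import java.net.URLDecoder;
    import java.net.URLEncoder;

    public class Utf8RoundTrip {
        public static void main(String[] args) throws Exception {
            String original = "\u7740";                            // the character that became '?'
            String encoded = URLEncoder.encode(original, "UTF-8"); // "%E7%9D%80"
            String decoded = URLDecoder.decode(encoded, "UTF-8");  // back to the original string
            System.out.println(encoded + " round-trips: " + decoded.equals(original));
        }
    }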

  • How to display double byte characters with system.out.print?

    Hi, I'm a newbie Java programmer having trouble using Java locales with system I/O in DOS console mode.
    Platform is winxp, jvm1.5,
    File structure is:
    C:\myProg <-root
    C:\myProg\test <-package
    C:\myProg\test\Run.java
    C:\myProg\test\MessageBundle.properties <- default properties
    C:\myProg\test\MessageBundle_zh_HK.properties <- localized properties (written in Notepad and saved as Unicode; Windows Notepad adds a BOM)
    inside MessageBundle.properties:
    test = Hello
    inside MessageBundle_zh_HK.properties:
    test = 喂 //hello in big5 encoding
    run.java:
    package test;
    import java.util.*;
    public class Run{
      public static void main(String[] args){
        Locale locale = new Locale("zh","HK");
        ResourceBundle resource =
            ResourceBundle.getBundle("test.MessageBundle", locale);
        System.out.println(resource.getString("test"));
      } // main
    } // class

    When I run this program, it keeps displaying "hello" instead of the encoded character...
    Then when I ran the native2ascii tool against MessageBundle_zh_HK.properties, it started displaying garbage characters instead.
    I'm trying to figure out what I did wrong and how to display double byte characters on the console.
    Thank you.
    P.S.: while googling, some said DOS can only display ASCII. To demonstrate that the DOS console is capable of displaying double byte characters, I wrote another hello-world in Chinese using Notepad with C# and compiled it with "csc hello.cs"; sure enough, Console.Write in C# allowed me to display the character I was expecting. Since the DOS console can print double byte characters, I must be missing something important in this Java program.

    After googling a bunch, I learned that javac (hence java.exe) does not support a BOM (byte order mark).
    I had to use a different editor to save my text file as Unicode without a BOM in order for native2ascii to convert it into an ASCII file.
    Even with the properties file in ASCII format, I'm still having trouble displaying those characters in the DOS console. In fact, I just noticed I can use System.out.println to display a double byte character if I embed the character itself in the Java source file:

    import java.io.UnsupportedEncodingException;
    import java.util.*;

    public class Run {
        public static void main(String[] args) throws UnsupportedEncodingException {
            String msg = "中文"; // double byte characters
            try {
                System.out.println(new String(msg.getBytes("UTF-8")) + " new string"); // this displays fine
            } catch (Exception e) {}
            Locale locale = new Locale("zh", "HK");
            ResourceBundle resource = ResourceBundle.getBundle("test.MessagesBundle", locale);
            System.out.println(resource.getString("Hey")); // this will display weird characters
        }
    }

    So it seems to me that I must have done something wrong in the process of creating the properties file from the Unicode text file...
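    One hedged workaround for the console side (my addition, not from the thread): wrap System.out in a PrintStream that uses the console's code page, which must match your chcp setting; MS950 below is only an example for a Traditional Chinese console:

    import java.io.PrintStream;
    import java.io.UnsupportedEncodingException;

    public class ConsoleOut {
        public static void main(String[] args) throws UnsupportedEncodingException {
            // autoflush = true; the charset name must match the active console code page
            PrintStream out = new PrintStream(System.out, true, "MS950");
            out.println("中文");
        }
    }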

  • Encoded double byte characters string conversion

    I have double byte encoded strings stored in a properties file. A sample of such a string is below (I think it is Japanese):
    \u30fc\u30af\u306e\u30a2
    I am supposed to read it from the file, convert it to an actual string, and use it in the UI. I am not able to figure out how to do the conversion: the string contains the text as-is, a backslash character, the character 'u', and so on. How do I convert it to the correct text (either using ai::UnicodeString or otherwise)?
    Thanks.

    Where did this file come from? Some kind of Java or Ruby export? I don't think AI has anything in its SDK that would natively read that. You could just parse the string, looking for \u[4 characters]. I believe if you created a QChar and initialized it with the integer value of the four-character hex string, it would properly create the character.

  • Support for double-byte characters

    Does RH6 have support for double-byte characters for
    localization/translation to Asian languages? This feature was in
    X3, removed from X5, but did it make it into 6?
    Thanks,
    Mike

    Sorry but no.
    Here's a link to what did go into RH6.
    http://www.adobe.com/devnet/logged_in/mhu_rh_whatsnew.html

  • Using Double Byte Characters in URL For Session Variables

    When I supply the value for a session variable in the URL for an IRPT page, where the value contains double byte characters (Japanese in this case), the characters are corrupted by the time they are entered into the session variables. Does anyone know a solution to this problem or have experience in this area? Currently using xMII 11.5 SR3.

    Hi Bryan,
    I would suspect that under the covers the session variable is of datatype string.  For double byte characters, it would need to be wstring.  There is a better explanation to be found at:
    Link: [Kanji and Java Datatypes|http://www.unix.com.ua/orelly/java-ent/jenut/ch10_04.htm], or you can try googling "Kanji Datatype" or "Kanji Java Datatype".
    It could also be a problem with the operating system, which I ran into about 10 years ago, but I would hope that Microsoft has moved beyond that by now.
    Maybe some more technical folks could chime in to confirm or deny my explanation.
    Mike

  • Invoke-WebRequest - Double byte characters issue in windows 8.1

    I tried to write a PowerShell script to download a file from a web server, but it failed. The path has double byte characters.
    I can run it on Windows Server 2012 and 2012 R2 successfully, but it fails on Windows 8 and 8.1.
    Is there any difference between Windows Server and client PowerShell?
    Region and language settings are the same on Windows 2012 and Windows 8.
    Script as below
    Invoke-WebRequest -Uri " http://hostname/m/%E9%...../......./...../xxx.jpg"

    Security settings are one possible cause of this.
    Since we don't have your URL we cannot reproduce this. 
    It is "different". Using "difference" had me confused for qa bit.  I though you were trying to figure out the difference between two things.
    Use:
    $wc=New-Object System.Net.WebClient
    $ws.DownloadFile($url,'c:\file.jpg')
    You will see less issues and it is faster.
    ¯\_(ツ)_/¯

  • "Given filename or path contains Unicode or double-byte characters. Retry using ASCII characters for filename and path" - what does this mean? It happens when I publish an OAM

    Given file name or path contains Unicode or double-byte characters. Retry using ASCII characters for filename and path
    What does this mean? It is happening when I try to publish an OAM for Dreamweaver.
    Also: How can I specify the browser in Edge Animate? It is just going wherever. Are there no Preferences for Edge Animate?
    BTW. Just call it Edge. Seriously. Do you call it Illustrator Draw? Photoshop Retouching?

    No, my file name is mainContent.oam
    My project name is mainContent.an
    This error happens when I try to import into Dreamweaver. Sorry, I wasn't clear on that earlier.
    I thought maybe it was because I had saved my image as a PNG, so I re-saved it as an SVG; I still get the error.
    Do I have a setting in Dreamweaver CC that is wrong? Should I try this in Dreamweaver CS6? I might try that next.
    Why is this program so difficult? I know Flash. I know After Effects. I can work the timeline part just great. It's always in the export that I have problems.
    On a MacPro, 10.7.
    Are you an Adobe person or just a nice helper?

  • Regular Expressions and Double Byte Characters ?

    Is it possible to use Java Regular Expressions to parse
    a file that will contain double byte characters ?
    For example, I want a regular expression to match the following line
    tag="double byte stuff" id="double byte stuff"

    The comments on the bytes/strings were helpful, thanks.
    But I'm still confused as to what matching pattern could be used.
    For example, a pattern like:
    [A-Za-z]
    I assume would not match any double byte characters.
    I also assume the following won't work either:
    [\\p{Alpha}]
    because it is POSIX, i.e. US-ASCII only.
    So how do you say "match the tag, then take any characters (double byte, ASCII, whatever), then match the text tag", per the original example?
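    One way to answer this (my sketch, not from the thread): Java regex operates on chars, not bytes, so once the file is decoded into a String with the right charset, a class like [^"]* (or the Unicode-aware \p{L} for letters) matches CJK characters as readily as ASCII:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class TagMatch {
        public static void main(String[] args) {
            String line = "tag=\"中文の値\" id=\"日本語\"";
            // [^"]* accepts any characters between the quotes, double byte or not
            Pattern p = Pattern.compile("tag=\"([^\"]*)\"\\s+id=\"([^\"]*)\"");
            Matcher m = p.matcher(line);
            if (m.matches()) {
                System.out.println("tag: " + m.group(1));
                System.out.println("id:  " + m.group(2));
            }
        }
    }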
