Unicode beyond FFFF

How can I display Unicode scripts beyond U+FFFF, such as Gothic or Old Italic?
I tried the code below, but it doesn't work, even though the font "Code2001" (downloaded from http://home.att.net/~jameskass/code2001.htm) works well in browsers.
My poor Java:
import java.awt.*;
import javax.swing.*;

class UnicodeFonts {
    JFrame frame;
    JLabel label;
    JButton button;
    Font font;
    Container contentPane;

    public static void main(String[] args) {
        new UnicodeFonts();
    }

    UnicodeFonts() {
        String sUnicode="";
        int codePoints;
        label=new JLabel("default label");
        button=new JButton("default button");
        int start=0;
        int tamil=0x0b80;
        int oldItalic=0x10300;
        int devanagari=2304;
        start=devanagari;
        for(int i=0;i<16;i++) {
            for(int j=0;j<16;j++) {
                codePoints= j + i*16 + start;
                sUnicode=sUnicode+(char)codePoints;
            }
        }
        font=new Font("Courier", Font.PLAIN, 16);
        label.setFont(font);
        label.setText(sUnicode);
        sUnicode="";
        start=oldItalic;
        for(int i=0;i<16;i++) {
            for(int j=0;j<16;j++) {
                codePoints= j + i*16 + start;
                sUnicode=sUnicode+(char)codePoints;
            }
        }
        //downloaded from http://home.att.net/~jameskass/code2001.htm
        font=new Font("Code2001", Font.PLAIN, 26);
        button.setFont(font);
        button.setText(sUnicode);
        frame=new JFrame("UnicodeFonts");
        frame.setSize(600,400);
        contentPane=frame.getContentPane();
        contentPane.setLayout(new BoxLayout(contentPane, BoxLayout.PAGE_AXIS));
        contentPane.add(label);
        contentPane.add(button);
        frame.setVisible(true);
    }
}
My lucky HTML:
<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>
<html xml:lang='en' lang='en' xmlns='http://www.w3.org/1999/xhtml'><head>
<meta http-equiv='content-type' content='text/html;charset=x-user-defined'/>
 <title>Etruscan</title>
</head>
<body style='font-family:"Code2001","Etruscan Epigraphic", "Etruscan", "Etruscan mid/late Bold";'>
𐌀 𐌁 𐌂 𐌃 𐌄 𐌅 𐌆 𐌇 𐌈 𐌉 𐌊 𐌋 𐌌 𐌍 𐌎 𐌏 𐌐 𐌑 𐌒 𐌓
𐌔 𐌕 𐌖 𐌗 𐌘 𐌙 𐌚 𐌛 𐌜 𐌝 𐌞 𐌟 𐌠 𐌡 𐌢 𐌣 𐌤 𐌥 𐌦 𐌧
𐌨 𐌩 𐌪 𐌫 𐌬 𐌭 𐌮 𐌯
</body>
</html>

Thank you, uncle_alice!
I have already found the solution on my own:
http://forum.java.sun.com/thread.jspa?forumID=57&threadID=592090
Explanation:
http://java.sun.com/developer/technicalArticles/Intl/Supplementary/
I also had to upgrade my Java to JDK 1.5.
Regards.
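
For readers who cannot follow the links: the likely culprit in the question's code is the cast to char. A Java char holds only 16 bits, so a code point such as 0x10300 is truncated to 0x0300 before it reaches the string. From JDK 1.5 on, StringBuilder.appendCodePoint writes the surrogate pair instead. A minimal sketch of that fix follows; the class name SupplementaryText and the method block are made up for illustration and are not taken from the linked article.

```java
class SupplementaryText {
    /** Builds a string holding `count` consecutive code points starting at `start`. */
    static String block(int start, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            // appendCodePoint emits a surrogate pair when the code point is above U+FFFF
            sb.appendCodePoint(start + i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Old Italic block, U+10300..U+1032F (0x30 code points)
        String oldItalic = block(0x10300, 0x30);
        // length() counts UTF-16 code units, codePointCount() counts characters
        System.out.println(oldItalic.length());
        System.out.println(oldItalic.codePointCount(0, oldItalic.length()));
    }
}
```

The resulting string can be passed to button.setText as before; whether the glyphs show up still depends on the chosen font covering the block.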

Similar Messages

How do I get Unicode chars beyond the ASCII range to display ?

Hello all.
I have only recently started to learn Java.
I want to display the data in an array using Unicode characters, but when I use the Unicode code from the code sheet I merely get a "?".
I looked around the net and got the impression that Java doesn't support non-ASCII characters (by default?). I think it's possible to import various code sets, but I'm not sure about that.
Anyway...
My question is: how do I get the Unicode box-drawing character 2254 to display (and other non-ASCII characters)? Print("\u2254") results in a "?".
In case you are wondering why:
I want to store a map for a game in a data array, with the entities represented as normal letters and characters. The map will be generated randomly by the program, but I want to see the output of the map to test whether it obeys the rules I set out for the map generation.
I figured I would just read the array and print out the result, but convert the text characters to box-drawing characters to make it more legible while debugging.

Both methods you mentioned just generate a question mark instead of the box-drawing element I want.
I can get all the normal characters, i.e. letters, numbers and common symbols, but beyond those all I get is a "?" in their place.
Initially I just wanted it to print the character in the legend.
The code below just prints a few lines of text and the legend for deciphering the level display, then calls a class to create a level; the last call is to a debug method that displays the level that was created.
/*
 *   Prelim Code for the Random Dungeon Generator
 *   used in Dungeon Runner
 *   Created : 03/07/2009
 */
public class DungeonGenerator {
      public static void main (String[] args) {
      System.out.println(
           "Preliminary code for the Random Dungeon Generator for Dungeon Runner Game\n\n"
                               );                    // Just a text heading reminder for me
      System.out.println(
           "Test 1 - debug screen - 03 July 2009\n\n"
                               );
      System.out.println("Legend: \u2254 = Top Left Corner \t 2 = Top Right Corner");
      System.out.println("        3 = Bot Left Corner \t 4 = Bot Right Corner");
      System.out.println("        L = Left Wall \t\t R = Right Wall");
      System.out.println("        T = Top Wall \t\t B = Bottom Wall");
      System.out.println("        D = Doorway \t\t + = Play Space");
      System.out.println("        E = Exit Entity \t S = Start Entity");
      System.out.println("        . = None Play Space");
      System.out.println("\n----------------------------------------------------\n");
     // Call the LevelGen
     /* This is just a temporary call method.
        I will use another array and a
        loop to call the LevelGen for the
        number of levels I decide the
        game will have. */
     LevelGen levelone = new LevelGen();          // Calls the LevelGen to create a Level
     levelone.displayLevel();                    // Display the level that was generated
                                             // for debugging only
      }     // End Main method
}      // End DungeonGenerator class
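
Two things are worth checking here. First, U+2254 is COLON EQUALS, not a box-drawing character; the box-drawing block is U+2500 to U+257F, and the double-line top-left corner is U+2554. Second, System.out encodes text with the platform default charset, and a charset without the requested character typically substitutes "?". Below is a minimal sketch that wraps the output stream in a UTF-8 PrintStream. It assumes the console or IDE actually renders UTF-8, which this thread does not confirm; the class name BoxDrawingDemo and its helper methods are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class BoxDrawingDemo {
    /** Wraps a stream so text is written as UTF-8 rather than the platform default. */
    static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support
            throw new IllegalStateException(e);
        }
    }

    /** Encodes text through the UTF-8 stream and returns the raw bytes for inspection. */
    static byte[] encode(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = utf8(buffer);
        out.print(text);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // U+2554, U+2557, U+255A, U+255D are the four double-line corners
        out.println("Legend: \u2554 = Top Left Corner \t \u2557 = Top Right Corner");
        out.println("        \u255A = Bot Left Corner \t \u255D = Bot Right Corner");
    }
}
```

If the terminal is not set to UTF-8, the bytes are still correct but will be shown as several odd characters instead of one corner piece, so the console setting has to match.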

Unicode text in PDF

I have gone through some threads about Japanese/Chinese text in PDF.
My application creates PDF files. In my application, text is stored in Unicode form. Currently the text (for Tj) is converted to char*, which is locale dependent. I need to store text in the PDF in a way that is locale independent.
I am using an 'embedded' Type 1 (TTF) font in the above example. The result is that a PDF created on an Eng locale gives '?' for each *** character, while a PDF created on a *** locale gives correct results (*how*?).
From PDF Ref 3.8.1 Text Strings, I understand that text can be stored in UTF-16BE. I tried it, but boxes (the default char) appear in Adobe Reader 6.0, which means each byte is treated separately. But as I said before, for a PDF created on a *** locale (even though 1 character is 2 bytes) the characters are read correctly.
That is the situation. Now I am confused about some aspects of the specification:
- Do I have to use a 'type 0' font object (even if I am embedding a simple TTF) to display text beyond 256 character codes?
- Why are 2 bytes per character read for text created on a *** locale, but not when I create text in UTF-16BE?
Thanks for your help,
Sameer

>From PDF Ref 3.8.1 Text Strings, I understand that, the text can be stored in UTF-16BE.
Many people have read this and made a big assumption that is not
valid.
Text strings are a particular type that is used in particular cases.
For example, they are used for bookmarks, which can indeed be
UTF-16BE. Nowhere does it say that this text string type works for
page contents.
>
>- Do I have to use 'type 0' font object (even if I am embedding simple TTF) to display text beyond 256 char code?
Absolutely. A /Type1 or /TrueType font is by definition a single byte
font, and the rules for Encoding are followed exactly as described.
There is no two byte escape.
If you wanted to use only 256 characters FROM a large font this is
possible; you could break your Japanese font down into multiple
embedded subsets, each of less than 256 characters.
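
The subsetting idea can be sketched as bookkeeping code. This is only an illustration, not PDF-writing code: it assigns each distinct character of a text to a (subset index, single-byte code) pair so that no subset holds more than a chosen number of glyphs. The class SubsetPlanner, the Slot type, and the choice to start codes at 1 are assumptions of this sketch; a real PDF writer would still have to build each subset font and its Encoding, and switch fonts between runs of text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SubsetPlanner {
    /** Where a character lives: which subset font, and which single-byte code inside it. */
    static final class Slot {
        final int subset;
        final int code;
        Slot(int subset, int code) {
            this.subset = subset;
            this.code = code;
        }
    }

    /** Assigns every distinct code point in `text` to a slot, filling subsets in order of first use. */
    static Map<Integer, Slot> plan(String text, int maxPerSubset) {
        Map<Integer, Slot> slots = new LinkedHashMap<Integer, Slot>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!slots.containsKey(cp)) {
                int index = slots.size();
                // codes start at 1 so that code 0 stays unused (an assumption of this sketch)
                slots.put(cp, new Slot(index / maxPerSubset, index % maxPerSubset + 1));
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        // tiny limit of 2 per subset, just to make the spill-over visible
        Map<Integer, Slot> slots = plan("\u65E5\u672C\u8A9E\u65E5", 2);
        for (Map.Entry<Integer, Slot> e : slots.entrySet()) {
            System.out.printf("U+%04X -> subset %d, code %d%n",
                    e.getKey(), e.getValue().subset, e.getValue().code);
        }
    }
}
```

With a limit of 255 per subset, every code fits in one byte, which is the constraint the simple font types impose.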
>
>- why 2 bytes per character is read for text created on *** locale and not when I create text in UTF-16BE?
You mean this sometimes seems to work? Suggests a bug.
Aandi Inston

JACOB's Unicode Compatibility

I tried to insert or replace text with a Unicode character, say, "\u0111", in Word using JACOB (http://danadler.com/jacob ), but it only shows "?" for it; as a matter of fact, it displays "?" for all characters beyond the range \u0000-\u00FF. It seems that something is lost when JACOB communicates with COM. Word itself can handle Unicode characters without a problem.
How can I make JACOB Unicode compatible? Is there a workaround for this shortcoming? Thanks.
import com.jacob.activeX.ActiveXComponent;
import com.jacob.com.*;
import java.io.*;

public class Word {
    private ActiveXComponent wordApp;
    private Object aDoc;
    private Object selectionObj;
    private Object findObj;

    public Word() {
        wordApp = new ActiveXComponent("Word.Application");
        wordApp.setProperty("Visible", new Variant(true));
    }

    public void open(String docName) {
        Object documents = wordApp.getProperty("Documents").toDispatch();
        aDoc = Dispatch.call(documents, "Open", docName).toDispatch();
        selectionObj = wordApp.getProperty("Selection").toDispatch();
        findObj = Dispatch.call(selectionObj, "Find").toDispatch();
    }

    public void replaceAll(String source, String target) throws java.lang.InterruptedException {
        Variant True = new Variant(true);
        Variant False = new Variant(false);
        Dispatch.call(findObj, "ClearFormatting");
        Object replacementObj = Dispatch.call(findObj, "Replacement").toDispatch();
        Dispatch.call(replacementObj, "ClearFormatting");
        Variant FindText = new Variant(source);
        Variant ReplaceWith = new Variant(target);
        Variant Format = False;
        Variant MatchCase = True;
        Variant MatchWholeWord = False;
        Variant MatchWildcards = False;
        Variant MatchSoundsLike = False;
        Variant MatchAllWordForms = False;
        Variant Forward = True;
        final int WdFindWrap_wdFindContinue = 1;
        Variant Wrap = new Variant(WdFindWrap_wdFindContinue);
        final int WdReplace_wdReplaceAll = 2;
        Variant Replace = new Variant(WdReplace_wdReplaceAll);
        // Find.Execute(ref FindText, ref MatchCase, ref MatchWholeWord, ref MatchWildcards,
        // ref MatchSoundsLike, ref MatchAllWordForms, ref Forward, ref Wrap,
        // ref Format, ref ReplaceWith, ref Replace);
        Dispatch.callN(findObj, "Execute", new Variant[] {
            FindText, MatchCase, MatchWholeWord, MatchWildcards,
            MatchSoundsLike, MatchAllWordForms, Forward, Wrap,
            Format, ReplaceWith, Replace} );
        Dispatch.call(selectionObj, "WholeStory");
        Object font = Dispatch.call(selectionObj, "Font").toDispatch();
        Dispatch.put(font, "Name", "Arial");
    }

    public void close() {
        File outdir = new File("c:\\temp_Unicode");
        if (!outdir.exists()) {
            outdir.mkdir();
        }
        Dispatch.call(wordApp, "ChangeFileOpenDirectory", outdir.getAbsolutePath());
        Dispatch.call(aDoc, "SaveAs", "out.doc");
        Dispatch.call(aDoc, "Close", new Variant(false));
        wordApp.invoke("Quit", new Variant[] {});
    }

    public static void main(String[] args) throws java.lang.InterruptedException {
        Word test = new Word();
        test.open("c:\\temp\\test.doc");
        Thread.sleep(1500);
        test.replaceAll("a", "\u0111");
        Thread.sleep(1500);
        test.close();
    }
}

Hi, can anyone tell me how to change the size of the window? I use Resize and it works, but the first time, when the window is maximized, it gives me a "Window is Maximum" error. After I resize the window manually, it works fine.
I am pasting my code. Please help me.
import com.jacob.activeX.ActiveXComponent;
import com.jacob.com.*;
import java.io.*;

public class Word1 implements Serializable {
    public ActiveXComponent wordApp;
    private Object aDoc;
    private Object selectionObj;
    private Object findObj;
    private Object closeObj;
    private Object font;
    private String docContect;

    public Word1() {
    }

    public ActiveXComponent open(String docName) throws java.lang.InterruptedException {
        wordApp = new ActiveXComponent("Word.Application");
        wordApp.setProperty("Visible", new Variant(true));
        final int a = 0;
        Variant ReplaceWindow = new Variant(a);
        //Dispatch.call(wordApp,"WindowState",a);
        Dispatch.callN(wordApp,"WindowState",new Variant[] {ReplaceWindow});
        Variant va = Dispatch.call(wordApp,"WindowState");
        System.out.println(" VAlue :"+ va.toInt());
        //Dispatch.call(testdocuments,"wdWindowStateNormal");
        try {
            Dispatch.call(wordApp,"Resize",new Integer(450),new Integer(300));
        }
        catch( com.jacob.com.ComFailException ce) {
            System.out.println(ce);
        }
        //int a =1;
        //Object testdocuments = Dispatch.call(wordApp,a,"wdWindowStateNormal").toDispatch();
        //Dispatch.call(wordApp,"NormalTemplate");
        //System.out.println("testdocuments :" +testdocuments);
        //Dispatch.call(testdocuments,"wdWindowStateNormal");
        ////Object documents = wordApp.getProperty("Documents").toDispatch();
        ////aDoc = Dispatch.call(documents, "Open", docName).toDispatch();
        return wordApp;
    }

    public String findString(String source) throws java.lang.InterruptedException {
        // Initialize the starting position to zero
        int position = 0;
        int position1 = 0;
        int arrayCount = 0;
        String firstSource;
        String middleSource;
        String lastSource;
        String findPosition = "";
        firstSource = source + " ";
        middleSource = " " + source + " ";
        lastSource = " " + source;
        Object activeDoc = wordApp.getProperty("ActiveDocument").toDispatch();
        docContect = Dispatch.call(activeDoc, "Content").toString();
        selectionObj = wordApp.getProperty("Selection").toDispatch();
        findObj = Dispatch.call(selectionObj, "Find").toDispatch();
        Variant True = new Variant(true);
        Variant False = new Variant(false);
        Dispatch.call(findObj, "ClearFormatting");
        Object replacementObj = Dispatch.call(findObj, "Replacement").toDispatch();
        Dispatch.call(replacementObj, "ClearFormatting");
        String target = "to";
        Variant FindText = new Variant(source);
        Variant ReplaceWith = new Variant(target);
        Variant Format = False;
        Variant MatchCase = True;
        Variant MatchWholeWord = True;
        Variant MatchWildcards = False;
        Variant MatchSoundsLike = False;
        Variant MatchAllWordForms = False;
        Variant Forward = True;
        final int WdFindWrap_wdFindContinue = 1;
        Variant Wrap = new Variant(WdFindWrap_wdFindContinue);
        final int WdReplace_wdReplaceAll = 2;
        Variant Replace = new Variant(WdReplace_wdReplaceAll);
        Object content = Dispatch.call(activeDoc,"Content").toDispatch();
        Object char1 = Dispatch.call(content,"Characters").toDispatch();
        Variant vwordcount = Dispatch.call(char1,"Count");
        int wordcount = vwordcount.toInt();
        String[] strArray = docContect.split(" ");
        //System.out.println("docContect :" + docContect);
        for(int i=0;i<wordcount;i++) {
            position = docContect.indexOf(firstSource,position1);
            if(position == 0 || position == -1) {
                position = docContect.indexOf(middleSource,position1);
                if(position == 0 || position == -1) {
                    position = docContect.indexOf(lastSource,position1);
                    if(position == 0 || position == -1) {
                        position = docContect.indexOf(source,position1);
                        if(position == 0 || position == -1) {
                            //System.out.println("position :" + position);
                            break;
                        }
                    }
                }
            }
            //findPosition = position+"#";
            position1 = position + 1;
            i = position;
            //System.out.println("String Location :" + position1);
            Object item = Dispatch.call(char1,"Item",new Integer(position1)).toDispatch();
            Dispatch.call(item,"Select");
            Thread.sleep(1000);
        }
        return findPosition;
    }

    public void gotoPosition(int num) {
        Object activeDoc = wordApp.getProperty("ActiveDocument").toDispatch();
        Object content = Dispatch.call(activeDoc,"Content").toDispatch();
        Object char1 = Dispatch.call(content,"Characters").toDispatch();
        Object item = Dispatch.call(char1,"Item",new Integer(num)).toDispatch();
        Dispatch.call(item,"Select");
    }

    public void close() {
        wordApp.setProperty("Visible", new Variant(false));
    }
}

Could you pls give the details about the Unicode conversion during Upgrade

Hi,
Can anyone give details about the Unicode conversion during an SAP upgrade from 4.6C to ECC 6.0?
Waiting for a quick response.
Best Regards,
Padhy

Hi,
These are a few points I gathered during my upgrade project.
Before starting any upgrade project, it is necessary to take a backup of the existing system. Because we are going to upgrade the entire system, we will be changing many things, and if something goes wrong without a backup we will be in trouble.
So it is advisable to keep a backup of the existing system.
Say, for example, that the existing system E4B is on Version 4.6C and we want to upgrade it to Version 4.7. Let us see how we can do it.
A version upgrade involves more than running the new version CD over the existing system; it involves the following steps.
We create one more system, called E1B, in which the upgrade to Version 4.7 can be done.
First, copy the entire E4B system into the E1B system created for Version 4.7.
Apply the Version 4.7 CD provided by SAP over the E1B system.
Now check whether all the functionality that was in the E4B system also works in the E1B system.
Thus the version upgrade involves two steps:
1. SAP upgrade with the help of the CD
2. Manual upgrade
1. SAP upgrade with the help of the CD
After the existing system has been copied into a new system, the upgrade CD from SAP is applied over the new system.
2. Manual upgrade
The manual upgrade involves:
2.1 Upgrade of standard objects
2.1.1 SPAU objects
2.1.2 SPDD objects
2.2 Upgrade of custom objects
The upgrade of custom objects falls into the following three categories:
Unicode compliance
Retrofit
Upgrade
Below are some of the common Unicode errors and their solutions.
1. Error:
TRANSLATE error: "Dangerous use of TRANSLATE in a multilingual system."
Correction:
To correct the error on the TRANSLATE statement, add this statement before it:
SET LOCALE LANGUAGE sy-langu.
This statement sets the text environment of all programs and internal sessions to the language specified in the language key, which in this case is sy-langu, i.e. the logon language of the user.
2. Error:
OPEN DATASET error: "Encoding addition must be included."
Correction:
This error occurs only when the mode is TEXT.
To correct the error on the OPEN DATASET statement, use this statement instead:
OPEN DATASET dataset_name FOR access IN TEXT MODE ENCODING DEFAULT.
Where:
dataset_name - name of the dataset.
access - INPUT, OUTPUT, APPENDING or UPDATE.
DEFAULT - corresponds to UTF-8 in Unicode systems and to NON-UNICODE in non-Unicode systems.
3. Error:
Use of the obsolete FMs UPLOAD/DOWNLOAD or WS_UPLOAD/WS_DOWNLOAD: "Function module UPLOAD is flagged as obsolete."
Correction:
Use the FMs GUI_UPLOAD/GUI_DOWNLOAD instead.
The changes to be made in the FM parameters:
1. Filename - it must be of type STRING.
2. Filetype - DAT is no longer used; ASC is used instead.
3. Field separator - the default value SPACE is used; for a tab-separated file, 'X' can be used.
4. Error:
CURRENCY/UNIT addition error: "Use addition CURRENCY/UNIT when outputting."
Correction:
The CURRENCY addition specifies the currency-dependent decimal places for the output of data objects of type i or p. To obtain the currency key, the field CURRKEY of the table TCURX is used. The system determines the number of decimal places from the field CURRDEC of the selected CURRKEY.
To correct this error, change the statement as follows:
*WRITE: /3 'TOTAL',' ', TOTAL.
WRITE: /3 'TOTAL',' ', TOTAL CURRENCY '2'. "where '2' is the currency key for getting 2 decimal places
5. Error:
TYPE X error: "Variable must be of C, N, D, T or STRING type."
Correction:
We need to change all type X (hexadecimal) variables to type C with their values unchanged.
The method to be followed is:
1. Load the definition of the class CL_ABAP_CONV_IN_CE or CL_ABAP_CHAR_UTILITIES.
2. Declare the variable as type C and use the method UCCP('XXXX') of the class CL_ABAP_CONV_IN_CE, where XXXX is the value written as a four-digit hexadecimal number. If the variable holds the hex value for a horizontal tab, the attribute HORIZONTAL_TAB of the class CL_ABAP_CHAR_UTILITIES can be used directly instead of the method UCCP.
E.g.:
i) *DATA: TAB TYPE X VALUE 09. "Tab character
CLASS: CL_ABAP_CHAR_UTILITIES DEFINITION LOAD.
DATA TAB TYPE C VALUE CL_ABAP_CHAR_UTILITIES=>HORIZONTAL_TAB.
ii) *DATA: CHAR TYPE X VALUE 160.
CLASS: CL_ABAP_CONV_IN_CE DEFINITION LOAD.
DATA CHAR TYPE C.
CHAR = CL_ABAP_CONV_IN_CE=>UCCP('00A0').
(Here 00A0 is the hexadecimal equivalent of the decimal 160.)
3. If the type X variable has a length of more than 1, a structure with one character field per byte must be created for the variable.
E.g.:
CLASS: CL_ABAP_CONV_IN_CE DEFINITION LOAD.
*DATA : LF(2) TYPE X VALUE 'F5CD'.
DATA : BEGIN OF LF,
A1 TYPE C,
A2 TYPE C,
END OF LF.
LF-A1 = CL_ABAP_CONV_IN_CE=>UCCP('00F5').
LF-A2 = CL_ABAP_CONV_IN_CE=>UCCP('00CD').
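
The decimal-to-hex step in the examples above is plain base conversion and can be cross-checked outside ABAP. A small Java sketch follows; the class UccpHex and its methods are made up for illustration and have nothing to do with the SAP classes. It formats a decimal value as the four hex digits used in the UCCP call and splits a multi-byte hex literal such as F5CD into one four-digit string per byte.

```java
import java.util.ArrayList;
import java.util.List;

class UccpHex {
    /** Formats a value in 0..65535 as four uppercase hex digits, e.g. 160 -> "00A0". */
    static String fourDigits(int value) {
        if (value < 0 || value > 0xFFFF) {
            throw new IllegalArgumentException("out of range: " + value);
        }
        return String.format("%04X", value);
    }

    /** Splits a hex literal such as "F5CD" into one four-digit string per byte. */
    static List<String> perByte(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits: " + hex);
        }
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < hex.length(); i += 2) {
            out.add(fourDigits(Integer.parseInt(hex.substring(i, i + 2), 16)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fourDigits(160));   // 00A0
        System.out.println(fourDigits(9));     // 0009
        System.out.println(perByte("F5CD"));   // [00F5, 00CD]
    }
}
```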
6. Error:
Hyphen error: "The character '-' can't appear in names in Unicode programs."
Correction:
The character '-' (hyphen) in variable names is replaced by the character '_' (underscore) for Unicode/upgrade compliance.
E.g.:
*wk-belnr LIKE bkpf-belnr,
*wk-xblnr LIKE bkpf-xblnr,
*wk-date LIKE sy-datum,
*wk-wrbtr LIKE bseg-wrbtr,
*wk-name1 LIKE lfa1-name1,
*wk-voucher(8) TYPE c.
wk_belnr LIKE bkpf-belnr,
wk_xblnr LIKE bkpf-xblnr,
wk_date LIKE sy-datum,
wk_wrbtr LIKE bseg-wrbtr,
wk_name1 LIKE lfa1-name1,
wk_voucher(8) TYPE c.
7. Error:
SUBMIT ... TO SAP-SPOOL error: "You should not use the statement SUBMIT-TO-SAP-SPOOL without the WITHOUT SPOOL DYNPRO addition."
Correction:
1. Declare variables of type PRI_PARAMS and ARC_PARAMS, and a variable of type C to be used as a valid flag.
2. Call the FM GET_PRINT_PARAMETERS:
CALL FUNCTION 'GET_PRINT_PARAMETERS'
EXPORTING
ARCHIVE_MODE = '3'
DESTINATION = P_DEST
IMMEDIATELY = 'X'
IMPORTING
OUT_ARCHIVE_PARAMETERS = archive_parameters
OUT_PARAMETERS = print_parameters
VALID = valid_flag
EXCEPTIONS
INVALID_PRINT_PARAMS = 2
OTHERS = 4.
3. Use the SUBMIT ... TO SAP-SPOOL statement with the returned parameters.
E.g.:
* submit zrppt500
* using selection-set 'AUTO3'
* with res_no eq lo_rsnum
* with sreserv in preserv
* to sap-spool destination p_dest
* immediately 'X'. "print immediate
DATA: print_parameters type pri_params,
archive_parameters type arc_params,
valid_flag(1) type c.
CALL FUNCTION 'GET_PRINT_PARAMETERS'
EXPORTING
ARCHIVE_MODE = '3'
DESTINATION = P_DEST
IMMEDIATELY = 'X'
IMPORTING
OUT_ARCHIVE_PARAMETERS = archive_parameters
OUT_PARAMETERS = print_parameters
VALID = valid_flag
EXCEPTIONS
INVALID_PRINT_PARAMS = 2
OTHERS = 4.
SUBMIT zrppt500
USING SELECTION-SET 'AUTO3'
WITH res_no EQ lo_rsnum
WITH sreserv IN preserv
TO SAP-SPOOL
SPOOL PARAMETERS PRINT_PARAMETERS
ARCHIVE PARAMETERS ARCHIVE_PARAMETERS
WITHOUT SPOOL DYNPRO.
8. Error:
MESSAGE error: "Number of WITH fields and number of placeholders are not the same."
Correction:
Split the text after WITH into the same number of fields as there are placeholders for that message ID.
E.g.:
1. * MESSAGE E045.
MESSAGE E045 WITH '' ''.
2. In program ZIPI0801:
* Start of change for ECC6
* message e398(00) with 'Could not find access sequence'
* 'for condition type:'
* p_ptype.
message e398(00) with 'Could not find '
'access sequence'
'for condition type:'
p_ptype.
* End of change made for ECC6
9. Error:
Move between two different structures: "The structures are not mutually convertible in a Unicode program."
Correction:
Make both data types compatible and then assign the contents.
E.g.:
The statement "move retainage_text to temp_text." gives an error, where RETAINAGE_TEXT is an internal table and TEMP_TEXT is a string of length 200.
A feasible solution is to specify the position in the string to which each field of RETAINAGE_TEXT should be assigned:
TEMP_TEXT+0(1) = RETAINAGE_TEXT-DQ1.
TEMP_TEXT+1(1) = RETAINAGE_TEXT-HEX.
TEMP_TEXT+2(20) = RETAINAGE_TEXT-FILLER1.
TEMP_TEXT+22(15) = RETAINAGE_TEXT-AMT_DUE.
TEMP_TEXT+37(8) = RETAINAGE_TEXT-TEXT.
TEMP_TEXT+45(10) = RETAINAGE_TEXT-DUE_DATE.
TEMP_TEXT+55(1) = RETAINAGE_TEXT-DQ2.
10. Error:
No description found; add a GUI title.
Correction:
With this type of error the GUI title is generally missing, so add a GUI title to the module pool.
11. Error:
Writing an internal or transparent table as a whole.
Correction:
Write the individual fields.
E.g.:
WRITE: / EXT. --> EXT should be a character type field
WRITE: / EXT-ZZSTATE, EXT-LINE_NO, EXT-LINE_TXT, EXT-AMT, EXT-ZZSKUQTY.
12. Error:
"Combination reference table/field S541-UMMENGE does not exist."
Correction:
This was due to errors in the reference table S541:
1) Foreign key S541-ZZMARKET (ZZMARKET and KATR2 point to different domains)
2) Foreign key S541-ZZACQUIGRP (ZZACQUIGRP and KATR8 point to different domains)
We changed the domain of ZZMARKET (from ZMKCODE to ATTR2)
and that of ZMKCODE (from ZACCODE to ATTR8).
13. Error:
"KEY does not exist."
Correction:
The reference table for the field KBETR was KNOV; it was changed to RV61A, because KNOV in turn referred to RV61A.
14. Error:
WRITE statement: "Literals that take more than one line are not permitted in Unicode systems."
Correction: Shorten or re-align the literal so that it does not run past the end of the line.
15. Error:
DATA statement: "The data type ZWFHTML can be enhanced in any way. After a structure enhancement, this assignment or parameter might be syntactically incorrect."
Correction: Use TYPE instead of LIKE in the DATA statement.
16. Error:
DESCRIBE statement: "DESCRIBE can be used only with IN BYTE ... or IN CHARACTER MODE in Unicode systems."
Correction: Add IN BYTE MODE or IN CHARACTER MODE to the statement.
IN CHARACTER MODE is added when the data object is of flat/character type.
IN BYTE MODE is added when the data object is a deep structure.
Syntax: DESCRIBE FIELD data_obj : LENGTH blen IN BYTE MODE,
LENGTH clen IN CHARACTER MODE.
where blen and clen must be of type I.
17. Error:
DO loop error: "In DO loop, range addition needed."
Correction:
An internal table was declared, and the two fields (the VARYING field and the NEXT field) were included in it.
E.g.: In program SAPMZP02
DO 11 TIMES.
* VARYING STATION_STATE FROM STATION1 NEXT STATION2. "ECC6
CASE SYST-INDEX.
WHEN 1.
STATION_STATE = STATION1.
WHEN 2.
STATION_STATE = STATION2.
WHEN 3.
STATION_STATE = STATION3.
WHEN 4.
STATION_STATE = STATION4.
WHEN 5.
STATION_STATE = STATION5.
WHEN 6.
STATION_STATE = STATION6.
WHEN 7.
STATION_STATE = STATION7.
WHEN 8.
STATION_STATE = STATION8.
WHEN 9.
STATION_STATE = STATION9.
WHEN 10.
STATION_STATE = STATION10.
WHEN 11.
STATION_STATE = STATION11.
ENDCASE.
ENDDO.
18. Error:
Parameter QUEUE-ID error: "QUEUE-ID is neither a parameter nor a select option in program rsbdcbtc."
Correction:
The parameter in program rsbdcbtc is QUEUE_ID, so the name was changed in the calling program.
E.g.: In program Z_CARRIER_EDI_INTERFACE
* submit rsbdcbtc with queue-id = apqi-qid and return. "ECC6
* The parameter name changed by replacing '-' with '_' as in program rsbdcbtc "ECC6
submit rsbdcbtc with queue_id = apqi-qid and return. "ECC6
19. Error:
EPC error: "Field symbol <TOT_FLD> is not assigned to a field."
Correction:
This error could not be rectified in custom code, as it occurs in a standard SAP include, LSVIMF29.
OSS Note 1036943 needs to be applied.
Error:
OPEN DATASET P_FILE FOR OUTPUT IN TEXT MODE.
Correct:
OPEN DATASET P_FILE FOR OUTPUT IN TEXT MODE ENCODING DEFAULT.
Error:
Constants : c_tab type x value '09' .
Correct:
Constants : c_tab type abap_char1 value cl_abap_char_utilities=>horizontal_tab .
Error:
Data : begin of output_options occurs 0 . Include structure ssfcompop.
Data : end of output_options .
Correct:
Data : output_options type standard table of ssfcompop with header line .
Error:
PARAMETERS : NAST TYPE NAST .
Correct:
PARAMETERS : NAST TYPE NAST NO-DISPLAY .
Replace WS_DOWNLOAD and WS_UPLOAD with GUI_DOWNLOAD and GUI_UPLOAD, check the import and export parameter types, and change them accordingly. The FILENAME parameter type is different, and a mismatch will cause a dump.
For issues when using the function module SO_NEW_DOCUMENT_ATT_SEND_API1, the solution is to put a COMMIT WORK after the FM call.
Issue:
Moving data from one structure to another when the two structures are not compatible.
Solution:
Use MOVE-CORRESPONDING, or move the data field by field.
If the database structures are different in 4.6C and ECC 6.0, use the append structure concept.
While testing a report, if it dumps at a SELECT query or at any database table or view, go to that table or view in the database utility (SE14) and adjust the database. Make sure that the radio button selected in SE14 is "Activate and adjust database".
Also Check this link.
http://help.sap.com/saphelp_nw04/helpdata/en/62/3f2cadb35311d5993800508b6b8b11/frameset.htm
Reward points if helpful.
Regards,
Ramya

To TYPE Unicode characters

Hi!
I use FrameMaker 8 on Windows XP.
I have scanned a two-volume Greek book and ran it through an OCR program. Not all of the transcription is correct, so I have to fix it. Some of the text is quoted from older books, so there's a lot of interesting combinations of diacritical marks, creating special characters. Most of them I have found in the font I use (Alkaios) and have no big trouble typing them with the corresponding key combinations.
Running charmap from the Run... field under Start I can see that also the rest of the characters are present in the Alkaios font. But I don't seem to find the combination of keys in FrameMaker to produce them (and probably those particular combinations are not implemented). And I can't use Alt+number, since that only works with ASCII characters, and the ones I'm after are far beyond those.
Searching the web, I found that what I'm supposed to type is U+number. The problem is that in any editor, typing a 'U' will present me with a 'U'. (Very logical and practical, since you sometimes also want to be able to type a 'U'!) Searching some more, I found that in for example OpenOffice the 'U' in the typing sequence should be translated as Ctrl+Shift (haven't tried it there, though), and I also found another editor in which it works that way. And in Word 2007 I actually CAN use Alt+number (the decimal numbers 0912 and 0944, in this case).
But in FM I can't use Alt+number, because I get a question mark or a ring, or something else. And I can't use Ctrl+Shift+number, because when I hit the zero key, FM tries to copy whatever is highlighted in the document (which is nothing, and hence FM protests).
As you've probably guessed by now, I would like to know what key combination (or other trick) I have to use to be able to somehow type or enter the special characters into my document? Or am I stuck with copy-and-paste from the charmap?
Regards, Mikael Persson!

I think using the Windows calculator is a perfectly acceptable way to convert between hex and decimal :)
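
If you'd rather not use the calculator, the same conversion is a one-liner in most languages. Here is a small Java sketch (the class name and methods are made up for illustration), using the decimal numbers 0912 and 0944 from the question:

```java
class CodePointConverter {
    /** Decimal code point to U+XXXX notation, e.g. 912 -> "U+0390". */
    static String toUPlus(int decimal) {
        return String.format("U+%04X", decimal);
    }

    /** U+XXXX notation (or bare hex digits) back to decimal, e.g. "U+03B0" -> 944. */
    static int toDecimal(String uPlus) {
        String hex = uPlus.startsWith("U+") ? uPlus.substring(2) : uPlus;
        return Integer.parseInt(hex, 16);
    }

    public static void main(String[] args) {
        System.out.println(toUPlus(912));          // U+0390
        System.out.println(toUPlus(944));          // U+03B0
        System.out.println(toDecimal("U+0390"));   // 912
    }
}
```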
Here's another issue:
there are two stages to getting a unicode character to display: you have to specify the correct character, and then the font involved has to actually have a glyph for that character in it.
I expect, to put it rather anthropomorphically, it goes something like:
1. you type/paste the character
2. Application asks the font if it has a glyph for that character
3. If not, application asks operating system if it has a font that has that character
4. Once one is found, it gets displayed.
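
To make the guess above concrete, here is how steps 2 to 4 might look in code. It is as speculative as the rest of this post: FakeFont is a stand-in for a real font (just a name plus the set of code points it has glyphs for), and the font names and coverage in main are invented for the example. In real Java code, java.awt.Font.canDisplay(int) plays the role of the canDisplay method here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FontFallback {
    /** Stand-in for a font: a name plus the code points it has glyphs for. */
    static final class FakeFont {
        final String name;
        final Set<Integer> glyphs;

        FakeFont(String name, Integer... codePoints) {
            this.name = name;
            this.glyphs = new HashSet<Integer>(Arrays.asList(codePoints));
        }

        boolean canDisplay(int codePoint) {
            return glyphs.contains(codePoint);
        }
    }

    /** Tries the document font first, then each system font; null means "missing glyph" box. */
    static String pickFont(int codePoint, FakeFont documentFont, List<FakeFont> systemFonts) {
        if (documentFont.canDisplay(codePoint)) {
            return documentFont.name;
        }
        for (FakeFont candidate : systemFonts) {
            if (candidate.canDisplay(codePoint)) {
                return candidate.name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // coverage below is invented purely for the demonstration
        FakeFont helvetica = new FakeFont("Helvetica", 0x41);
        List<FakeFont> system = Arrays.asList(new FakeFont("Alkaios", 0x41, 0x390));
        System.out.println(pickFont(0x41, helvetica, system));
        System.out.println(pickFont(0x390, helvetica, system));
        System.out.println(pickFont(0x10300, helvetica, system));
    }
}
```

An application that "gives up at step two" would simply skip the loop over the system fonts.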
My hunch from what I've seen is that applications like Word, and most web browsers these days, will liaise with the Windows operating system until a font is found that has the necessary glyphs for the required characters.
However, some applications give up at step two: if, for example, your doc in some Adobe application is using Helvetica, and you ask for a very obscure Unicode character, and Helvetica doesn't have a glyph for it, you may just get a blank square or "Missing character" symbol.
This used to be the case with the Firefox browser as well: if I was writing a webpage and put in an obscure character that wasn't present in default fonts like "Times New Roman" or "Verdana", I'd get a missing character symbol instead, even though my PC had other fonts which *did* have glyphs for it. I'd need to explicitly put font="Arial Unicode" or something like that into the webpage to make it talk to Windows to retrieve a glyph.
However, over the past couple of years, Firefox has stopped doing this and will just go and get whatever glyphs it needs.
Perhaps programs like Frame and InDesign actually see this kind of behaviour as a virtue?
(All the above is totally speculative based on my own experience, mind you...)
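To make the speculated steps concrete, here is a toy model in Java. It is only a sketch of the guessed behaviour: the coverage table, the font contents and the pickFont helper are all hypothetical, and no real application or operating system API is involved.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class FontFallbackSketch {
    // Hypothetical coverage table: font name -> code points it has glyphs for.
    static final Map<String, Set<Integer>> COVERAGE = new LinkedHashMap<>();
    static {
        COVERAGE.put("Helvetica", new HashSet<>(Arrays.asList(0x41, 0x42, 0xA3)));
        COVERAGE.put("Arial Unicode", new HashSet<>(Arrays.asList(0x41, 0x42, 0xA3, 0x4E0D)));
    }

    // Returns the font that would draw the code point, or null for "missing character".
    static String pickFont(String requested, int codePoint, boolean allowFallback) {
        Set<Integer> glyphs = COVERAGE.get(requested);
        if (glyphs != null && glyphs.contains(codePoint)) {
            return requested;               // step 2: the requested font has the glyph
        }
        if (!allowFallback) {
            return null;                    // the application gives up at step 2
        }
        for (Map.Entry<String, Set<Integer>> entry : COVERAGE.entrySet()) {
            if (entry.getValue().contains(codePoint)) {
                return entry.getKey();      // steps 3 and 4: some other font has it
            }
        }
        return null;                        // no installed font covers the character
    }
}
```

With allowFallback set to false the model behaves like the "give up at step two" applications; with true it behaves like Word or a current browser.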

SIK Transport files and Non-Unicode SAP system

Dear all,
I have a question about SIK Transport files.
As you know, when we install the BOE SIK, we need to transport some files into the SAP system.
There is a TXT file describing how to use the SIK transport files in an SAP system.
I found that this TXT file gives no details about non-Unicode SAP systems.
All of it is about Unicode.
If your SAP system is running on a BASIS system earlier than 6.20, you must use the files listed below:
(These files are ANSI.)
Open SQL Connectivity transport (K900084.r22 and R900084.r22)
Info Set Connectivity transport (K900085.r22 and R900085.r22)
Row-level Security Definition transport (K900086.r22 and R900086.r22)
Cluster Definition transport (K900093.r22 and R900093.r22)
Authentication Helpers transport (K900088.r22 and R900088.r22)
If your SAP system is running on a 6.20 BASIS system or later, you must use the files listed below:
(These files are Unicode enabled.)
Open SQL Connectivity transport (K900574.r21 and R900574.r21)
Info Set Connectivity transport (K900575.r21 and R900575.r21)
Row-level Security Definition transport (K900576.r21 and R900576.r21)
Cluster Definition transport (K900585.r21 and R900585.r21)
Authentication Helpers transport (K900578.r21 and R900578.r21)
The following files must be used on an SAP BW system:
(These files are Unicode enabled.)
Content Administration transport (K900579.r21 and R900579.r21)
Personalization transport (K900580.r21 and R900580.r21)
MDX Query Connectivity transport (K900581.r21 and R900581.r21)
ODS Connectivity transport (K900582.r21 and R900582.r21)
Our SAP BASIS system is later than 6.20, but it is not a Unicode system.
Can we use these transport files on a non-Unicode SAP system?
Thanks!
Wayne

Hi Wayne,
the text and the installation guide clearly advise based on the version of your underlying BASIS system and differentiate 620 or 640 and higher.
So, given that your system is a BI 7 system, you are in the category of a 640 (or higher) BASIS system and therefore you have to use the Unicode-enabled transports.
ingo

£ symbol displaying with a Unicode U+00C2 character in front of it.

Using the same java application code and the same j2sdk_1.4.1_02fcs java package I get a display difference between Redhat 7.3 and Redhat EL4 AS.
nf = NumberFormat.getCurrencyInstance(new Locale(lan, con));
String res = nf.format(money);
This results in a single strange character preceding the monetary symbol for the UK pound.
Instead of just displaying a £ character in front of monetary values, I am getting a Unicode U+00C2 character in front of it, as shown:
Good = £1.23
Bad = Â£1.23
I used the following simple test program to show this :
import java.text.NumberFormat;
import java.util.Locale;
public class Test555 {
    public static void main(String[] args) {
        // Format 1.23 as UK currency; the result starts with U+00A3 (pound sign)
        NumberFormat nf = NumberFormat.getCurrencyInstance(new Locale("en", "GB"));
        System.out.println(nf.format(1.23));
    }
}
I compiled this and ran the class on both machines...
on a Redhat 7.3 machine: £1.23
on a Redhat EL4 AS machine: Â£1.23
The /bin/unicode_start program only works on the console in a VT or xwindows with a TERM type of xterm, but allows the console to properly display the characters.

The upgrade to Red Hat 8.0 and beyond changed the default character encoding from ISO-8859-15 to UTF-8. UTF-8 encodes the Pound Sterling (U+00A3) as the two-byte sequence 0xC2 0xA3, so a 0xC2 appears in front of the 0xA3 (the Pound). It is this 0xC2 that we see represented as the capital A circumflex or the "T" symbol we noticed earlier.
Is there a way to remove the 0xC2 that the two-byte UTF-8 representation puts in front?
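The byte-level effect described above can be reproduced with a small sketch. The class and helper names are made up for illustration; the only library calls are String.getBytes and the String(byte[], Charset) constructor with the standard charset constants.

```java
import java.nio.charset.StandardCharsets;

class PoundSignDemo {
    // Formats bytes as space-separated upper-case hex, e.g. "C2 A3".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    // Encodes text as UTF-8, then decodes those bytes as if they were Latin-1,
    // which is what a terminal expecting a single-byte encoding effectively does.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String price = "\u00A31.23"; // pound sign followed by 1.23
        System.out.println(hex(price.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(hex(price.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The extra 0xC2 is therefore not something to strip from the string; it goes away once the JVM's output encoding and the terminal's encoding agree. Which side to change depends on the setup, so treat this as a diagnosis aid rather than a fix.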

How to map from cid to unicode

Hello.
I'm now trying to convert cid to unicode by using the toUnicode cmap.
The toUnicode cmap I extracted is as follows:
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo
<< /Registry (Adobe)
/Ordering (UCS) /Supplement 0 >> def
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
35 beginbfchar
<0F3B> <7528>
<0CA1> <8AAD>
<0F62> <5229>
<034B> <3042>
<034D> <3044>
<0358> <304F>
<027B> <3002>
<027C> <FF0C>
<035D> <3054>
<035E> <3055>
<0360> <3057>
<0369> <3060>
<0370> <3067>
<0372> <3069>
<0373> <306A>
<0294> <30FC>
<0374> <306B>
<0378> <306F>
<0388> <307F>
<0394> <308B>
<03A5> <30A9>
<02CE> <FF0A>
<03AF> <30B3>
<03B5> <30B9>
<03B9> <30BD>
<03BB> <30BF>
<03BF> <30C3>
<03C4> <30C8>
<03CD> <30D1>
<03D1> <30D5>
<03D2> <30D6>
<03DA> <30DE>
<03E8> <30EC>
<03EF> <30F3>
<08BC> <8A66>
endbfchar
endcmap CMapName currentdict /CMap defineresource pop end end
I think that the mapping process needs "beginbfrange" and "endbfrange".
But the above cmap does not include them.
There should be a way to map from CID to Unicode, because Preview (the Mac application) can search the same text.
Please let me know what I am misunderstanding about the ToUnicode cmap.

You might want to look up ToUnicode maps in the standard. ISO 32000-1:2008 says in section 9.10.3 "ToUnicode CMaps":
The CMap defined in the ToUnicode entry of the font dictionary shall follow the syntax for CMaps introduced in 9.7.5, "CMaps" and fully documented in Adobe Technical Note #5014, Adobe CMap and CIDFont Files Specification. Additional guidance regarding the CMap defined in this entry is provided in Adobe Technical Note #5411, ToUnicode Mapping File Tutorial. This CMap differs from an ordinary one in these ways:
The only pertinent entry in the CMap stream dictionary (see Table 120) is UseCMap, which may be used if the CMap is based on another ToUnicode CMap.
The CMap file shall contain begincodespacerange and endcodespacerange operators that are consistent with the encoding that the font uses. In particular, for a simple font, the codespace shall be one byte long.
It shall use the beginbfchar, endbfchar, beginbfrange, and endbfrange operators to define the mapping from character codes to Unicode character sequences expressed in UTF-16BE encoding.
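As far as I can tell, a ToUnicode CMap that contains only beginbfchar/endbfchar entries, like the one posted, already carries a usable mapping: each pair is a source code followed by its UTF-16BE value. Below is a minimal sketch of reading such a section in Java. The class and method names are hypothetical, it handles only the first bfchar section, it ignores bfrange entirely, and it is not how any particular PDF library does it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BfCharSketch {
    // Matches one "<source code> <UTF-16BE hex>" pair.
    private static final Pattern PAIR =
            Pattern.compile("<([0-9A-Fa-f]+)>\\s*<([0-9A-Fa-f]+)>");

    // Returns code -> Unicode text for the first beginbfchar ... endbfchar section.
    static Map<Integer, String> parseBfChar(String cmap) {
        Map<Integer, String> mapping = new LinkedHashMap<>();
        int start = cmap.indexOf("beginbfchar");
        int end = cmap.indexOf("endbfchar");
        if (start < 0 || end < start) {
            return mapping; // no bfchar section found
        }
        Matcher m = PAIR.matcher(cmap.substring(start, end));
        while (m.find()) {
            int code = Integer.parseInt(m.group(1), 16);
            mapping.put(code, decodeUtf16Be(m.group(2)));
        }
        return mapping;
    }

    // Turns hex such as "7528" into a String, one UTF-16 code unit per 4 hex digits.
    static String decodeUtf16Be(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 4 <= hex.length(); i += 4) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 4), 16));
        }
        return sb.toString();
    }
}
```

Looking up 0x0F3B in the result for the posted cmap would give U+7528, and 0x034B would give U+3042.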

Unicode in Labels, Buttons,...

Hi, I want to use unicode in my applet. I know that I can use unicode characters
by drawString() method of Graphics class but I want to use unicode characters
in Label, Button,... components.
I tried to use the command below to show the Unicode character \ufe78, but it doesn't work!
label1.setText(" "+'\ufe78');
Is there any way to do that?
thanks in advance!
naser

I think you just need to edit (or delete) the "Exclusion Range info" section of the "font.properties" file in "jre\lib".
By default, the Exclusion Range is:

exclusion.dialog.0=0100-0400,0460-ffff 
exclusion.dialoginput.0=0100-0400,0460-ffff 
exclusion.serif.0=0100-0400,0460-ffff 
exclusion.sansserif.0=0100-0400,0460-ffff 
exclusion.monospaced.0=0100-0400,0460-ffff 

So, the character with code "fe78" will never be displayed in an AWT or Swing component.
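To see why "fe78" is caught, here is a small sketch that interprets a range string like the ones above. The helper is hypothetical and assumes the bounds are hexadecimal and inclusive; it only illustrates the arithmetic and is not the JRE's own parser.

```java
class ExclusionRangeSketch {
    // Returns true if codePoint falls inside any "low-high" hex range in the list.
    // Assumption: both bounds are inclusive.
    static boolean isExcluded(String ranges, int codePoint) {
        for (String range : ranges.split(",")) {
            String[] bounds = range.trim().split("-");
            int low = Integer.parseInt(bounds[0], 16);
            int high = Integer.parseInt(bounds[1], 16);
            if (codePoint >= low && codePoint <= high) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String defaults = "0100-0400,0460-ffff";
        System.out.println(isExcluded(defaults, 0xfe78)); // inside 0460-ffff
        System.out.println(isExcluded(defaults, 0x0041)); // plain 'A', below 0100
    }
}
```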

Can I separate ABAP conversion from the Unicode Migration Process?

Hi all,
I don't have detailed technical information about ABAP programs, and I need to deliver a Unicode migration project that is on a tight schedule. Because of that, we're trying to find ways to reduce the total time.
As the SAP guide says, we must convert the ABAP programs before the migration. When we checked them, only the ABAP programs that our developers wrote need to be converted. So the question is: is it possible to separate these steps? I mean, go ahead with the Unicode migration without converting the ABAP programs and, meanwhile, change the ABAP programs in another SAP system (similar to the live one). That would save some time, and we would then import the converted ABAP programs into the migrated Unicode system. Is that possible?
Any help will be appreciated, thanks in advance.

I am currently using Postbox during the trial period because the frustration of trying to solve the ridiculous problem I posted about above was too much. Unlike most people, I won't rant and rave about OS X's decline after Lion and Mountain Lion, as I never had major issues after upgrading to either of them. It's just beyond me that a problem of this magnitude can break an app within something that is just supposed to "work".
Anyway, I was wondering if anyone who uses Postbox might be able to offer a suggestion or two about how to import mailboxes from a Time Machine backup as I would still like to have access to those folders which were "On My Mac" with all of the archived messages which were contained within each of them? I know that you can import from Apple Mail, but that's only if it is currently set up they way you want it to be. And as I no longer have those settings because I can't restore to the point before I deleted all the accounts, etc. from Mail in my frustration -- I'm sure I can't just point Postbox to some plist file to have it mimic my old Mail setup.
Anyone?

Unicode - again! 不 is showing up as 上, but only if read from a file!

Hi, LabVIEWers:
If you've been following some of my posts here (I have no idea why you would, but just in case), you know that a project I am working on is a translation of things from English to Unicode. It's been working fabulously so far, except for one thing.
It seems that 不 is consistently showing up as 上 in any text box, caption, etc., but only if I read it from a text file! It doesn't matter what position it's in, and I tried saving it in Notepad (Unicode option), Notepad++ (encode in UCS-2 little-endian) and Word (other encoding - Unicode), but it doesn't make a difference. And when I copy the wrong character and paste it, it really IS the wrong character, like it magically got transformed.
If I paste the right character into any kind of text box, it works fine. Furthermore, I can wire that text box to anything that accepts a string and it shows up correctly there, too.
Here's the weird thing:
As far as I can tell, that's the only character I am having trouble with!!!
Anyone have any ideas?
Thanks!
Oops, LV 2009 SP1.
Bill
(Mid-Level minion.)
My support system ensures that I don't look totally incompetent.
Proud to say that I've progressed beyond knowing just enough to be dangerous. I now know enough to know that I have no clue about anything at all.

Dec 10 is linefeed and dec 13 is carriage return. Are you sure you disabled Convert EOL on the read file function? I'm not sure how you read in the Unicode strings, but if you use Read Text File there is a right-click popup option to have LabVIEW do some automatic translation. Does Read File deal with Unicode, or are you doing something yourself there? Either it does not, and then of course does ASCII replacement, or if it does, the code for Convert EOL does not correctly treat Unicode characters.
Rolf Kalbermatter
CIT Engineering Netherlands
a division of Test & Measurement Solutions
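The suspicion above fits the two characters exactly: 不 is U+4E0D and 上 is U+4E0A, so in little-endian UTF-16 they differ only in a low byte of 0x0D (carriage return) versus 0x0A (linefeed). The Java sketch below models a byte-wise CR-to-LF replacement. It is a naive stand-in I wrote for illustration, not LabVIEW's actual Convert EOL code.

```java
import java.nio.charset.StandardCharsets;

class EolCorruptionDemo {
    // Naive model of an EOL conversion that works on raw bytes:
    // every 0x0D byte becomes 0x0A, regardless of character boundaries.
    static String crToLf(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_16LE);
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == 0x0D) {
                bytes[i] = 0x0A;
            }
        }
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        String original = "\u4E0D";            // 不, bytes 0D 4E in UTF-16LE
        String damaged = crToLf(original);     // bytes become 0A 4E
        System.out.printf("U+%04X -> U+%04X%n",
                (int) original.charAt(0), (int) damaged.charAt(0));
    }
}
```

Any other character whose UTF-16 code contains a 0x0D byte would presumably be hit the same way, which may be why only one character in the translation set showed the problem.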

One more Unicode question

So here's the story so far:
By enabling Unicode through the LabVIEW.ini (UseUnicode=true), I've been able to develop quite a nice translation package. Furthermore, when I build an executable and run it, it works also. But when I deploy it on another computer - either by running it from the release folder or actually installing it with an installer - it displays garbage. Not even pseudo-Chinese garbage - just squares that I think means unprintable characters.
Anyone know how to work around this issue? I'll be actively trying stuff and letting you know my progress.
Oh, dev machine is Win7 and target is WinXP.
Bill

Drew_H wrote:
Do you have the proper languages enabled on the XP machine? I *think* they're disabled by default, but I haven't touched XP in quite awhile.
Control Panel >> Regional Options and Language Settings >> Languages
You know, I was thinking the same thing when I got in this morning. Only problem is - I can't find anyone who has an install disc!!! Thanks for confirming my suspicions.
Bill

Unicode characters > 0xffff

Hi,
I note that the latest version of the unicode specification (3.1, see http://www.unicode.org/unicode/reports/tr27/ ) includes characters which are encoded beyond the original 16 bit codespace. Can anyone point me to a discussion of what Sun plans to do about supporting these additional planes?
Thanks,
Joe

Hi everyone - I remember discussing this with Brian a while ago, so here's a quote from one of his replies:
We support surrogates to the level they are defined in the version of Unicode we support. So Java 1.3 supports (only) Unicode version 2.1 which does not encode any characters using surrogate pairs. Likewise, the upcoming Java 1.4 will support Unicode 3.0 which also does not define any specific characters using surrogate pairs. What this means is that supporting surrogates so far has been fairly trivial. Our UTF8 converter can handle them and our other converters can detect when malformed surrogate pairs are present in the stream. In Java 1.4 our display code will be able to display them if you are using a font that can handle surrogates for, say, the private use area. But there really isn't much else interesting to do until Unicode 3.1 comes out. And Java will not support Unicode 3.1 until after version 1.4.
You can find the whole topic here (the discussion about this sort of thing starts about half-way down):
http://forums.java.sun.com/thread.jsp?forum=16&thread=25468
Why on earth they did away with html formatting on these forums, I will never know.
Alistair
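For readers on later Java versions: Java 5 and up (which postdate this discussion) added code point methods such as Character.toChars and String.codePointAt. The sketch below shows how a character beyond FFFF, here U+10300 from the Old Italic block, ends up as a surrogate pair in a String, and why a plain (char) cast cannot represent it. The class and helper names are my own.

```java
class SupplementaryDemo {
    // Builds a String holding a single code point, which may need two chars.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int oldItalicA = 0x10300;
        String s = fromCodePoint(oldItalicA);

        // Two UTF-16 code units (a surrogate pair), but one code point.
        System.out.println(s.length());
        System.out.println(s.codePointCount(0, s.length()));
        System.out.printf("%04X %04X%n", (int) s.charAt(0), (int) s.charAt(1));

        // A cast to char keeps only the low 16 bits, giving U+0300 instead.
        System.out.printf("%04X%n", (int) (char) oldItalicA);
    }
}
```

Whether the pair is then actually drawn still depends on the font and the display code, as the quoted reply points out.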

Japanese characters, outputstreamwriter, unicode to utf-8

Hello,
I have a problem with OutputStreamWriter's encoding of japanese characters into utf-8...if you have any ideas please let me know! This is what is going on:
static public String convert2UTF8(String iso2022Str) {
   String utf8Str = "";
   try {
      //convert string to byte array stream
      ByteArrayInputStream is = new     ByteArrayInputStream(iso2022Str.getBytes());
      ByteArrayOutputStream os = new ByteArrayOutputStream();
      //decode iso2022Str byte stream with iso-2022-jp
      InputStreamReader in = new InputStreamReader(is, "ISO2022JP");
      //reencode to utf-8
      OutputStreamWriter out = new OutputStreamWriter(os, "UTF-8");
      //get each character c from the input stream (will be in unicode) and write to output stream
      int c;
      while((c=in.read())!=-1) out.write(c);
      out.flush();
     //get the utf-8 encoded output byte stream as string
     utf8Str = os.toString();
      is.close();
      os.close();
      in.close();
      out.close();
   } catch (UnsupportedEncodingException e1) {
      return e1.toString();
   } catch (IOException e2) {
      return e2.toString();
   }
   return utf8Str;
}
I am passing a string received from a database query to this function and the string it returns is saved in an xml file. Opening the xml file in my browser, some Japanese characters are converted but some, particularly hiragana characters, come up as ???. For example:
屋台骨田家は時間目離れ拠り所那覇市矢田亜希子ナタハアサカラマ楢葉さマヤア
shows up as this:
屋�?�骨田家�?�時間目離れ拠り所那覇市矢田亜希�?ナタ�?アサカラマ楢葉�?�マヤア
(sorry that's absolute nonsense in Japanese but it was just an example)
To note:
- i am specifying the utf-8 encoding in my xml header
- my OS, browser, etc... everything is set to support japanese characters (to the best of my knowledge)
Also, I ran a test with a string, looking at its characters' hex values at several points and comparing them with iso-2022-jp, unicode, and utf-8 mapping tables. Basically:
- if I don't use this function at all...write the original iso-2022-jp string to an xml file...it IS iso-2022-jp
- I also looked at the hex values of "c" being read from the InputStreamReader here:
while((c=in.read())!=-1) out.write(c);
and have verified (using a character value mapping table) that in a problem string, all characters are still being properly converted from iso-2022-jp to unicode
- I checked another table (http://www.utf8-chartable.de/) for the unicode values received and all of them have valid mappings to a utf-8 value
So it appears that when characters are written to the OutputStreamWriter, not all characters can be mapped from Unicode to utf-8 even though their Unicode values are correct and there should be utf-8 equivalents. Instead they are converted to (hex value) EF BF BD 3F EF BF BD which from my understanding is utf-8 for "I don't know what to do with this one".
The characters that are not working are most hiragana (though not all) and a few kanji characters. I have yet to find a pattern/relationship between the characters that cannot be converted.
If I am missing some....or someone has a clue....oh...and I am developing in Eclipse but really don't have a clue about it beyond setting up a project, editing it and hitting build/run. It is possible that I may have missed some needed configuration??
Thank you!!

It's worse than that, Rene; the OP is trying to create a UTF-8 encoded string from a (supposedly) iso-2022 encoded string. The whole method would be just an expensive no-op if it weren't for this line: utf8Str = os.toString(); That converts the (apparently valid) UTF-8 encoded byte array to a string, using the system default encoding (which seems to be iso-2022-jp, BTW). Result: garbage.
@meggomyeggo, many people make this kind of mistake when they first start dealing with encodings and charset conversions. Until you gain a good understanding of these matters, a few rules of thumb will help steer you away from frustrating dead ends.
* Never do charset conversions within your application. Only do them when you're communicating with an external entity like a filesystem, a socket, etc. (i.e., when you create your InputStreamReaders and OutputStreamWriters).
* Forget that the String/byte[] conversion methods (new String(byte[]), getBytes(), etc.) exist. The same advice applies to the ByteArray[Input/Output]Stream classes.
* You don't need to know how Java strings are encoded. All you need to know is that they always use the same encoding, so phrases like "iso-2022-jp string" or "UTF-8 string" (or even "UTF-16 string") are meaningless and misleading. Streams and byte arrays have encodings, strings do not.
You will of course run into situations where one or more of these rules don't apply. Hopefully, by then you'll understand why they don't apply.
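As a rough illustration of the first rule, the sketch below does the encoding only at the point where text leaves the program, by wrapping the destination stream in an OutputStreamWriter. The class and method names are hypothetical; in the original poster's case the sink would be the stream for the XML file, and the String would be passed around untouched until then.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class Utf8BoundaryDemo {
    // Writes text to the sink as UTF-8. The String itself is never "converted";
    // the encoding is applied only as characters cross into the byte stream.
    static void writeUtf8(String text, OutputStream sink) {
        try {
            Writer out = new OutputStreamWriter(sink, StandardCharsets.UTF_8);
            out.write(text);
            out.flush(); // push buffered characters through the encoder
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as writeUtf8(text, new FileOutputStream("out.xml")) would be one way to use it (the file name is just an example), with the caller closing the stream afterwards. The key point is that the bytes are never turned back into a String with the platform default encoding, which is the step that produced the garbage in the original method.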
