Urgent: comparing multi-byte characters to a single byte character!

Let's say I have two strings that have the same contents but use different encodings. How do I compare them?
String a = "GOLD";
String b = "G O L D ";
The method a.equals(b) doesn't seem to work.

try this:
String a = "GOLD";
String b = "G O L D ";
boolean bEqual = true;
int iLength = a.length();
int j = 0;
for (int i = 0; i < iLength; i++) {
   // skip the padding spaces in b
   while (j < b.length() && b.charAt(j) == ' ')
      j++;
   // compare one character at a time
   if (j >= b.length() || a.charAt(i) != b.charAt(j)) {
      bEqual = false;
      break;
   }
   j++; // move past the matched character
}
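
If the only real difference is ASCII space padding, a one-liner does the same job (assuming JDK 1.5+ for String.replace(CharSequence, CharSequence)):

boolean bEqual = a.equals(b.replace(" ", ""));

This also copes with the trailing space in b, which the loop above simply never reaches.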

Similar Messages

  • How do I convert a double-byte encoded file to single-byte ASCII?

    Hello,
I am working with XML files (apparently coded in UTF-8) which are encoded with double-byte characters.
    The problem is the characters for end of line: 00 0D 00 0A
    This double byte end of line is causing a problem with a legacy conversion tool (which deals with 0D 0A). The file itself contains no
    accented/international characters, so in principle converting to single-byte should not cause any problems.
    I have tried to convert this file with tools like native2ascii and the conversion tools that are part of Notepad++ but without
    any luck - the "00 0D 00 0A" are still present in the output
Can anyone point me to a tool or some code that can convert this file into single-byte?
    Thank you.

Amiens wrote:
native2ascii.exe -encoding UTF-16 -reverse INPUT.xml OUTPUT.xml
gives 00 00 00 0D 00 00 00 0A
so clearly that is not the required output.

What you've got there is UTF-16 encoded text that's been converted to UTF-16 a second time. Get rid of the "-reverse" option and you should see the result you expect.
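
For reference, the same conversion takes only a few lines of Java. A hedged sketch, assuming the input really is UTF-16 (Java's "UTF-16" charset auto-detects the byte order from the BOM) and, as the poster states, contains no characters outside ASCII:

import java.io.*;

public class Utf16ToAscii {
    public static void main(String[] args) throws IOException {
        try (Reader in = new InputStreamReader(new FileInputStream("INPUT.xml"), "UTF-16");
             Writer out = new OutputStreamWriter(new FileOutputStream("OUTPUT.xml"), "US-ASCII")) {
            int c;
            while ((c = in.read()) != -1)
                out.write(c); // 00 0D 00 0A comes out as plain 0D 0A
        }
    }
}

Any character that does not fit in ASCII would be written as '?' by this sketch, so it is only safe under the stated assumption.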

  • Oracle Multi-Bytes vs Single-Byte

    Hi,
We have to add Japanese to our application. I had successfully added Japanese data to our single-byte database,
so why should we use a multi-byte DB?
What is the gain of a multi-byte DB vs a single-byte one?
Does interMedia work with Japanese in single bytes?
Is UTF8 the best way to have an international DB?
We will have to add a lot of other character sets in the future.
    Thanks

so why should we use a multi-byte DB?
What is the gain of a multi-byte DB vs a single-byte one?

What you are doing is storing invalid multibyte characters in a single-byte database, so each double-byte Japanese character is being treated as 2 separate single-byte characters. You are using an unsupported but common garbage-in, garbage-out approach; in that sense you are using Oracle as a garbage container. :)
Let's look at some of the issues that you are going to have:
All SQL functions are based on the properties of the single-byte database character set WE8ISO8859P1, so LENGTH(), SUBSTR(), INSTR(), UPPER(), NLS_UPPER etc. will yield incorrect results. For example, a column with one Japanese character and one ASCII character will return a length of 3 characters rather than 2. And if you want to locate a specific character in a mixed ASCII and Japanese string using SUBSTR(), it will be very difficult, because to Oracle the string consists entirely of single-byte characters; it will not skip 2 bytes for a Japanese character. Even if you don't have mixed strings, you will need to write one routine for handling ASCII-only strings and another for Japanese strings.
Invalid data conversion: if you need to talk to another DB, using a dblink say, all character conversion will be based on the mapping from the single-byte character set to the target database character set, so the receiver will lose all the source Japanese characters and will get 2 single-byte characters for each Japanese character instead.
Export and Import will have identical problems: character set conversions are performed during these operations, so all Japanese characters will be lost. This also means that you cannot load correctly encoded Japanese data into your current single-byte DB using IMPORT or SQL*Loader without data corruption.
Does interMedia work with Japanese in single bytes?
No.
Is UTF8 the best way to have an international DB?
Yes.
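
The character-vs-byte mismatch described above is easy to see from the client side. A minimal Java sketch (Shift_JIS chosen just as a representative double-byte Japanese encoding):

public class LengthDemo {
    public static void main(String[] args) throws Exception {
        // one Japanese character (U+3042) followed by one ASCII character
        String s = "\u3042A";
        System.out.println(s.length());                     // 2 characters
        System.out.println(s.getBytes("Shift_JIS").length); // 3 bytes -- what a single-byte DB "sees"
    }
}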

  • Handling Double Byte characters

    Hi All,
    We have a requirement to read Japanese characters from a fixed length file.
    The file contains both Single and Double byte characters in a single record.
    The file needs to be read by an ABAP program in terms of bytes and it has to be filled into different fields of a structure.
    If you have encountered a similar situation , please let me know how it was handled.
    Thanks,

    Hi,
It isn't possible to say which characters use only one byte and which use two. Are you sure that these one- and two-byte characters are mixed inside your file?
    Best regards
    Marcin Cholewczuk
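
If the record layout fixes each field's width in bytes, one way to cut a record into fields is to slice the raw bytes and decode each slice. A rough sketch, in Java rather than ABAP, with hypothetical offsets and Shift_JIS assumed as the file encoding:

import java.util.Arrays;

public class FixedLengthRecord {
    public static void main(String[] args) throws Exception {
        byte[] record = readOneRecord(); // placeholder for reading one raw record
        // hypothetical layout: field1 = bytes 0-9, field2 = bytes 10-29
        String field1 = new String(Arrays.copyOfRange(record, 0, 10), "Shift_JIS");
        String field2 = new String(Arrays.copyOfRange(record, 10, 30), "Shift_JIS");
        System.out.println(field1 + "|" + field2);
    }

    private static byte[] readOneRecord() {
        return new byte[30]; // stand-in for real file I/O
    }
}

The caveat matches Marcin's point: a byte-offset slice corrupts a double-byte character whose two bytes straddle a field boundary, so the layout must guarantee fields start on character boundaries.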

  • Faster way to migrate from Single byte to Multi byte

    Hello,
We are in the process of migrating from a 9i single-byte db to a 10g multi-byte db. The size of our DB is roughly 125 GB. We have fixed everything in the source database (9i) in terms of seamlessly migrating from a single-byte to a multi-byte db. The only issue is the migration window - currently we are doing an export/import since there is a character set migration involved, and it's taking about 20+ hrs to do the import in 10g. The management wants to cut this down to less than 10 hours, if that's possible. I know the duration of the import depends on many factors like the system/OS configuration, SAN, etc., but I wanted to know what, in theory, is considered the fastest method of migrating a database from single byte to multi byte.
Has anybody here gone through this before?
    Thanks,
    Shaji

If the percentage of user tables containing some convertible data (I am assuming you will not have any truncation or lossy data) is low, you can export only those tables, truncate them, and rescan the database. This should report no convertible data, except some CLOBs in the Data Dictionary. Such a database can be migrated to AL32UTF8 using csalter.plb. After the migration, you import only the previously exported subset of tables.
Note: for this process to work, no convertible VARCHAR2, CHAR, or LONG data can be present in the Data Dictionary.
The process can be refined by dropping and recreating indexes on the exported tables, as recreating an index is faster than updating it during import. You should also disable triggers so that they do not interfere with the migration (for example, they should not update any "last_updated" timestamp columns).
    If the number and size of affected tables is low compared to the overall size of the database, the time saved may be significant.
There may also be tables that require an even more sophisticated approach. Let's say you have a multi-gigabyte table that stores pictures or documents in a BLOB column. The table also has a single text column that keeps some non-ASCII descriptions of the stored entities. Exporting/truncating/importing such a table may still be very expensive. A possible optimization is to offload the description column to an auxiliary table (together with ROWIDs), update the original column to NULL, export the auxiliary table, drop it, rescan the database, migrate with csalter.plb, re-import the auxiliary table, and restore the original column. If pictures alone occupy, for example, 30% of the whole database, such an approach should yield significant time savings.
    -- Sergiusz

  • Multi-byte characters are garbled in SQL Server Business Intelligence Development Studio (Visual Studio) 2008

    Hi,
I'm revising an existing report which was developed by my predecessor. Though it works fine in the production environment, when I open the .rdl file with SQL Server Business Intelligence Development Studio (Visual Studio) 2008 on my client PC, I find all the multi-byte characters are garbled. When I open it with the BIDS (the same version) on the server, it shows everything correctly.
The font for the controls (labels) is Tahoma, which originally covers only alphabetic characters, but multi-byte characters are supposed to be displayed in MSGOTHIC via Font Link, as they are displayed correctly on the server.
Could anyone advise me how to solve this issue? I know I can fix it by changing the fonts from Tahoma to MSGOTHIC for all the controls, but I don't want to do that.
    Environment:
    My PC:Windows7 64bit /Visual Studio 9.0.30729.1 / .NET Framework 3.5 SP1
    Server:Windows Server 2003 R2 /Visual Studio 9.0.30729.1 / .NET Framework 3.5 SP1
    Garbled characters sample:
    FontLink - SystemLink
    Please let me know if you need any more information. I would appreciate your advice!

    Hi nino_miya,
According to your description, when you display the report on the client side, the characters are garbled.
In your scenario, please check whether the Language setting is the same as the report on the production server. Also please check whether the Tahoma font-link data in the registry on the client PC is the same as on the server. If those two settings are the same, please specify the font of each control as MSGOTHIC manually on the client PC.
    If you have any question, please feel free to ask.
    Best regards,
    Qiuyun Yu
    TechNet Community Support

  • JDBC2.0 API and Multi-Bytes Characters

I use the JDBC2.0 API with the thin Driver816 for jdk1.2.X;
it works well with English characters,
but I get wrong results with multi-byte characters.
Does anyone else know the reason?
Thanks in advance.

I have the same problem!!!
huang Jian-chang wrote:
I use the JDBC2.0 API with the thin Driver816 for jdk1.2.X; it works well with English characters, but I get wrong results with multi-byte characters.

  • Migrating Multi-Byte Characters

When migrating from Access 2000, all multibyte characters
are converted into single byte. The database is running UTF8?
Anyone done this before?
    Thanks

    #1 should return you the encoded string.
    #2 should decode the string and return the correct characters.
    If it doesn't, it's probably because the string was improperly
    encoded.
    #3 should cause #1 to do the same as #2, but you have to set
    the property before JavaMail classes are loaded.

  • Unable to generate a single-byte character when using TO_SINGLE_BYTE

    Hi All,
Can anyone help me get the output for the single-byte query below? When run, it fails with INVALID NUMBER.
    Step 1 :-
    select RAWTOHEX('2Z') from DUAL; -- 325A
    Step 2:-
    SELECT TO_SINGLE_BYTE(CHR('325A')) FROM DUAL;
    The above query when executed it says "ORA-01722: invalid number".
I tried using VARCHAR2 instead of CHR and it throws the exception below:
    "ORA-00936: missing expression".
But the same query works fine if no alphabetic characters are passed:
    SELECT TO_SINGLE_BYTE(CHR('3251')) FROM DUAL;
    Thanks,
    Ravi

TO_SINGLE_BYTE is used to convert multi-byte characters to single-byte characters, and '325A' is not a multi-byte character, so it can't be converted this way. The ORA-01722 actually comes from CHR(): it takes a number, so passing the string '325A' forces an implicit to-number conversion that fails on the 'A' (whereas '3251' happens to convert because it is all digits).
Use HEXTORAW to convert the hex value back to a raw value.

  • To find out whether a character is of Single byte or Double bytes

    hi,
Is there any built-in class to find out whether a character is single byte or double byte? Is there any method to do so? If possible, can someone provide a sample code snippet to check for single-byte and double-byte characters?
    thanx in advance.....

    If you are asking what size the char primitive is, it's 16 bits.
    If you want to know the numerical value of a char, you can cast it to an int and compare it to 255 to see if it fits in 1 byte.
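
To make that concrete: a character's byte width depends on the encoding you target, not on the char itself. A minimal sketch, assuming a concrete charset such as Shift_JIS is what "single vs. double byte" refers to:

import java.io.UnsupportedEncodingException;

public class ByteWidth {
    // number of bytes the character occupies in the named encoding
    static int byteLength(char c, String charsetName) throws UnsupportedEncodingException {
        return String.valueOf(c).getBytes(charsetName).length;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(byteLength('A', "Shift_JIS"));      // 1 -> single byte
        System.out.println(byteLength('\u3042', "Shift_JIS")); // 2 -> double byte
    }
}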

  • How best to send double byte characters as http params

    Hi all
    I have a web app that accepts text that can be in many languages.
I build up an HTTP string and send the text as parameters to another webserver. Hence, whatever text I receive I need to be able to represent on an HTTP query string.
    The parameters are sent as urlencoded UTF8. They are decoded by the second webserver back into unicode and saved to the db.
Occasionally I find a character that I am unable to convert to a UTF8 string and send as a parameter (usually an SJIS character). When this occurs, the character is encoded as '3F' - a question mark.
What is the best way to send double-byte characters as HTTP parameters so they are always sent faithfully and not as question marks? Is my only option to use UTF16?
    example code
<code>
public class UTF8Test {
    public static void main(String[] args) {
        encodeString("\u7740", "%E7%9D%80"); // encoded UTF8 string contains question mark (3F)
        encodeString("\u65E5", "%E6%97%A5"); // this other japanese character converts fine
    }

    private static void encodeString(String unicode, String expectedResult) {
        try {
            // NOTE: decoding with the platform default charset here is the
            // problem dissected in the reply below.
            String utf8 = new String(unicode.getBytes("UTF8"));
            String utf16 = new String(unicode.getBytes("UTF16"));
            String encoded = java.net.URLEncoder.encode(utf8);
            String encoded2 = java.net.URLEncoder.encode(utf16);
            System.out.println();
            System.out.println("encoded string is:" + encoded);
            System.out.println("expected encoding result was:" + expectedResult);
            System.out.println();
            System.out.println("encoded string16 is:" + encoded2);
            System.out.println();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
</code>
    Any help would be greatly appreciated. I have been struggling with this for quite some time and I can hear the deadline approaching all too quickly
    Thanks
    Matt

    Hi Matt,
    one last visit to the round trip issue:
in the Sun example, note that UTF8 encoding is used in the method that produces the byte array as well as in the method that creates the second string. This is equivalent to calling:
String roundTrip = new String(original.getBytes("UTF8"), "UTF8"); // Sun example
Whereas in your code you were calling:
String utf8 = new String(unicode.getBytes("UTF8")); // Matt's code
The difference is crucial. When you call the String constructor without a second (encoding) argument, the default encoding (usually Cp1252) is used. Therefore your code is equivalent to:
String utf8 = new String(unicode.getBytes("UTF8"), "Cp1252"); // Matt's code
i.e. you are encoding with one transformation format and decoding back with a different one, so in general you won't get your original string back.
    Regarding safely sending multi-byte characters across the Internet, I'm not completely sure what the situation is because I don't do it myself. (When our program is run as an applet, the only interaction it has with the web server is to download various files). I've seen lots of people on this forum describing problems sending multi-byte characters and I can't tell whether the problem is with the software or with the programming. Two possible methods come to mind (of course you need to find out what your third party software is doing):
    1) use the DataOutput/InputStreams writeUTF/readUTF methods
    2) use the InputStreamReader/OutputStreamWriter pair with UTF8 encoding
    See this thread:
    http://forum.java.sun.com/thread.jsp?forum=16&thread=168630
    You should stick to UTF8. It is designed so that the bytes generated by encoding non-ASCII characters can be safely transmitted across the Internet. Bytes generated by UTF16 can be just about anything.
    Here's what I suggest:
    I am running a version of the Sun tutorial that has a program running on a server to which I can send a string and the program sends back the string reversed.
    http://java.sun.com/docs/books/tutorial/networking/urls/readingWriting.html
    I haven't tried sending multi-byte characters but I will do so and test whether there are any transmission problems. (Assuming that the Sun cgi program itself correctly handles characters).
    More later,
    regards,
    Joe
    P.S.
I thought one of the reasons for the existence of UTF8 was to represent things like multi-byte characters in an ASCII format?
Not exactly. UTF8 encodes ASCII characters into single bytes with the same byte values as ASCII encoding. This means that a document consisting entirely of ASCII characters is the same whether it was encoded as UTF8 or ASCII, and can consequently be read in any ASCII document reader (e.g. Notepad).
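
Following the advice above, the round-trip problem disappears if the encoder is handed the Unicode string plus an explicit charset. A minimal sketch using the two-argument URLEncoder.encode available from JDK 1.4 on:

import java.net.URLEncoder;

public class Utf8ParamDemo {
    public static void main(String[] args) throws Exception {
        String s = "\u7740"; // the character that previously came out as '?'
        // encode the Unicode string directly as UTF-8; no intermediate new String(...)
        System.out.println(URLEncoder.encode(s, "UTF-8")); // prints %E7%9D%80
    }
}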

  • PDF acceleration F5 BigIP WA and double byte characters

    We have been trying to use the F5 appliance from BigIP to accelerate the delivery of PDF files from SharePoint over the WAN.  However, we encountered problems with the double-byte files many months ago and have been trying to resolve the problem with F5.  We have turned off PDF acceleration on the F5 because of the problems.  The problem occurs when PDF files have Kanji characters in the file name.  If the file names are English (single byte) the problem does not occur, even if the content of the PDF contains Kanji characters.
    After many months of working with F5, they are now saying that the problem is with the Adobe plug-in to Internet Explorer.  Specifically they say:
The issue is a result of Adobe's (not F5's) handling of the linearization request of PDFs with the Japanese character set over 300 KB when the Web Accelerator is enabled on the BigIP (F5) appliance. We assume the issue exists for all double-byte languages, not only Japanese. If a non-double-byte character set is used, this works fine. "Linearization" is a feature which allows the Adobe web plug-in to start displaying the PDF file while it is still being downloaded in the background.
    The F5 case number is available to anybody from Adobe if interested.
    The F5 product management  and the F5 Adobe relationship manager have been made aware of this and will bring this issue up to Adobe.  But this is as far as F5 is willing to pursue as a resolution.  F5 consider this an Adobe issue, not a F5 issue.
    Anybody know if this is truly a bug with the PDF browser plug-in?  Anybody else experienced this?

    Your searches should have also come up with the fact that CR XI R2 is not supported in .NET 2008. Only CR 2008 (12.x) and Crystal Reports Basic for Visual Studio 2008 (10.5) are supported in .NET 2008. I realize this is not good news given the release time line, but support or non support of cr xi r2 in .net 2008 is well documented - from [Supported Platforms|https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/7081b21c-911e-2b10-678e-fe062159b453
    ] to [KBases|http://www.sdn.sap.com/irj/servlet/prt/portal/prtroot/com.sap.km.cm.docs/oss_notes_boj/sdn_oss_boj_dev/sap(bD1lbiZjPTAwMQ==)/bc/bsp/spn/scn_bosap/notes.do], to [Wiki|https://wiki.sdn.sap.com/wiki/display/BOBJ/WhichCrystalReportsassemblyversionsaresupportedinwhichversionsofVisualStudio+.NET].
    Best I can suggest is to try SP6:
    https://smpdl.sap-ag.de/~sapidp/012002523100015859952009E/crxir2win_sp6.exe
    MSM:
    https://smpdl.sap-ag.de/~sapidp/012002523100000634042010E/crxir2sp6_net_mm.zip
    MSI:
    https://smpdl.sap-ag.de/~sapidp/012002523100000633302010E/crxir2sp6_net_si.zip
    Failing that, you will have to move to a supported environment...
    Ludek
    Follow us on Twitter http://twitter.com/SAPCRNetSup
    Edited by: Ludek Uher on Jul 20, 2010 7:54 AM

  • How do I know that there are double-byte characters in a String?

    Hi!
If I have a String that contains English and Chinese words,
how do I know that the String contains double-byte characters (Chinese words)?
    Following is my method and the problem I suffered...
String A = "test(double-byte chinese)test";
byte[] B = A.getBytes();
if (A.length() != B.length)
    System.out.print("String contains double-byte words");
else
    System.out.print("String does not contain double-byte words");
If the String contains Chinese words, then A.length() will be smaller than B.length...
I run the program on a Windows NT workstation (Traditional Chinese version) and it works...
Then I run the same program on Redhat 6.0 (English version),
but the result was not the same as on NT...
because A.length() always equals B.length...
I guess that's because of the charset of the OS...
But I don't know how to set the charset on Linux...
Does anybody have another solution to my problem?
Any suggestion will be very much appreciated!

A String is always in Unicode. You cannot see what kind of character is in the string unless you compare it with the Unicode ranges of characters. E.g. 3400-4DB5 is CJK Unified Ideographs Extension A. Then you at least know that it is not Latin-1 or something else.
    Klint
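
The platform-dependent behavior comes from the no-argument getBytes(), which uses the default charset (likely MS950/Big5 on the Traditional Chinese NT box, ISO-8859-1 on the English Red Hat box). Naming the charset explicitly makes the check deterministic. A minimal sketch, with UTF-8 assumed as the reference encoding:

public class MultiByteCheck {
    static boolean containsMultiByte(String s) throws java.io.UnsupportedEncodingException {
        // any character needing more than one UTF-8 byte makes the counts differ
        return s.getBytes("UTF-8").length != s.length();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(containsMultiByte("test"));       // false
        System.out.println(containsMultiByte("test\u4e2d")); // true
    }
}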

  • XSL Transform, double-byte characters and padding

    I have a stylesheet with the following variable that is being formatted to pad a parameter named textQualifierDescription to a length of 30 by calling the template called format-string.
<xsl:variable name="textQualifierDescription2">
     <xsl:call-template name="format-string">
          <xsl:with-param name="myString" select="$textQualifierDescription"/>
          <xsl:with-param name="numbatchspaces">30</xsl:with-param>
     </xsl:call-template>
</xsl:variable>
<xsl:template name="format-string">
     <xsl:param name="myString" select="' '"/>
     <xsl:param name="numbatchspaces" select="20"/>
     <xsl:param name="direction" select="'right'"/>
     <xsl:variable name="spacesstr" select="string('                                              ')"/>
     <xsl:variable name="padsize" select="$numbatchspaces - string-length($myString)"/>
     <xsl:variable name="spacepad" select="substring($spacesstr, 1, $padsize)"/>
     <xsl:choose>
          <xsl:when test="$direction = 'left'">
               <xsl:value-of select="concat($spacepad,$myString)"/>
          </xsl:when>
          <xsl:otherwise>
               <xsl:value-of select="concat($myString,$spacepad)"/>
          </xsl:otherwise>
     </xsl:choose>
</xsl:template>

I execute the XSL transform using the following statement in a stored procedure:
transformedData := xmldata.transform(xsldata);

The XSL transform works as expected until it encounters data that contains double-byte characters. My output is supposed to contain the following three fields as a single record:
    textQualifierDescription - padded to a length 30
    lineNumber
    id
    If my textQualifierDescription contains a value of "Texto de posición"
    Line 1 - Texto de posición             00000001POS2005
    Line 2 - Texto de posición            00000001POS2005
    Line 1 is the expected result.
Line 2 is the actual result. When the "format-string" template is called, even though "Texto de posición" is 17 characters in length, it looks as if Oracle counts the double-byte character as 2 and calculates the string-length as 18, coming up with a padsize of 12. It then creates a spacepad of 12 spaces which is concatenated to the 17 characters for a total length of 29. I have tested the stylesheet in XMLSpy and it produces the expected result.
    Has anyone ever run into this sort of situation and is able to provide me with some sort of solution to this dilemma? This is running on 10g Release 10.2.0.4.0.
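
The arithmetic in question, sketched in Java purely for illustration: "Texto de posición" is 17 characters but 18 bytes once the ó is encoded as two bytes, so a byte-based string-length yields a pad of 12 instead of 13:

public class PadDemo {
    public static void main(String[] args) throws Exception {
        String s = "Texto de posici\u00f3n";
        System.out.println(30 - s.length());                 // 13 -- expected pad
        System.out.println(30 - s.getBytes("UTF-8").length); // 12 -- observed pad
    }
}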


  • Double Byte Characters for Japanese Kanji Language

    Hello All,
How can you identify double-byte characters, and how can we check whether they are double-byte or not?
Can you please tell me the double-byte characters from the Japanese Kanji language?
This is very urgent. I will reward you with points.
    Thanks,
    Karan

*& Report YTEST_LOGIC
REPORT teched_unicode_solution_1.
*** Exercise 1: Distinction between byte and character length
*** after Unicode enabling
PARAMETERS: param TYPE c.

PERFORM test1 USING param.

* FORM test1
FORM test1 USING text TYPE c.
  DATA: len1 TYPE i,
        len2 TYPE i,
        off  TYPE i.
  DESCRIBE FIELD text LENGTH len1 IN BYTE MODE.
  DESCRIBE FIELD text LENGTH len2 IN CHARACTER MODE.
  WRITE: / len1, len2.
ENDFORM.                    "test1
Use the above code to find which characters are double-byte and which are not. The double-byte characters will have double the length when measured in byte mode.
Reward points for helpful answers.
    Edited by: Rahul Kavuri on Mar 26, 2008 6:35 PM

Maybe you are looking for

  • How to reinstall windows 8

So I recently scanned my laptop with Windows Defender and found a virus. I removed it but I would like to perform a clean scan to make sure that my computer doesn't have any malicious software.

  • NullPointerException at oracle.jdbc.driver.T4C8Oall.getNumRows

    Hi, Any help is appreciated here. tried these drivers but didn't work: 10.2.0.5.0,10.2.0.1.0,10.1.0.5.0, am using 1.5.0_22 & oracle 10g xe on win7-64 Keep on getting this when trying to fill the jasper manager: 19:36:38,907 ERROR [STDERR] java.lang.N

  • Show item categories in VF01

    Hi Gurus, Here is the scenario: the client processes debit memo manually and according to some logic item categories with no pricing and billing are determined in the memo with net value zero. as because the item categories are have no billing type,

  • BGP Selection .. Why Path #2 is better ! ?

    I have a question please. For below output why 2nd output is considered best According to my topology this is the right behavior however I am just curious why it is considered the best Many Thanks |||||||||||||| R9#show ip bgp vpnv4 vrf ABC 172.9.0.5

  • CS3 InDesign won't install, Mac  OS 10.5

    I've just received CS3 Design premium, and InDesign won't install. There is no reason given, just that it won't install. Shared components and Version Cue Server won't install either. I just upgraded to Leopard on a g5 Intel system. Any ideas on what