Unrecognised Char in GB2312 character set using java InputStreamReader??

I am reading the following Chinese GB2312 HTML file from
http://news.xinhuanet.com/local/2007-02/13/content_5732705.htm
using an InputStreamReader with GB2312 encoding, as shown below:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

public class ReadGB2312Html {
    //........TmpText declarations.....
    public static void main( String[] args ) {
        try {
            FileInputStream is = new FileInputStream( args[0] );
            // Decode the file's bytes as GB2312 while reading line by line
            BufferedReader br = new BufferedReader(
                new InputStreamReader( is, "GB2312" ) );
            String strLine;
            while ( (strLine = br.readLine()) != null ) {
                TmpText.append(strLine);
                TmpText.append("\r\n");
            }
            br.close();
            bw.close();
        } catch ( Exception e ) {
            e.printStackTrace();
        }
    }
}
The TmpText variable does not display the last character in the article properly: （记者夏珺） comes out as （记者夏?B） instead.
Inside the HTML file the unrecognised character is represented by �B. Why is this so?
An internet browser displays it and recognises it as a Chinese GB2312 character, so why is it not recognised by the Java InputStreamReader?
Any help or explanation would be much appreciated.

Yes, it is not a GB2312 character.
The 珺 character is AC42 in hex (the second byte is the "B" you see in the output), which is outside the GB2312 character range; it is in GBK.
Copied from Wikipedia:
GBK is an extension of the GB2312 character set for simplified Chinese characters, used in the People's Republic of China.
GB stands for National Standard, while K stands for Extension. GBK not only extended the old standard GB2312 with Traditional Chinese characters, but also with Chinese characters that were simplified after the establishment of GB2312 in 1981. With the arrival of GBK, certain names with characters formerly unrepresentable, like the "rong" (镕) character in former Chinese Premier Zhu Rongji's name, are now representable.
Thanks a lot. I will use the GBK charset to read all GB2312 files, since GB2312 is a subset of GBK.
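The difference between the two charsets can be checked without the original file. The following is only a minimal sketch: it assumes the JDK in use ships both the GB2312 and GBK charsets (a stripped-down runtime may not), and the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;

class GbkVersusGb2312 {
    // Encode the text as GBK bytes, then decode those bytes with the named charset.
    static String decodeGbkBytesAs(String text, String charsetName) {
        byte[] gbkBytes = text.getBytes(Charset.forName("GBK"));
        return new String(gbkBytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String jun = "\u73FA"; // the character from the article that failed to decode
        // GBK contains the character, so the round trip returns it unchanged.
        System.out.println(decodeGbkBytesAs(jun, "GBK").equals(jun));
        // GB2312 has no mapping for these bytes, so the decoded text differs.
        System.out.println(decodeGbkBytesAs(jun, "GB2312").equals(jun));
    }
}
```

With both charsets available this should print true and then false, which matches the behaviour described in the thread.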

Similar Messages

Setting Multiple values in property set using java API

Hello All,
I want to set the properties of a profile in a property set using the Java API provided
in package p13n. The property can have multiple values. I try to add a value to the
property using the ProfileManager.setProperty() method, but every time I do it this
way, it replaces the earlier value of the property instead of adding to it. I can achieve
this using portalTools, but I want to use the API for user registration on the site.
I hope the query is clear.
Waiting for a response,
Thanks in advance,
Shrinivas

You need to use java.util.ArrayList.
First get the existing value with the getProperty method and cast it to an ArrayList,
change the values in the ArrayList, and then put them back with the setProperty
method.
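The read-modify-write pattern described above could look roughly like the sketch below. The real p13n method signatures are not shown in this thread, so the getProperty and setProperty methods here are hypothetical stand-ins backed by a plain in-memory map, not the WebLogic API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MultiValuePropertySketch {
    // Hypothetical stand-in for a profile store; NOT the real p13n API.
    static final Map<String, Object> profile = new HashMap<>();

    static Object getProperty(String name) {
        return profile.get(name);
    }

    static void setProperty(String name, Object value) {
        profile.put(name, value);
    }

    // Append a value to a multi-valued property instead of overwriting it.
    @SuppressWarnings("unchecked")
    static void addValue(String name, String value) {
        Object existing = getProperty(name);
        List<String> values = new ArrayList<>();
        if (existing != null) {
            values.addAll((List<String>) existing); // keep the earlier values
        }
        values.add(value);
        setProperty(name, values); // write the whole list back
    }

    public static void main(String[] args) {
        addValue("interests", "java");
        addValue("interests", "portals");
        System.out.println(getProperty("interests"));
    }
}
```

The point of the sketch is only that the whole list is fetched, extended, and stored again, since a plain set call replaces whatever was there.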
Regards,
Michael Goldverg

Change character set used to write a file in application server.

Hello Experts,
I want to know if we can change the character set used to create a file on the application server. (Is it possible to use a particular character set while creating a file on the application server?)
I will be very grateful for any help.
Thanks in advance.
Sharath

Hello Sharath,
There is a CODE PAGE addition to the OPEN DATASET statement.
Can you please elaborate which character set you want to write to the application server?
BR,
Suhas

How can I see KSC5601 character set using JDBC thin driver

After I changed the character set from US7ASCII to KO16KSC5601, I
cannot see Korean from clients
using the JDBC thin driver.
But I can see Korean clearly using sqlplus at the server, or
from applications using SQLNet.
I use Oracle Enterprise Server 8.0.4.1.0 and JDBC thin driver
8.0.4.0.6 on Windows 98. I read that all bugs related
to multibyte languages are fixed in Oracle8. What can I do to
solve this problem?
PS. Server: Oracle 8.0.4.1 on Digital Unix 4.0b; client: jdk1.1.8
on Windows 98. I used the command.

The easiest thing to do is download it as an archive with your applet.
Otherwise, you have to have the files on every client machine.
For netscape, put the classes111.jar in the java classes folder typically:
c:\ProgramFiles\Netscape\Communicator\Program\java\classes.
I'd expect that IE would be setup in a similar way.

Default Character Set using JSObject

Hi All,
This problem has been nagging me for a while and I am now resorting to this forum for an answer.
I have a JSP page with an embedded applet. Inside the applet, I read the HTML page using JSObject.
The problem occurs when using JSObject to get the values of controls that contain Japanese characters from the HTML page.
The HTML page is encoded in UTF-8; however, when I get values from the controls using JSObject in the applet, the values are returned as ???. Latin characters are supported but not Japanese characters. So I'm wondering what character set JSObject uses when converting a JavaScript string to a Java String.
The following code is executed:
JSObject win = JSObject.getWindow(this);
JSObject doc = (JSObject) win.getMember("document");
JSObject forms = (JSObject) doc.getMember("forms");
JSObject form = (JSObject) forms.getSlot(0);
JSObject title = (JSObject) form.getMember("title");
String titleValue = (String) title.getMember("value");
I've also tried form.eval("document.forms[0].title.value") and that returns the same ??? for Japanese characters.
Any ideas?
Kent

Hi Larry,
The characters that appear at the beginning of each file - ï»¿ - are the BOM, or byte order mark, for UTF-8, which is automatically added to the file on creation. These files are UTF-8 encoded to allow for the support of multi-byte characters. An updated version of the Exporter Tool removes these BOM characters. Please contact Support to obtain this updated version of the Exporter Tool.
Alternatively, you can try the following:
If the character set of your Oracle database is not UTF-8, then you have two options:
1) If possible, change the character set of your database to UTF-8. To check the current database characterset, check the "NLS_DATABASE_PARAMETERS" table.
or
2) Open the generated .dat files using Notepad, then use the File | Save As menu option, and set the "Encoding" to ANSI, then save the file. The BOM will now be removed from the .dat files.
I hope this helps.
Regards,
Hilary
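If neither option is convenient, the BOM can also be removed in code before the data is processed. This is a minimal sketch, not part of the Exporter Tool; the class and method names are made up, and it only handles the three-byte UTF-8 BOM (EF BB BF).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8BomStripper {
    // Return the data without a leading UTF-8 byte order mark (EF BB BF), if present.
    static byte[] stripBom(byte[] data) {
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            return Arrays.copyOfRange(data, 3, data.length);
        }
        return data; // no BOM, leave the content untouched
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'b', 'c'};
        System.out.println(new String(stripBom(withBom), StandardCharsets.UTF_8));
    }
}
```

For a real .dat file the bytes would come from something like Files.readAllBytes and be written back after stripping.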

Need suggestion on Multi currency and Unicode character set use in ABAP

Hi All,
I need a suggestion. In one of the requirements I saw 'multi-currency and Unicode character set experience in FICO'.
Can you please explain how ABAPers are involved in multi-currency, as I think this is a FICO functional area?
Also, what is Unicode character set experience? Please share some documentation if you have any.
Thanks
Sreedevi
Moderator message - This isn't the place to prepare for interviews - thread locked
Edited by: Rob Burbank on Sep 17, 2009 4:45 PM

Use the default parser.
By default, WebLogic Server is configured to use the default parser and transformer to parse and transform XML documents. The default parser and transformer are those included in the JDK 5.0.
The built-in WebLogic Server DOM factory implementation class is com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl.
The DocumentBuilderFactory.newInstance method returns the built-in parser.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
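The same factory lookup works in a plain JDK program, which makes it easy to try outside the server. The sketch below is an illustration only: the class and method names are made up, and which parser implementation newInstance returns depends on the runtime it is executed in.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DefaultParserSketch {
    // Parse an XML string with the platform's default DOM parser
    // and return the name of the root element.
    static String rootElementName(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = dbf.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootElementName("<greeting lang=\"en\">hello</greeting>"));
    }
}
```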

CHANGE SECURITY SETTING USING JAVA

Hi all,
Please Help!!!
I am using Java / Struts to develop a web application.
In this client/server setup, I need to set the client browser's security level.
Can it be done through JavaScript, or how else can it be achieved?
Thanks & Regards
Glass_Fish.........

> I am using Java / Struts to develop a web application.
> In this client/server setup, I need to set the client browser's security level.
Not possible.
> Can it be done through JavaScript, or how else can it be achieved?
By asking the end user to change their security level if your application really needs it. For example, there are ways to detect whether scripting is enabled and whether cookies are enabled. If your application really needs those features, ask the user to enable the particular feature by displaying a suitable message.

Unicode Character Sets in Java

I am trying to port code from a PowerBuilder 10.5.1 Build 6021 environment to Java, and I am having trouble getting the same value for the Euro character in Java and PowerBuilder.
I get a value of 20AC (8364 in decimal) in Java which is consistent with ISO-8859-15, but I get a value of 128 in PowerBuilder.
This is only a small example and perhaps not strictly a Java question, but if anyone has any suggestions, I would appreciate it.

In the ISO-8859-x encodings, each character is represented by one byte, but the numbers 128-159 are not used. The Windows extensions of those encodings, like windows-1252, map those unused numbers to useful characters like curly quotes and the Euro symbol (and create compatibility problems like this one in the process). PowerBuilder is obviously using one of these Windows encodings. However, there are several points in the development process where character encodings come into play, so we'll need more info. Does the problem occur when you compile the code, or when you run it? If it's at run time, does it happen when you're reading text from a file, a database, or some other source? And how exactly does the problem manifest?
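The numbers involved can be reproduced in Java. The sketch below is an illustration with made-up class and method names; it assumes the runtime provides the windows-1252 and ISO-8859-15 charsets, which a standard JDK does.

```java
import java.nio.charset.Charset;

class EuroEncodings {
    // Return the first encoded byte of the text in the named charset, as an unsigned value.
    static int firstByteValue(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        return bytes[0] & 0xFF;
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        // The value of the Java char is the Unicode code point, not an encoded byte.
        System.out.println((int) euro.charAt(0));
        // Byte used by the Windows code page.
        System.out.println(firstByteValue(euro, "windows-1252"));
        // Byte used by ISO-8859-15.
        System.out.println(firstByteValue(euro, "ISO-8859-15"));
    }
}
```

This prints 8364, 128, and 164: the 20AC seen in Java is the Unicode code point, while 128 is the windows-1252 byte, so the two programs are reporting different things rather than disagreeing about the character.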

Character recognition using java

Hi guys,
I have a fairly complex problem that I need to solve. Basically I am reading a set of pixel colours from a 3rd party client. I need to take this pixel data and recognize the characters used in it.
Does anybody know of any good character recognition tutorials I can use?
Has anybody ever done anything like this?
Any help will be great
Thanks
Alex

Over the last month or so I have checked out those links and the downloadable software they offer; however, you can only use their trial versions, which aren't suitable for me.
Does anybody know where I should start if I want to write my own OCR functionality? Are neural nets the best way to go?
Any advice/suggestions will be great.
Thanks
alex

Trouble Using Character Sets - Chinese GB2312

Hi,
I am trying to display my site in Simplified Chinese
(GB2312). I have verified that all of the files are encoded with
the GB2312 character set; when I open them in a language-capable
editor I can see Chinese characters.
I have also used <cfheader name="Content-Type"
value="text/html; charset=gb2312" /> at the top of my
Application.cfm file.
Yet, when I view the file in any browser, it shows only
question marks or odd characters in place of the Chinese
characters.
When I look at the browser character encoding settings, they
are correct (a check mark next to Simplified Chinese).
Any thoughts? Am I missing a step?
Thanks in advance for any help given.

lan99 wrote:
> I have also used <cfheader name="Content-Type" value="text/html;
> charset=gb2312" /> at the top of my Application.cfm file.
What version of CF?
If CF6 or better, have you used cfprocessingdirective at the
top of each file?

German character set issues on Solaris

Hi,
I am facing an issue with German characters in my Java application on a Solaris box.
When I run my application on the box and pass it an input file with German special characters, they get converted to ?. However, normal English characters come out properly.
When I run the same application on another Solaris box with a different JRE, the German characters come out properly.
I understand that there are differences between the 2 boxes, i.e.
the architecture: 64-bit SPARC machine vs. 32-bit x86 machine
the JRE: 1.4.2_03 (64-bit) vs. 1.4.1_01
I am trying to evaluate further differences between the 2 environments to pinpoint the issue and get this resolved on the 1st box.
Can anyone provide any input?
Lavin

When you read the file, specify which character set to use. For example:
FileInputStream fstream = new FileInputStream(url.getFile());
// Pass the charset explicitly instead of relying on the platform default
BufferedReader br = new BufferedReader(new InputStreamReader(fstream, Charset.forName("ISO-8859-1")));
br.readLine();
This link may help:
http://www.velocityreviews.com/forums/t126128-jdk-14-character-set-change.html

Oracle character sets for China

Which character sets are used to store data in the Chinese language?

Yes, if you have a database with a UTF8 character set you can store Chinese characters. If you have a database with a single-byte character set and you prefer not to change the character set of the database (because it means recreating the database), you can also store the Chinese characters in NCHAR columns and use the national character set for them.
This way you can keep your standard single-byte character set and only use the national character set for specific columns.
For this, set NLS_NCHAR_CHARACTERSET to a multibyte character set and use NCHAR as the column type.

BIG5 and HKSCS Character Set Support

Hi,
We're experiencing some problems inserting a string containing both BIG5 and HKSCS characters to a 7.3.4 Oracle DB using JDBC. The underlying character set used by the DB is ZHT16BIG5 (this cannot be changed). The characters can be inserted correctly if we use SQLPlus/WorkSheet.
Take note that the BIG5 character set can be inserted correctly. The problem occurs if we include HKSCS characters in the statement.
We have tried a number of ways already but failed to convert the data properly.
We tried converting the data using ByteToCharConverter.getConverter("Big5") but this cannot handle the HKSCS properly.
We even tried using the CharacterSet.ZHT16BIG5_CHARSET provided by the NLS character set but it cannot convert all HKSCS characters correctly.
Any ideas on how to solve this problem? Or is it because the HKSCS character set is NOT supported by the JDBC driver?
Below is a sample text containg both BIG5 and HKSCS characters:
'i$h%49D$G$Q$T89 Ize _ ^ S( R @ A Y q
Any help/suggestion is most welcome.
Thanks,
Cis

I have exactly the same problem as you.
(The Oracle version I am using is 8.1.7.)
Can anyone help?

Change default character set of JVM

Is there a way to change the default character set of JVM to say, UTF-8?
System.out.println("Default Character Set: " + new java.io.OutputStreamWriter(new java.io.ByteArrayOutputStream()).getEncoding());
System.out.println("File Encoding: " + System.getProperty("file.encoding"));
On Windows
==========
Default Character Set: Cp1252
File Encoding: Cp1252
On Linux
========
Default Character Set: ASCII
File Encoding: ANSI_X3.4-1968
I would like to save the effort of changing the many lines of code that look like
new BufferedWriter(new OutputStreamWriter(out));
to
new BufferedWriter(new OutputStreamWriter(out, "UTF-8"));
Thanks

Try this:
-Dfile.encoding=utf-8
as vm argument.
/Kaj
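Whether the VM argument is honoured can depend on the JDK version, so it may be worth checking the effective default and, where practical, passing the charset explicitly in one shared helper. The following is a minimal sketch with made-up class and method names, not a drop-in replacement for the existing writers.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetWriter {
    // Encode text with an explicit charset so the result does not depend on file.encoding.
    static byte[] encodeUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Shows what the running VM actually uses as its default.
        System.out.println("Default charset: " + Charset.defaultCharset());
        // U+00E9 (e with acute accent) takes two bytes in UTF-8.
        System.out.println("UTF-8 byte count: " + encodeUtf8("\u00E9").length);
    }
}
```

Running it with and without -Dfile.encoding=utf-8 shows whether the default changes on a given VM, while the byte count stays the same either way.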

Character set in MDL export/import

Hi,
we are running OWB 10.1.0.2. In order to get version control, we perform MDL exports of collections from our development environment and then import them into our test and production environments. Each environment uses its own design repository, but all repositories are in the same database.
When doing an export of a collection, we always specify the character set AL32UTF8 because that is what we are running in the repository database. When later doing an import using the graphical user interface, it is not possible to specify the character set (but this can be done when using the import utility). According to the documentation, the GUI import will then assume that the character set in the collection is the character set of the client, which usually is WE8MSWIN1252. (The documentation also says that it IS possible to specify the character set during GUI import, this is obviously a documentation error).
My questions are: What is the point of specifying character set when doing exports and imports? Could an AL32UTF8 export followed by an WE8MSWIN1252 import cause problems? I assume that the character set used by export is specified in the collection file, so does the import then convert it to WE8MSWIN1252 (or the character set specified in the import utility)?
Or, to be more general: What is actually happening with the character sets during MDL export/import?
/Kjell Gullberg

Dear ski123,
I think you are not going to lose any of your data when you migrate the database. You may proceed with the import.
Please find the documentation below:
http://download.oracle.com/docs/cd/B19306_01/server.102/b14196/install003.htm#sthref81
For Database Character Set, select from one of the following options:
    *Use the Default: Select this option if you need to support only the language currently used by the operating system for all your database users and your database applications.
    *Use Unicode (AL32UTF8): Select this option if you need to support multiple languages for your database users and your database applications.
    *Choose from the list of character sets: Select this option if you want the Oracle Database to use a character set other than the default character set used by the operating system.
Choosing a Character Set:
http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch2charset.htm#NLSPG002
AL32UTF8:
http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/glossary.htm#sthref2039
Hope That Helps.
Ogan
