National characters

I have written a cross-platform Java program that counts the frequency of words in a given text. I output a list of the words and their frequencies to a JTextArea as well as to a text file. The program works fine, but I still have a problem. My text is Swedish, and the special national characters are not written as they should be; for instance, "o with two dots over" is written as "‰", and so on. (This is on a Macintosh.) Even worse, when I run the program on a Windows machine, all three special Swedish characters are written as one and the same character.
If I output to the console, these characters are written as \xxx.
What to do???

Check to make sure you're reading the file with the right encoding. Sadly, there's no way to specify it directly with a FileReader (at least before Java 11, which added a FileReader constructor that takes a Charset), so wrap an InputStreamReader instead:
BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream("foo.txt"), "UTF-8"));
Does your program display the output graphically to the user, or does it write the results to a file? If it's a file, the problem might be the output encoding.
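A minimal sketch of both sides, assuming the input file really is UTF-8 (if it was saved in the Mac or Windows default encoding, pass that charset instead). It names the charset explicitly when reading and when writing, via java.nio.file.Files (Java 7+), instead of relying on the platform default, which differs between Mac OS and Windows. The class and method names are made up for the example. Once the text has been decoded correctly, a JTextArea should normally display it, provided the font has the glyphs.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class EncodingRoundTrip {
    // Writes text using an explicit charset instead of the platform default.
    static void writeText(Path file, String text, Charset cs) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, cs)) {
            out.write(text);
        }
    }

    // Reads the whole file using an explicit charset.
    static String readText(Path file, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = Files.newBufferedReader(file, cs)) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("words", ".txt");
        String swedish = "\u00e5\u00e4\u00f6 \u00c5\u00c4\u00d6"; // "åäö ÅÄÖ", 7 characters
        writeText(tmp, swedish, StandardCharsets.UTF_8);
        // Same charset on both sides: the text survives.
        System.out.println(readText(tmp, StandardCharsets.UTF_8).equals(swedish)); // true
        // Wrong charset when reading: one garbage character per UTF-8 byte.
        System.out.println(readText(tmp, StandardCharsets.ISO_8859_1).length()); // 13
        Files.delete(tmp);
    }
}
```

The second line shows the typical symptom: the same bytes decoded as ISO-8859-1 give 13 characters instead of 7, because each Swedish letter takes two bytes in UTF-8.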

Similar Messages

Problem crawling filenames with national characters

Hi
I have a big problem with filenames containing national (Danish) characters.
The documents get an entry in wk$url but have error code 404 (Not found).
I'm running Oracle RDBMS 9.2.0.1 on Red Hat Advanced Server 2.1. The
filesystem is mounted on the Oracle server using NFS.
I configured Ultrasearch to crawl the specific directory containing
several files, two of which contain national characters in their
filenames (ls -l):
<..>
-rw-rw-r-- 1 user group 13 Oct 4 13:36 crawlertest_linux_2_æøåÆØÅ.txt
-rw-rw-r-- 1 user group 19968 Oct 4 13:36 crawlertest_windows_æøåÆØÅ.doc
<..>
(Since the preview function is not working in my Mozilla browser, I'm
unable to tell whether or not the national characters will display
properly in this post. But they represent the lower and upper cases of the
three special Danish characters.)
In the crawler log the following entries are added:
<..>
file://localhost/<DIR_PATH>/crawlertest_linux_2_B|C?C%C?C?.txt
file://localhost/<DIR_PATH>/crawlertest_linux_2_B|C?C%C?C?.txt
Processing file://localhost/<DIR_PATH>/crawlertest_linux_2_%e6%f8%e5%c6%d8%c5.txt
WKG-30008: file://localhost/<DIR_PATH>/crawlertest_linux_2_%e6%f8%e5%c6%d8%c5.txt: Not found
<..>
file://localhost/<DIR_PATH>/crawlertest_windows_B|C?C%C?C?.doc
file://localhost/<DIR_PATH>/crawlertest_windows_B|C?C%C?C?.doc
Processing file://localhost/<DIR_PATH>/crawlertest_windows_%e6%f8%e5%c6%d8%c5.doc
WKG-30008: file://localhost/<DIR_PATH>/crawlertest_windows_%e6%f8%e5%c6%d8%c5.doc: Not found
<..>
The 'file://' entries look somewhat UTF-encoded to me (some characters are
missing because they are not printable), and the others look URL-encoded.
All other files in the directory seem to be processed just fine.
In the wk$url table the following entries are added:
(select status, url from wk$url where url like '%crawlertest%';)
404 file://localhost/<DIR_PATH>/crawlertest_linux_2_%e6%f8%e5%c6%d8%c5.txt
404 file://localhost/<DIR_PATH>/crawlertest_windows_%e6%f8%e5%c6%d8%c5.doc
Just for testing purposes,
SELECT utl_url.unescape('%e6%f8%e5%c6%d8%c5') from dual;
actually produces the expected result: æøåÆØÅ
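To see which byte encoding the escaped names correspond to, here is a minimal Java sketch (class name hypothetical) that percent-encodes æøåÆØÅ once as ISO-8859-1 and once as UTF-8. The %e6%f8%e5%c6%d8%c5 form in the log matches the ISO-8859-1 bytes, one byte per letter, which is consistent with the name being handled as single-byte Latin-1 somewhere in the crawler; it does not by itself prove where the lookup goes wrong.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FilenameEncodings {
    // Percent-encodes the bytes of a name in the given charset (URLEncoder emits uppercase hex).
    static String percentEncode(String name, String charsetName) throws UnsupportedEncodingException {
        return URLEncoder.encode(name, charsetName);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String danish = "\u00e6\u00f8\u00e5\u00c6\u00d8\u00c5"; // æøåÆØÅ
        // One byte per letter: same bytes as %e6%f8%e5%c6%d8%c5 in the crawler log.
        System.out.println(percentEncode(danish, "ISO-8859-1")); // %E6%F8%E5%C6%D8%C5
        // Two bytes per letter in UTF-8.
        System.out.println(percentEncode(danish, "UTF-8")); // %C3%A6%C3%B8%C3%A5%C3%86%C3%98%C3%85
    }
}
```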
To me this indicates that the filesystem-scanning part of the
crawler can see the files, but the processing part of the crawler cannot
open them for reading, and it therefore fails with error 404.
Since the crawler is (to my knowledge) written in Java, I did some
experiments with the following Java program:
import java.io.*;

class filetest {
    public static void main(String args[]) throws Exception {
        try {
            String dirname = "<DIR_PATH>";
            File dir = new File(dirname);
            File[] fs = dir.listFiles();
            if (fs == null) {
                // listFiles() returns null if the path is not a directory or cannot be listed
                System.out.println("Cannot list: " + dir);
                return;
            }
            for (int idx = 0; idx < fs.length; idx++) {
                if (fs[idx].canRead()) {
                    System.out.print("Can Read: ");
                } else {
                    System.out.print("Can NOT Read: ");
                }
                // print the file name in both cases
                System.out.println(fs[idx]);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The behavior of this program depends heavily on the locale
settings of the current shell (under Linux). If LC_ALL is set to "C"
(which is a common default), the program can only read files whose
filenames do NOT contain national characters (just like the Ultrasearch
crawler). If LC_ALL is set to, e.g., "en_US", it is able to
read all the files.
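A minimal sketch for checking what a JVM inherited from the shell locale, plus a simulation of the symptom. sun.jnu.encoding is an implementation-specific property of HotSpot-based JDKs that governs file-name conversion (it may be absent on other JVMs); on Java 18 and later Charset.defaultCharset() is UTF-8 regardless of locale, so for file names the property is the more telling value. The round trip through US-ASCII only approximates what happens under LC_ALL=C: each letter outside ASCII ends up as '?', so the name the JVM passes to the OS no longer matches the file on disk. Class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LocaleFilenameCheck {
    // Converts a name to bytes with the given charset and back,
    // roughly what the JVM does when handing file names to the OS.
    static String roundTrip(String name, Charset cs) {
        return new String(name.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // What this JVM picked up from the environment at startup.
        System.out.println("default charset:  " + Charset.defaultCharset());
        // HotSpot-specific property used for file names; may be null on other JVMs.
        System.out.println("sun.jnu.encoding: " + System.getProperty("sun.jnu.encoding"));

        String name = "crawlertest_linux_2_\u00e6\u00f8\u00e5\u00c6\u00d8\u00c5.txt";
        // US-ASCII (typical for LC_ALL=C): each Danish letter is replaced by '?'.
        System.out.println(roundTrip(name, StandardCharsets.US_ASCII)); // crawlertest_linux_2_??????.txt
        // Latin-1 (or UTF-8) locale: the name survives unchanged.
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1).equals(name)); // true
    }
}
```

Running it once from a shell with LC_ALL=C and once with a Latin-1 or UTF-8 locale shows whether the environment of the process that starts the crawler is the one that matters.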
I therefore tried to set LC_ALL for the oracle user on
my Oracle server (using locale_config and .bash_profile), but that did
not seem to fix the problem at hand.
So (finally) my question is: is this a bug in the Ultrasearch crawler,
or simply a misconfiguration of my execution environment? If the
latter, how do I configure my system correctly?
Yours sincerely
Martin Dahl Pedersen, Visanti ( mdp at visanti dot com )

I posted my problem as a TAR on MetaLink just under a week ago.
And it turns out to be a new bug in UltraSearch.
It is now filed under BUG:2673282
-- mdp

How to make Reports 9i display Danish national characters?

I am running Oracle9i Reports and cannot make Reports print the Danish national characters æ, Æ, ø, Ø, å and Å. I have a development machine with Developer Suite 9.0.2, where I can run the report in Paper Design and the characters display correctly, but as soon as the reports are uploaded to the Application Server (9.0.2), all of the national characters are replaced with some very mysterious characters. The dev. machine and the Oracle9iAS machine both connect to the same database, and when I make a boilerplate object containing just "ÆØÅ", the problem is still there, so it does not seem to be a database issue.
I read some articles on MetaLink about adding some lines to uifont.ali, but they do not seem to apply, since the articles only mention Eastern European languages (Polish and Czech). The font used is Times New Roman. The dev. machine has NLS_LANG set to AMERICAN_AMERICA.WE8MSWIN1252, and the Oracle9iAS machine is running DANISH_DENMARK.WE8MSWIN1252, i.e. the same character set. I tried generating the report both to HTML and to PDF, but that did not make any difference regarding this issue.
How do I make Oracle9i Reports Services display the Danish national characters correctly?
Thanks in advance!

Thanks for your suggestions.
However, here's what I've done, and it did not make any difference.
1. Changed the NLS_LANG parameter to match on both server and dev. machine and recompiled and saved the RDF - no difference.
2. Installed the same model printer on the server, as the one on the development machine, and rebooted the server - no difference.
3. Checked uifont.ali on both systems - they're exactly the same...
What else might be causing this?

Editable drop-down does not show national characters

Hi
I'm using DW CS3 with Developer toolbox, PHP MySql.
The problem is that the Editable drop-down shows national characters wrongly.
Actually, it inserts data into the database with the wrong encoding.
I use the encoding "charset=utf-8", and all other forms work fine.
Only the Editable drop-down shows [squares] instead of Ä Ö Ü ...
How can I make the Editable drop-down insert data in UTF-8 encoding
(like the other forms and fields on my page)?
Thanks!

Does it help if you disable hardware acceleration?
*Tools > Options > Advanced > General > Browsing: "Use hardware acceleration when available"
*https://support.mozilla.org/kb/Troubleshooting+extensions+and+themes
*https://hacks.mozilla.org/2010/09/hardware-acceleration/

Table Import Data - "Insert script" - National characters

Hi all,
it looks like there is a problem with support for national characters in the imported data file when the "Insert script" method is chosen.
Table -> Import Data -> Open datafile "csv".
In the preview window I see the national characters from the CSV data file displayed properly, and when I choose the "Insert" or "SQL Loader" method, the data is imported into the table correctly.
But when I use the "Insert script" method, the national characters in the generated script are changed into "bushes":
http://imm.io/V0J9
SQL Developer: Version 3.2.20.09
OS: Windows XP SP3
Client code page: WIN-1250
Tested databases: 10g, 11g

This has been fixed in the latest build. The patch is now available for download: http://www.oracle.com/technology/software/products/sql/index.html

Regards
Sue

10g client mangles national characters, 9i client is ok

We are having a strange problem with some 10.2.0.4.0 clients on Windows XP. They make an incorrect conversion of national characters while querying from a 10.2.0.4.0 database. For example, the "ä" letter in the result set is converted to "a", which must not happen. When connecting to the same 10g database with a 9i client and issuing exactly the same SELECT statement, the result is correct. How can we make the 10g client treat national characters correctly?

Thanks for your help, everybody. Yes, there was a conflict between the database and client character sets. I used the NLS_LANG environment variable in Windows to instruct the client to use the same character set as the database, and this seems to solve the problem.
I just wonder how the 9i client was able to do what we wanted, while there were problems with 10g. There are exactly the same NLS_LANG values in the registry for 9i and 10g, each containing a character set part that is inconsistent with that of the database. Also, after setting NLS_LANG in Windows, 9i still gave the correct result, as if NLS_LANG had no effect on it.

OVD - special/national characters in LDAP context

Hi all,
I created an integration between Active Directory and Oracle 10g via Oracle Virtual Directory 10g. Everything works correctly, but some users have national characters in their AD context, for example Thomas Bjørne (cn=Thomas Bjørne,cn=Users,dc=media,dc=local). In this case the user cannot log in to the database. I know the problem is the special national characters in the AD context, but I don't know how to solve it. It is not possible to change the AD context :-(
Can somebody help me with it?

Let's first verify that you can bind to OID using the command-line
tools with an existing user in OID.
Let's assume for a moment that your user's password is welcome and
their DN in OID is cn=jdoe,c=US.
Try the following command and tell me what the results are.
ldapsearch -p port_num -h host_name -b "c=US" -s sub -v "cn=*"
It should return all users under c=US. If not, let me know the
error message you get.
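Separately from the bind test, it may help to rule out a client or tool mangling the ø on its way to the directory. LDAPv3 DNs are UTF-8 strings, and RFC 4514 also allows any octet of an attribute value to be written as a backslash-escaped hex pair, so the same DN can be spelled in pure ASCII. A minimal Java sketch (the helper name is hypothetical; javax.naming.ldap.Rdn is the JDK class) that produces that spelling and checks that it decodes back to the original value. Whether OVD or the database's user lookup accepts the escaped form is an assumption to test, not something verified here.

```java
import java.nio.charset.StandardCharsets;
import javax.naming.ldap.Rdn;

class DnUtf8Escape {
    // Escapes RFC 4514 special characters via Rdn.escapeValue, then rewrites every
    // non-ASCII character as hex pairs of its UTF-8 bytes (e.g. ø -> \C3\B8).
    static String asciiOnlyValue(String value) {
        String escaped = Rdn.escapeValue(value);
        StringBuilder sb = new StringBuilder();
        for (byte b : escaped.getBytes(StandardCharsets.UTF_8)) {
            int u = b & 0xFF;
            if (u < 0x80) {
                sb.append((char) u);
            } else {
                sb.append(String.format("\\%02X", u));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String value = "Thomas Bj\u00f8rne"; // "Thomas Bjørne"
        String escaped = asciiOnlyValue(value);
        // prints cn=Thomas Bj\C3\B8rne,cn=Users,dc=media,dc=local
        System.out.println("cn=" + escaped + ",cn=Users,dc=media,dc=local");
        // Rdn.unescapeValue turns the hex-encoded UTF-8 back into the original value.
        System.out.println(value.equals(Rdn.unescapeValue(escaped))); // true
    }
}
```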

Losing NATIONAL CHARACTERS (blob->clob->table). unistr?

Hello!
I have a problem with national characters. My example is as follows:
1. A csv file is uploaded from disk to htmldb_application_files
2. This BLOB is then converted to CLOB with dbms_lob.converttoclob()
3. Data from this CLOB is copied to PL/SQL array.
4. From PL/SQL array to table in database.
The problem: either the data copied to the database table loses its national characters (strange characters are displayed instead), or, if I pass my national character set ID as an argument to dbms_lob.converttoclob(), I get an error saying that the file is inconvertible.
What is wrong? How can I solve my problem? Can unistr() help somewhere? Any ideas?
Tom

Duplicate posting, being addressed at:
losing NATIONAL CHARACTERS(blob->clob->table). unistr?
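The "inconvertible" error is what you would expect when a converter is told the wrong source character set and refuses to guess. As a rough Java analogue (a sketch, not what DBMS_LOB does internally), a strict CharsetDecoder reports malformed input instead of silently substituting characters, which lets you test whether the uploaded bytes are valid in a candidate charset. The sample bytes are hypothetical Windows-1250 CSV content and the class name is made up. Single-byte charsets accept almost any input, so success there does not prove the guess is right; a strict UTF-8 check is the more informative one.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

class StrictDecode {
    // Decodes bytes with the given charset, failing instead of substituting characters.
    static Optional<String> decodeStrict(byte[] data, Charset cs) {
        try {
            return Optional.of(cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data))
                    .toString());
        } catch (CharacterCodingException e) {
            return Optional.empty(); // the bytes are not valid in this charset
        }
    }

    public static void main(String[] args) {
        Charset cp1250 = Charset.forName("windows-1250");
        byte[] csvBytes = "Ilo\u015b\u0107".getBytes(cp1250); // hypothetical CSV content "Ilość"
        System.out.println(decodeStrict(csvBytes, StandardCharsets.UTF_8).isPresent()); // false
        System.out.println(decodeStrict(csvBytes, cp1250).isPresent()); // true
    }
}
```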

File adapter, File encoding national characters

Hi,
I have a problem with national characters (ÅÄÖ) when sending files with the file adapter (receiver adapter).
When I specify Transfer Mode = Binary and File Type = Binary, everything works fine, but when I use Transfer Mode = Text, the national characters get converted to "?". I have tried setting File Type = Text and tried File Encoding with UTF-8 and ISO-8859-1, without success.
Please help!
Regards
Claes

Hi,
Check this out: How To Work with Character Encodings in Process Integration (http://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/502991a2-45d9-2910-d99f-8aba5d79fb42)
Regards,
Jakub

National characters in pages, accessed through Application/URL

Jambo all!
I hope you can help me with this little problem:
1. I've created an application and added a URL item to it
2. The URL points to an external ASP page (I hope the fact that this is an ASP page does not influence the behavior)
3. I've published it as a portlet and added it to a page.
Everything works OK except our national accented characters: they are all converted to '?' somewhere in the process of rendering the page. There really are question marks instead of the proper characters in the source code, so using browser encoding settings or meta tags in the HTML is useless ;(
The question is - how can I convince page (URL) rendering system to leave my national characters intact?
TNX a lot in advance!

Solution of this problem is shown here:
Re: Error Message: print success message checksum content error in Apex 4.0

National characters and new Java API

Hi All,
I'm looking for your experience with the new Java API and national characters (like ü, ś, ć, etc.). The problem is that when a record is updated using MDM Data Manager and then retrieved using the new Java API, the national characters are invalid (they are represented incorrectly in the Java string).
It's strange, because when I create or update the record from the Java API, it looks fine. A second finding is that the old Java API (MDM4J) works fine on text fields with national characters.
Maybe I forgot to set something in the server configuration, the repository, or the Java API connection; any help appreciated...
Regards, marcin

When retrieving data via Java API 2,
you should set the Unicode normalization after the user session is authenticated.
I guess this is available in the SP5 patch.
The documentation for this is available at
https://help.sap.com/javadocs/MDM/current/index.html
Package: com.sap.mdm.commands
SetUnicodeNormalizationCommand cmd = new SetUnicodeNormalizationCommand(connectionAccessor);
cmd.setSession(userSession);
cmd.setNormalizationType(SetUnicodeNormalizationCommand.NORMALIZATION_COMPOSED);
cmd.execute();
This command is used to set the Unicode normalization. This is used for the lifetime of the session. It should be set after the session is authenticated.
Unicode normalization is important when a text string can be represented differently depending on the normalization used. The MDM server always stores text strings in one normalization format. A user who provides a text string to the MDM server and later tries to retrieve the same string might get it back in a different normalization. To resolve this, the user can use this class to specify the normalization they want to work with. The MDM server will then always return text strings in the normalization specified by this class.
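For a plain-JDK picture of the problem described above (independent of the MDM API): "ü" can be stored as one code point (composed, NFC) or as "u" followed by a combining diaeresis (decomposed, NFD). Both render the same but compare unequal in Java, which could explain strings that look almost right yet behave differently. A minimal sketch using java.text.Normalizer; the class and helper names are hypothetical, and the idea that NORMALIZATION_COMPOSED corresponds to NFC is an assumption the quoted text does not state.

```java
import java.text.Normalizer;

class NormalizationDemo {
    // Compares two strings after normalizing both to the composed form (NFC).
    static boolean sameText(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String composed = "\u00fc";    // ü as a single code point
        String decomposed = "u\u0308"; // u + combining diaeresis
        System.out.println(composed.equals(decomposed)); // false
        System.out.println(sameText(composed, decomposed)); // true
        // The decomposed form has two chars.
        System.out.println(Normalizer.normalize(composed, Normalizer.Form.NFD).length()); // 2
    }
}
```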

National characters (code page) problem

I made a JSP page in code page 1250 with characters specific to this code page. In JDeveloper everything looks OK. The compiled page (Java file) also looks good, but when I open it in a web browser, all national characters are lost (question marks instead of letters). Can anybody help me solve this problem?
Note: JDeveloper is configured to use that code page.

Have you tried posting in the ABAP Web Dynpro forum?

National characters problem

Hi.
I'm using AE on XE 10.2.0.1.0
I have a problem typing national characters, e.g. in the updatable Report Attributes Column Heading (Custom). If I type "Ilość" as the heading and then push "Apply changes", the name is saved without national characters, as "Ilosc".
Why is this happening?
Should I change settings in the application? Or in the database?
Should I use another browser (currently SeaMonkey)?
I downloaded "Oracle Database 10g Express Edition (Western European)".
Should I download and use "Oracle Database 10g Express Edition (Universal)" instead?
My APP globalization parameters:
Application Primary Language : Polish (pl)
Application Language Derived From: Application Preference (using FSP_LANGUAGE_PREFERENCE)
Automatic CSV Encoding: no
My DB NLS settings :
NLS_CALENDAR GREGORIAN
NLS_CHARACTERSET WE8MSWIN1252
NLS_COMP BINARY
NLS_CURRENCY zl
NLS_DATE_FORMAT RR/MM/DD
NLS_DATE_LANGUAGE POLISH
NLS_DUAL_CURRENCY zl
NLS_ISO_CURRENCY POLAND
NLS_LANGUAGE POLISH
NLS_LENGTH_SEMANTICS BYTE
NLS_NCHAR_CHARACTERSET AL16UTF16
NLS_NCHAR_CONV_EXCP FALSE
NLS_NUMERIC_CHARACTERS ,
NLS_SORT POLISH
NLS_TERRITORY POLAND
NLS_TIME_FORMAT HH24:MI:SSXFF
NLS_TIMESTAMP_FORMAT RR/MM/DD HH24:MI:SSXFF
NLS_TIMESTAMP_TZ_FORMAT RR/MM/DD HH24:MI:SSXFF TZR
NLS_TIME_TZ_FORMAT HH24:MI:SSXFF TZR
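The settings above show NLS_CHARACTERSET WE8MSWIN1252, which is Windows code page 1252. That code page has no ś or ć, so a VARCHAR2 value stored in this database cannot keep them, whichever browser is used; the "Ilosc" result looks like Oracle's replacement-character mapping. As far as I know, the Universal edition of XE uses AL32UTF8, which covers Polish. A minimal Java sketch (names hypothetical) that lists which characters of a string a given charset cannot represent, with windows-1250 (Oracle's EE8MSWIN1250) shown for comparison:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

class CharsetCoverage {
    // Returns the characters of s that the given charset cannot represent.
    static List<Character> unencodable(String s, Charset cs) {
        CharsetEncoder enc = cs.newEncoder();
        List<Character> missing = new ArrayList<>();
        for (char c : s.toCharArray()) {
            if (!enc.canEncode(c)) {
                missing.add(c);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String heading = "Ilo\u015b\u0107"; // "Ilość"
        // windows-1252 lacks U+015B (ś) and U+0107 (ć): prints 2
        System.out.println(unencodable(heading, Charset.forName("windows-1252")).size());
        // windows-1250 has both: prints 0
        System.out.println(unencodable(heading, Charset.forName("windows-1250")).size());
    }
}
```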

N'<national symbols>', as part of an SQL statement, will be converted to the database character set (WE8ISO8859P1) before being parsed. Only if both the client and the database are 10.2 or higher can the client encode the literal so that it survives this conversion.
In earlier versions, you can do the encoding yourself. Instead of the N'<national symbols>' literal, use the UNISTR function: UNISTR('\xxxx\yyyy\zzzz'), where U+xxxx, U+yyyy, U+zzzz are the Unicode code points of your national characters.
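Building such literals by hand is error-prone, so here is a small Java helper (hypothetical, not part of any Oracle API) that replaces every non-ASCII UTF-16 code unit with \xxxx, doubles single quotes for SQL, and writes a literal backslash as \\, which is how UNISTR expects it. The result is pure ASCII, so it survives the client-to-database conversion. Note that UNISTR returns a value in the national character set, so the characters are only kept if the target column is NCHAR/NVARCHAR2 or the database character set can represent them.

```java
class UnistrLiteral {
    // Builds an Oracle UNISTR('...') literal from a Java string.
    static String toUnistr(String s) {
        StringBuilder sb = new StringBuilder("UNISTR('");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') {
                sb.append("\\\\");              // literal backslash is written as \\
            } else if (c == '\'') {
                sb.append("''");                // SQL quote doubling
            } else if (c < 0x80) {
                sb.append(c);                   // plain ASCII passes through
            } else {
                sb.append(String.format("\\%04X", (int) c)); // UTF-16 code unit as \xxxx
            }
        }
        return sb.append("')").toString();
    }

    public static void main(String[] args) {
        System.out.println(toUnistr("Ilo\u015b\u0107")); // UNISTR('Ilo\015B\0107')
    }
}
```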
-- Sergiusz

Problem with special national characters

Hi,
How can I configure Oracle Application Server 10g to display special national characters (ANSI code page 1250, Central Europe) correctly?
It is hosted on Windows Server 2003, where the appropriate character resources are installed.
Thanks in advance
KM

Check the available languages in SMLT (transaction). In the example below, the characters coming from DI are Spanish characters, which are getting converted to Swedish 1s.
Please go through the following:
Re: Japanese characters

How to send Oracle rowid to servlet? | Problem with national characters.

Is there some way to send a rowid to a servlet?
I currently have a definition like this:
<af:image source="/imageservlet?Par1=#{bindings.Col1.inputValue}"/>
But if the column contains national characters, the servlet receives these characters altered.
My idea is to use the Oracle rowid instead of the row's primary key. Is that simply possible?
Something like this:
<af:image source="/imageservlet?Rowid=#{bindings.Rowid}"/>
Or do you have ideas on how to solve the problem with national characters?
Thanks
FiL

Hi,
Although your workaround works,
I think this is a simple encoding problem.
You simply need to make sure all parameters and pages are encoded with a character set that contains the national characters you mentioned.
This depends a bit on the exact technology you're using, but most of it can be done via web.xml:
<jsp-config>
 <jsp-property-group>
 <url-pattern>*.jsp</url-pattern>
 <page-encoding>UTF-8</page-encoding>
 </jsp-property-group>
</jsp-config>
This forces all JSP pages to be encoded in UTF-8.
You said you're using a servlet, so your servlet needs a similar block for its pattern.
Adding the following parameter sometimes helps as well, although I think this one is a bit dated:
<context-param>
 <param-name>PARAMETER_ENCODING</param-name>
 <param-value>UTF-8</param-value>
</context-param>
If you want to be 100% sure the encoding is set right, make sure the pages contain:
<%@ page contentType="text/html;charset=utf-8"%>
Or, depending on your view technology, the syntax can be a bit different.
-Anton
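The parameter side of this can be sketched in plain Java: the value must be percent-encoded with one charset when the URL is built and decoded with the same charset in the servlet. The parameter name follows the question, while the sample value and class name are hypothetical. In a real servlet container the decoding charset is chosen by the container (for example Tomcat's URIEncoding connector setting for query strings) or by request.setCharacterEncoding for request bodies; which one applies depends on the container, so treat that as something to check.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class UrlParamEncoding {
    // Builds the image URL with the parameter value percent-encoded as UTF-8.
    static String imageUrl(String value) throws UnsupportedEncodingException {
        return "/imageservlet?Par1=" + URLEncoder.encode(value, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String value = "\u017b\u00f3\u0142w"; // "Żółw", a sample value with Polish letters
        String url = imageUrl(value);
        System.out.println(url); // /imageservlet?Par1=%C5%BB%C3%B3%C5%82w
        String raw = url.substring(url.indexOf('=') + 1);
        // Decoding with the same charset restores the value...
        System.out.println(URLDecoder.decode(raw, "UTF-8").equals(value)); // true
        // ...decoding with ISO-8859-1, historically a common container default, garbles it.
        System.out.println(URLDecoder.decode(raw, "ISO-8859-1").equals(value)); // false
    }
}
```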
