National characters

I have written a cross-platform Java program that counts the frequency of words in a given text. I output a list of the words and their frequencies to a JTextArea as well as to a text file. The program works fine, but I still have one problem. My text is Swedish, and the special national characters are not written as they should be; for instance, "o with two dots over" is written as "‰", and so on. (This is on a Macintosh.) Even worse, when I run the program on a Windows machine, all three special Swedish characters are written as one and the same character.
If I output to the console, these characters are written as \xxx.
What should I do?

Check to make sure you're reading the file in with the right encoding. Sadly, there's no way to do it directly with a FileReader.
BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream("foo.txt"), "UTF-8"));
Does your program display the output graphically to the user or does it write the results to a file? If it's to a file then it might be the output encoding.
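
The advice above can be sketched for both directions, reading and writing. This is a minimal sketch, not the poster's program: the class and method names are hypothetical, and UTF-8 is only an assumed charset, so substitute whatever encoding the Swedish text was actually saved in. The point is that neither the reader nor the writer falls back to the platform default, which differs between Macintosh and Windows.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch class; not taken from the original program.
class WordCountEncodingSketch {

    // Decodes the stream with an explicit charset and counts runs of letters.
    static Map<String, Integer> countWords(InputStream in, Charset charset) {
        Map<String, Integer> counts = new TreeMap<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // \p{L} matches any Unicode letter, so national characters stay inside words.
                for (String word : line.split("[^\\p{L}]+")) {
                    if (!word.isEmpty()) {
                        counts.merge(word.toLowerCase(Locale.ROOT), 1, Integer::sum);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    // Encodes "word<TAB>count" lines with an explicit charset.
    static void writeCounts(Map<String, Integer> counts, OutputStream out, Charset charset) {
        try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(out, charset))) {
            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                writer.write(entry.getKey() + "\t" + entry.getValue());
                writer.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: both files are UTF-8; args[0] is the input path, args[1] the output path.
        try (InputStream in = new FileInputStream(args[0]);
             OutputStream out = new FileOutputStream(args[1])) {
            writeCounts(countWords(in, StandardCharsets.UTF_8), out, StandardCharsets.UTF_8);
        }
    }
}
```

If the text file then looks wrong only in one particular editor, that editor may be opening it with a different charset than the one used for writing.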

Similar Messages

  • Problem crawling filenames with national characters

    Hi
    I have a big problem with filenames containing national (Danish) characters.
    The documents get an entry in wk$url but have error code 404 (Not found).
    I'm running Oracle RDBMS 9.2.0.1 on Red Hat Advanced Server 2.1. The
    filesystem is mounted on the Oracle server using NFS.
    I configured Ultrasearch to crawl the specific directory containing
    several files, two of which contain national characters in their
    filenames. (ls -l)
    <..>
    -rw-rw-r-- 1 user group 13 Oct 4 13:36 crawlertest_linux_2_æøåÆØÅ.txt
    -rw-rw-r-- 1 user group 19968 Oct 4 13:36 crawlertest_windows_æøåÆØÅ.doc
    <..>
    (Since the preview function is not working in my Mozilla browser, I'm
    unable to tell whether or not the national characters will display
    properly in this post. But they represent the lower- and upper-case forms of the
    three special Danish characters.)
    In the crawler log the following entries are added:
    <..>
    file://localhost/<DIR_PATH>/crawlertest_linux_2_B|C?C%C?C?.txt
    file://localhost/<DIR_PATH>/crawlertest_linux_2_B|C?C%C?C?.txt
    Processing file://localhost/<DIR_PATH>/crawlertest_linux_2_%e6%f8%e5%c6%d8%c5.txt
    WKG-30008: file://localhost/<DIR_PATH>/crawlertest_linux_2_%e6%f8%e5%c6%d8%c5.txt: Not found
    <..>
    file://localhost/<DIR_PATH>/crawlertest_windows_B|C?C%C?C?.doc
    file://localhost/<DIR_PATH>/crawlertest_windows_B|C?C%C?C?.doc
    Processing file://localhost/<DIR_PATH>/crawlertest_windows_%e6%f8%e5%c6%d8%c5.doc
    WKG-30008: file://localhost/<DIR_PATH>/crawlertest_windows_%e6%f8%e5%c6%d8%c5.doc: Not found
    <..>
    The 'file://' entries look somewhat UTF-encoded to me (some chars are
    missing because they are not printable), and the others look
    URL-encoded.
    All other files in the directory seem to process just fine.
    In the wk$url table the following entries are added:
    (select status url from wk$url where url like '%crawlertest%'; )
    404 file://localhost/<DIR_PATH>/crawlertest_linux_2_%e6%f8%e5%c6%d8%c5.txt
    404 file://localhost/<DIR_PATH>/crawlertest_windows_%e6%f8%e5%c6%d8%c5.doc
    Just for testing purposes, a
    SELECT utl_url.unescape('%e6%f8%e5%c6%d8%c5') from dual;
    actually produces the expected result: æøåÆØÅ
    To me this indicates that the filesystem-scanning part of the
    crawler can see the files, but the processing part of the crawler
    cannot open the files for reading and therefore fails with error 404.
    Since the crawler (to my knowledge) is written in Java, I did some
    experiments with the following Java program.
    import java.io.*;

    class filetest {
        public static void main(String args[]) throws Exception {
            try {
                String dirname = "<DIR_PATH>";
                File dir = new File(dirname);
                // List every entry in the directory as the JVM sees it
                File[] fs = dir.listFiles();
                for (int idx = 0; idx < fs.length; idx++) {
                    // Report whether the JVM can open each listed file
                    if (fs[idx].canRead()) {
                        System.out.print("Can Read: ");
                    } else {
                        System.out.print("Can NOT Read: ");
                    }
                    System.out.println(fs[idx]);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    The behavior of this program depends heavily on the language
    settings of the current shell (under Linux). If LC_ALL is set to "C"
    (which is a common default), the program can only read files whose
    filenames do NOT contain national characters (just like the Ultrasearch
    crawler). If LC_ALL is set to e.g. "en_US", then it is capable of
    reading all the files.
    I therefore tried to set the LC_ALL environment variable for the oracle
    user on my Oracle server (using locale_config and .bash_profile), but
    that did not seem to fix the problem at hand.
    So (finally) my question is: is this a bug in the Ultrasearch crawler
    or simply a misconfiguration of my execution environment? If the
    latter, how do I configure my system correctly?
    Yours sincerely
    Martin Dahl Pedersen, Visanti ( mdp at visanti dot com )

    I posted my problem as a TAR on METALINK about a week ago,
    and it turns out to be a new bug in UltraSearch.
    It is now filed under BUG:2673282
    -- mdp
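
The observation in the Ultrasearch thread above, that the percent-escaped names decode to the expected letters, can be reproduced in plain Java. This is only an illustrative sketch with hypothetical class and method names; it says nothing about how the crawler itself decodes names. It shows that the escapes %e6%f8%e5%c6%d8%c5 yield the six Danish letters when the bytes are read as ISO-8859-1, and something else when the same bytes are read as UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical sketch class for illustration only.
class PercentDecodeSketch {

    // Decodes percent-escapes and interprets the resulting bytes in the named charset.
    static String decode(String escaped, String charsetName) {
        try {
            return URLDecoder.decode(escaped, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Renders each UTF-16 code unit as U+XXXX so the console encoding cannot distort the result.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String escaped = "%e6%f8%e5%c6%d8%c5"; // copied from the crawler log in the post
        System.out.println("ISO-8859-1: " + codeUnits(decode(escaped, "ISO-8859-1")));
        System.out.println("UTF-8:      " + codeUnits(decode(escaped, "UTF-8")));
    }
}
```

Printing code units instead of the letters themselves avoids adding a second encoding problem (the terminal's) on top of the one being diagnosed.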

  • How to make Reports 9i display Danish national characters?

    I am running Oracle9i Reports and cannot make Reports print the Danish national characters æ, Æ, ø, Ø, å and Å. I have a development machine with Developer Suite 9.0.2, where I can run the report in Paper Design and the characters display correctly, but as soon as the reports are uploaded to the Application Server (9.0.2), all of the national characters are replaced with some very mysterious characters. The dev. machine and the Oracle9iAS machine both connect to the same database, and when I make a boilerplate object containing just "ÆØÅ", the problem is still there, so it does not seem to be a database issue.
    I read some articles on MetaLink about adding some lines to uifont.ali, but they do not seem to apply, since the articles only mention East European languages (Polish and Czech). The font used is Times New Roman. The dev. machine has NLS_LANG set to AMERICAN_AMERICA.WE8MSWIN1252, and the Oracle9iAS machine is running DANISH_DENMARK.WE8MSWIN1252, i.e. the same character set. I tried to generate the report both to HTML and PDF, but that did not make any difference regarding this issue.
    How do I make Oracle9i Reports Services display the Danish national characters correctly?
    Thanks in advance!

    Thanks for your suggestions.
    However, here's what I've done, and it did not make any difference.
    1. Changed the NLS_LANG parameter to match on both server and dev. machine and recompiled and saved the RDF - no difference.
    2. Installed the same model printer on the server, as the one on the development machine, and rebooted the server - no difference.
    3. Checked uifont.ali on both systems - they're exactly the same...
    What else might be causing this?

  • Editable drop down do not show national characters

    Hi
    I'm using DW CS3 with Developer Toolbox, PHP and MySQL.
    The problem is that the Editable drop-down shows national characters wrongly.
    Actually, it inserts data into the database with the wrong encoding.
    I use the encoding "charset=utf-8", and all other forms work fine.
    Only the Editable drop-down shows [squares] instead of Ä Ö Ü ...
    How can I make the Editable drop-down insert data in UTF-8 encoding
    (like the other forms and fields on my page)?
    Thanks!

    Does it help if you disable hardware acceleration ?
    *Tools > Options > Advanced > General > Browsing: "Use hardware acceleration when available"
    *https://support.mozilla.org/kb/Troubleshooting+extensions+and+themes
    *https://hacks.mozilla.org/2010/09/hardware-acceleration/

  • Table Import Data - "Insert script" - National characters

    Hi all,
    it looks like there is a problem with support for national characters in an imported data file when the "Insert script" method is chosen.
    Table -> Import Data -> Open datafile "csv".
    In the preview window I see the national characters from the csv data file displayed properly, and when I choose the "Insert" or "SQL Loader" method, the data is imported into the table properly.
    But when I use the "Insert script" method, the national characters in the generated script are changed into "bushes" (garbled characters):
    http://imm.io/V0J9
    SQL Developer: Version 3.2.20.09
    OS: Windows XP SP3
    Client code page: WIN-1250
    Tested databases: 10g, 11g

    This has been fixed in the latest build. The patch is now available for download: http://www.oracle.com/technology/software/products/sql/index.html
    Regards
    Sue

  • 10g client mangles national characters, 9i client is ok

    We are having a strange problem with some 10.2.0.4.0 clients on Windows XP. They make an incorrect conversion of national characters while querying from a 10.2.0.4.0 database. For example, the "ä" letter in the result set is converted to "a", which must not happen. When connecting to the same 10g database with a 9i client and issuing exactly the same SELECT statement, the result is correct. How can we make the 10g client treat national characters correctly?

    Thanks for your help, everybody. Yes, there was a conflict between the database and client character sets. I used the NLS_LANG environment variable in Windows to instruct the client to use the same character set as the database, and this seems to have solved the problem.
    I just wonder how the 9i client was able to do what we wanted, while there were problems with 10g. There are exactly the same NLS_LANG values in the registry for 9i and 10g, each containing a character set part that is inconsistent with that of the database. Also, after setting NLS_LANG in Windows, 9i still gave the correct result, as if NLS_LANG had no effect on it.

  • OVD - special/national characters in LDAP context

    Hi all,
    I created an integration between Active Directory and Oracle 10g via Oracle Virtual Directory 10g. Everything works correctly, but some users have national characters in their AD context, for example Thomas Bjørne (cn=Thomas Bjørne,cn=Users,dc=media,dc=local). Such a user cannot log in to the database. I know that the problem is the special national characters in the AD context, but I don't know how to solve it. It is not possible to change the AD context :-(
    Can somebody help me with it?

    Let's first verify that you can bind to OID using the command-line
    commands with an existing user in OID.
    Let's assume for a moment that your user's password is welcome and
    their DN in OID is cn=jdoe,c=US
    Try the following command and tell me what the results are.
    ldapsearch -p port_num -h host_name -b "c=US" -s sub -v "cn=*"
    It should return all users under c=US. If not let me know the
    error message you get.

  • Losing NATIONAL CHARACTERS(blob->clob->table). unistr?

    Hello!
    I have a problem with national characters. My example is as follows:
    1. A csv file is uploaded from disk to htmldb_application_files
    2. This BLOB is then converted to CLOB with dbms_lob.converttoclob()
    3. Data from this CLOB is copied to PL/SQL array.
    4. From PL/SQL array to table in database.
    The problem: Either data copied to table in database loses national characters (display strange characters instead of national), or if I set my national character set id as an argument of dbms_lob.converttoclob() function I have an error - says that file is inconvertible.
    What is wrong? How can I solve my problem? Can unistr() help somewhere? Any ideas?
    Tom

    Duplicate posting, being addressed at:
    losing NATIONAL CHARACTERS(blob->clob->table). unistr?

  • File adapter, File encoding national characters

    Hi,
    I have a problem with national characters (ÅÄÖ) when sending files (receiver adapter) with the file adapter.
    When I specify Transfer mode = Binary and File Type = Binary everything works fine, but when I use Transfer mode = Text the national characters get converted to "?". I have tried setting File Type = Text and tried File Encoding with UTF-8 and ISO-8859-1, without success.
    Please help!
    Regards
    Claes

    Hi,
    Check this out: How To… Work with Character Encodings in Process Integration (http://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/502991a2-45d9-2910-d99f-8aba5d79fb42)
    Regards,
    Jakub

  • National characters in pages, accessed through Application/URl

    Jambo all!
    I hope you can help me with this little problem:
    1. I've created an application and added a URL item to it
    2. The URL points to an external ASP page (I hope the fact that this is an ASP page does not influence the behavior)
    3. I've published it as a portlet and added it to a page.
    Everything is working OK except our national accented characters: they are all converted to '?' at some point in the process of rendering the page. There really are question marks instead of the proper characters in the source code, so any use of browser encoding settings or meta tags in the HTML is useless ;(
    The question is - how can I convince page (URL) rendering system to leave my national characters intact?
    TNX a lot in advance!

    Solution of this problem is shown here:
    Re: Error Message: print success message checksum content error in Apex 4.0

  • National characters and new Java API

    Hi All,
    I'm looking for your experience with the new Java API and national characters (like ü, ś, ć, etc.). The problem is that when a record was updated using MDM Data Manager and is then retrieved using the new Java API, the national characters are invalid (in the Java string the national characters are represented incorrectly).
    It's strange, because when I create or update the record from the Java API it looks fine. A second finding is that the old Java API (MDM4J) works fine on text fields with national characters.
    Maybe I forgot to set something in the server configuration, the repository, or the Java API connection. Any help appreciated...
    Regards, marcin

    While retrieving data via the Java API 2,
    you should set the Unicode Normalization after the user session is authenticated.
    I guess this is available in SP5 patch.
    The documentation for this is available at
    https://help.sap.com/javadocs/MDM/current/index.html
    Package: com.sap.mdm.commands
    SetUnicodeNormalizationCommand cmd = new SetUnicodeNormalizationCommand(connectionAccessor);
    cmd.setSession(userSession);
    cmd.setNormalizationType(SetUnicodeNormalizationCommand.NORMALIZATION_COMPOSED);
    cmd.execute();
    This command is used to set the Unicode normalization.  This is used for the lifetime of the session. It should be set after the session is authenticated.
    Unicode normalization is important when a text string is represented differently depending on the normalization used. The MDM server always stores text strings in one normalization format. A user who provides a text string to the MDM server and later tries to retrieve the same text string might get it back in a different normalization. To resolve this issue, the user can use this class to specify the normalization they want to work with. The MDM server will then always return text strings in the normalization specified by this class.
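
To illustrate why the normalization form matters, here is a small sketch using the standard library's java.text.Normalizer rather than the MDM API. The class and method names are hypothetical, and whether NORMALIZATION_COMPOSED corresponds exactly to NFC is an assumption. It shows that the same visible "ü" can be stored as one code point or as two, and that the two forms only compare equal after both are normalized the same way.

```java
import java.text.Normalizer;

// Hypothetical sketch class; uses only the JDK, not the MDM Java API.
class NormalizationSketch {

    // Composed form (NFC): base letter and accent merged where a precomposed code point exists.
    static String composed(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Decomposed form (NFD): base letter followed by combining marks.
    static String decomposed(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String precomposed = "\u00fc"; // u with diaeresis as a single code point
        String combining = "u\u0308";  // u followed by COMBINING DIAERESIS

        System.out.println(precomposed.equals(combining));                     // false
        System.out.println(composed(precomposed).equals(composed(combining))); // true
        System.out.println(decomposed(precomposed).length());                  // 2
    }
}
```

A string that looks correct on screen but fails equals() or a database lookup is a typical sign that the two sides use different normalization forms.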

  • National characters (code page) problem

    I made JSP page with code page 1250 with characters specific to this code page. In JDeveloper everything looks OK. Compiled page (Java file) also shows good, but when I open it in Web browser all national characters are lost (question marks instead of letters). Can anybody help me to solve this problem?
    Note: JDeveloper is configured to mentioned code page.

    Have you tried posting in the ABAP Web Dynpro forum?

  • National characters problem

    Hi.
    I'm using AE on XE 10.2.0.1.0
    I have a problem with typing national characters, e.g. in an updatable Report Attributes Column Heading (Custom). If I type the heading name "Ilość" and then push "Apply changes", the name is saved without national characters, as "Ilosc".
    Why is this happening?
    Should I change settings in the application? Or in the database?
    Should I use another browser (currently SeaMonkey)?
    I downloaded "Oracle Database 10g Express Edition (Western European)".
    Should I download and use "Oracle Database 10g Express Edition (Universal)"?
    My APP globalization parameters:
    Application Primary Language      : Polish (pl)
    Application Language Derived From: Application Preference (using FSP_LANGUAGE_PREFERENCE)
    Automatic CSV Encoding: no
    My DB NLS settings :
    NLS_CALENDAR     GREGORIAN
    NLS_CHARACTERSET     WE8MSWIN1252
    NLS_COMP     BINARY
    NLS_CURRENCY     zl
    NLS_DATE_FORMAT     RR/MM/DD
    NLS_DATE_LANGUAGE     POLISH
    NLS_DUAL_CURRENCY     zl
    NLS_ISO_CURRENCY     POLAND
    NLS_LANGUAGE     POLISH
    NLS_LENGTH_SEMANTICS     BYTE
    NLS_NCHAR_CHARACTERSET     AL16UTF16
    NLS_NCHAR_CONV_EXCP     FALSE
    NLS_NUMERIC_CHARACTERS     ,
    NLS_SORT     POLISH
    NLS_TERRITORY     POLAND
    NLS_TIME_FORMAT     HH24:MI:SSXFF
    NLS_TIMESTAMP_FORMAT     RR/MM/DD HH24:MI:SSXFF
    NLS_TIMESTAMP_TZ_FORMAT     RR/MM/DD HH24:MI:SSXFF TZR
    NLS_TIME_TZ_FORMAT     HH24:MI:SSXFF TZR

    N'<national symbols>', being part of an SQL statement, will be converted to the database character set (WE8ISO8859P1) before being parsed. Only if the client and the database are both 10.2 or higher, the client can encode the literal appropriately so that it survives this conversion.
    In earlier versions, you can do the encoding yourself. Instead of the N'<national symbols>' literal use the UNISTR function: UNISTR('\xxxx\yyyy\zzzz'), where U+xxxx, U+yyyy, U+zzzz are Unicode code points of your national characters.
    -- Sergiusz
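
The UNISTR workaround described above needs the code points written out as \xxxx escapes. As a convenience, a small Java helper can generate such a literal from a string. This is a sketch only: the class and method names are hypothetical, and it assumes each escape is a four-digit hexadecimal UTF-16 code unit, as in the UNISTR('\xxxx\yyyy\zzzz') form quoted in the answer. Printable ASCII is kept as-is, while the backslash and the single quote are escaped too so the literal stays well-formed.

```java
// Hypothetical helper for illustration; not part of any Oracle client library.
class UnistrSketch {

    // Builds a UNISTR('...') expression in which every non-ASCII code unit is a \xxxx escape.
    static String toUnistr(String text) {
        StringBuilder sb = new StringBuilder("UNISTR('");
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i); // one UTF-16 code unit
            boolean plainAscii = c >= 0x20 && c < 0x7F && c != '\\' && c != '\'';
            if (plainAscii) {
                sb.append(c);
            } else {
                sb.append(String.format("\\%04X", (int) c));
            }
        }
        return sb.append("')").toString();
    }

    public static void main(String[] args) {
        // The column heading from the question, written with Java escapes for U+015B and U+0107.
        System.out.println(toUnistr("Ilo\u015b\u0107")); // UNISTR('Ilo\015B\0107')
    }
}
```

The Java source itself uses \u escapes for the same reason: the literal then survives regardless of the encoding the file is compiled with.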

  • Problem with special national characters

    Hi,
    How can I configure Oracle Application Server 10g to display special national characters correctly (ANSI code page 1250, Central Europe)?
    It is hosted on Windows Server 2003, where the appropriate character resources are installed.
    Thanks in advance
    KM

    Check the available languages in SMLT (trn). In the example stated below, the characters coming from DI are Spanish characters, which are getting converted to Swedish ones.
    Please go through the following:
    Re: Japanese characters

  • How to send Oracle rowid to servlet? | Problem with national characters.

    Is there some way to send a rowid to a servlet?
    I currently have a definition like this:
    <af:image source="/imageservlet?Par1=#{bindings.Col1.inputValue}"/>
    But if the column contains national characters, the servlet methods receive those characters changed.
    My idea is to use the Oracle rowid instead of the primary key for the row. Is that simply possible?
    Using something like this:
    <af:image source="/imageservlet?Rowid=#{bindings.Rowid}"/>
    Or do you have ideas for how to solve the problem with national characters?
    Thanks
    FiL

    Hi,
    Your workaround works, but
    I think this is a simple encoding problem.
    You simply need to make sure all parameters and pages are encoded with a character set which contains the national characters you mentioned.
    This is a bit dependent on the exact technology you're using, but most of it can be done via the web.xml:
      <jsp-config>
          <jsp-property-group>
              <url-pattern>*.jsp</url-pattern>
              <page-encoding>UTF-8</page-encoding>
          </jsp-property-group>
      </jsp-config>
    This forces all JSP pages to be encoded in UTF-8.
    You said you're using a servlet, so your servlet needs a similar block for its pattern.
    Adding the following parameter sometimes helps as well, although I think this one is a bit dated:
      <context-param>
        <param-name>PARAMETER_ENCODING</param-name>
        <param-value>UTF-8</param-value>
      </context-param>
    If you want to be 100% sure the encoding is set right, make sure the pages contain:
    <%@ page contentType="text/html;charset=utf-8"%>
    Or, depending on your view technology, the syntax can be a bit different.
    -Anton
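
On the sending side of the servlet question above, one thing worth checking is whether the parameter value is percent-encoded before it is put into the URL. The sketch below uses the JDK's URLEncoder; the class and method names are hypothetical, the /imageservlet path and Par1 parameter come from the question, and UTF-8 is an assumption that has to match whatever charset the receiving side uses to decode the query string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch class for illustration only.
class QueryParamSketch {

    // Percent-encodes a parameter value as UTF-8 so that it is pure ASCII inside the URL.
    static String encodeParam(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Builds the image URL from the question with an encoded parameter value.
    static String imageUrl(String par1) {
        return "/imageservlet?Par1=" + encodeParam(par1);
    }

    public static void main(String[] args) {
        System.out.println(imageUrl("K\u00e4se")); // /imageservlet?Par1=K%C3%A4se
    }
}
```

With the value encoded this way, any remaining corruption points to the decoding side, that is, the charset the container or servlet applies to request parameters.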
