HibernateTemplate: saving unicode in MSSQL

Hi,
I'm trying to save an object using the HibernateTemplate (http://www.springframework.org/docs/api/org/springframework/orm/hibernate/HibernateTemplate.html).
The String members contain Russian characters. When I copy and paste the data manually into the database, everything stays intact. However, when I use HibernateTemplate.saveOrUpdate(object), all Unicode characters get replaced by '?'.
Is there any way I could fix this? I've been searching for a Spring setting to enable Unicode characters, without any success.

If you can't save the characters as Unicode chars, then you have to find an encoding that the [rtf application?] can use that correctly encodes the necessary characters. Then specify the encoding when your Java program writes the output, using something like java.io.OutputStreamWriter or java.nio.charset.CharsetEncoder
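
A run of '?' characters is the typical signature of a lossy charset conversion: somewhere on the way to the column, the text is encoded into a charset that has no Cyrillic letters. The following is only a minimal sketch of that effect in plain Java. The class and method names are made up for illustration, and it does not show where the loss happens in the setup described above (driver settings and column types are the usual suspects).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {
    /** Encodes the text into the given charset and decodes it again. */
    static String roundTrip(String text, Charset charset) {
        // getBytes(Charset) silently replaces unmappable characters
        // with the charset's replacement byte, which is '?' here.
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "privet" in Cyrillic, written with escapes so the example does
        // not depend on the encoding of the source file itself.
        String russian = "\u043f\u0440\u0438\u0432\u0435\u0442";

        // ISO-8859-1 has no Cyrillic letters, so every character is lost.
        System.out.println(roundTrip(russian, StandardCharsets.ISO_8859_1)); // ??????

        // UTF-8 can represent every character, so the round trip is lossless.
        System.out.println(roundTrip(russian, StandardCharsets.UTF_8).equals(russian)); // true
    }
}
```

Once a character has been turned into '?', the original cannot be recovered, so the fix has to happen before the conversion rather than after it.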

Similar Messages

  • Problems saving Unicode in an Oracle CLOB

    We found a problem with saving Unicode in an Oracle CLOB. We use thin
    drivers and Oracle 9.2.0.3.0 The character set of the database is UTF8.
    The following code works:
    // Retrieve clob object into ResultSet rs
    // Data to be saved is in String str
    // Save data:
    oracle.sql.CLOB theClob = (oracle.sql.CLOB)rs.getObject(1);
    theClob.putString(1, str);
    But the following code does not save the data properly while this is
    usually recommended in code samples:
    oracle.sql.CLOB theClob = (oracle.sql.CLOB)rs.getObject(1);
    Writer out = theClob.getCharacterOutputStream();
    out.write( str.toCharArray() );
    out.flush();
    out.close();
    What could be the reasons why the second approach is failing?
    Regards,
    Joop Kaashoek

    Thanks for your reply.
    When I insert .gif files JDeveloper ends with a message saying "Process exited with exit code 0". I then go and check in the database and I find the image added to the table.
    With a word doc, JDeveloper does not give that message. No message at all regarding what the status of the process is. And the document is not added to the database. No error messages too. Could this be an issue with Oracle?

  • PreparedStatement set date sometimes sets the date one day behind

    I have a PreparedStatement that sometimes sets the date a day behind. I am saving to an MSSQL DB with a datetime field. I have two identical PreparedStatements, one for insert and one for update. When either is executed, it will sometimes set the date back one day. It doesn't happen every time. It's every other or every third one, but it's not consistent. Any help would be appreciated.
    ps.setDate(1, Util.parseSqlDate(getParam("CHARGED")));
    import java.text.DateFormat;
    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.Calendar;

    public class Util {
         // Parses "MM/dd/yyyy" or "MM/dd/yy" into a java.sql.Date; returns null on bad input.
         public static java.sql.Date parseSqlDate(String datestr) {
              if (datestr == null)
                   return null;
              DateFormat sdf;
              if (datestr.length() > 8) {
                   sdf = new SimpleDateFormat("MM/dd/yyyy");
              } else {
                   sdf = new SimpleDateFormat("MM/dd/yy");
              }
              java.util.Date d;
              try {
                   d = sdf.parse(datestr);
              } catch (ParseException e) {
                   return null;
              }
              if (d != null) {
                   // Interpreted in the JVM's default time zone
                   Calendar cal = Calendar.getInstance();
                   cal.setTime(d);
                   return new java.sql.Date(cal.getTimeInMillis());
              }
              return null;
         }
    }

    // In the calling class:
         protected String getParam(String name) {
              return (getParamArray(name) == null) ? null : getParamArray(name)[0];
         }
         protected String[] getParamArray(String name) {
              return (String[]) params.get(name);
         }

    traigo wrote:
    > The database is a datetime field.
    Then you should be using the appropriate Java JDBC time/date methods to access it rather than strings.
    > We are only storing the date portion. Saving with today's date should produce '2009-12-28 00:00:00.000'.
    > I just want to set the date to an absolute date (no time value) provided without timezones.
    Impossible. Since the database datatype is datetime, that means that a timezone is always involved.
    And Java always uses timezones. Ignoring the problem doesn't make it go away.
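
    One way to keep time-of-day and time zone arithmetic out of the parsing step is to parse the string as a calendar date first. The sketch below assumes Java 8 or later and uses a hypothetical class name, DateOnly. It is not established that this cures the intermittent off-by-one described above, since the driver and server still apply their own time zone rules when the value is sent.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class DateOnly {
    private static final DateTimeFormatter LONG_YEAR = DateTimeFormatter.ofPattern("MM/dd/yyyy");
    private static final DateTimeFormatter SHORT_YEAR = DateTimeFormatter.ofPattern("MM/dd/yy");

    /** Parses "MM/dd/yyyy" or "MM/dd/yy"; returns null for null or unparseable input. */
    public static java.sql.Date parseSqlDate(String datestr) {
        if (datestr == null) {
            return null;
        }
        // Same length rule as the original Util.parseSqlDate.
        DateTimeFormatter format = datestr.length() > 8 ? LONG_YEAR : SHORT_YEAR;
        try {
            // LocalDate carries year, month and day only: no clock time, no zone.
            LocalDate day = LocalDate.parse(datestr, format);
            return java.sql.Date.valueOf(day);
        } catch (DateTimeParseException e) {
            return null;
        }
    }
}
```

    The call site stays the same shape as before, for example ps.setDate(1, DateOnly.parseSqlDate(getParam("CHARGED"))).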

  • OK last hurdle on this project

    I have JTextAreas which take Russian characters as input. The only problem is that when I save, all I get is ??????
    Is there any place I can read up on saving Cyrillic characters?
    Jim

    Did you save your files in Unicode formats, such as UTF-8 or UTF-16? Check the source code of VietPad text editor (vietpad.sourceforge.net) for examples about saving Unicode characters.
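
    Saving in a Unicode format from Java comes down to naming the charset explicitly when the file is written, instead of relying on the platform default. A minimal sketch, assuming the text comes from something like JTextArea.getText(); the class and method names are invented for the example:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SaveCyrillic {
    /** Writes the text as UTF-8, regardless of the platform default encoding. */
    static void save(Path file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    /** Reads the whole file back, decoding it as UTF-8. */
    static String load(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("notes", ".txt");
        String text = "\u041c\u043e\u0441\u043a\u0432\u0430"; // "Moskva" in Cyrillic
        save(file, text);
        System.out.println(load(file).equals(text)); // true
        Files.delete(file);
    }
}
```

    Whatever reads the file later has to use the same charset, otherwise the bytes are correct on disk but still display as garbage.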

  • Unicode MSSQL

    Hi All,
    I'm inserting Japanese characters into MSSQL Server. When I do that, I can see the Japanese characters in the insert query in the audit log, but in MSSQL Server they are inserted as ?????. The columns are already defined in MSSQL with the data types nchar and nvarchar.
    I modified the URL with charset UTF-8. I also tried the option sql7=true, but to no avail.
    Can anyone help me in this regard?
    Regards
    Anil

    Hi Anil,
    hmmm....tricky..
    For a unicode system SQL Server uses the datatype "nvarchar".
    For a non-unicode system SQL Server uses the datatype "varchar".
    SQL Server always uses UNICODE UCS-2 character set (2 bytes per
    unicode character) independent from the collation of the SQL S
    You can always store 10 unicode characters in a field defined
    nvarchar(10).
    A collation defines the code page and sort order. SAP only support
    the collation SQL_Latin1_General_CP850_BIN2 for unicode system.
    Did you install the correct collation ? See SAP note 600027.
    or maybe you can look for Microsoft Support on this case ?
    cheers,
    Vincent
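
    Besides the column type, the way the value is sent matters: in T-SQL a plain '...' literal is treated as non-Unicode text, while a bound national-character parameter (or an N'...' literal) is not. The sketch below assumes the client is Java over JDBC, which the question does not actually state. The table and column names are hypothetical; setNString is part of the standard JDBC 4.0 PreparedStatement interface.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class NVarcharInsert {
    // Hypothetical table and column, for illustration only.
    static final String SQL = "INSERT INTO customers (name) VALUES (?)";

    /** Inserts one value into an nvarchar column using a bound parameter. */
    static int insertName(Connection conn, String name) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            // Bind as a national-character value instead of concatenating
            // the text into the SQL string as a '...' literal.
            ps.setNString(1, name);
            return ps.executeUpdate();
        }
    }
}
```

    Binding the value also avoids quoting problems and SQL injection, which string concatenation does not.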

  • SAVING FILES AS  UNICODE TEXT / XML / M3U / M3U8

    Hi, I'm unable to find a clear explanation of what saving a file as UNICODE TEXT / XML / M3U / M3U8 is all about. I want to archive and EXPORT music from iTunes to external storage or a flash drive, using the EXPORT option on the PLAYLIST I created. I have 4 formats to save with: UNICODE TEXT / XML / M3U / M3U8. I want to save all the info for each song stored in iTunes (genre, EQ, BPM, notes). The reason I'm using EXPORT is that I want to remove the songs from my library after EXPORTING them. Is there another way to TRANSFER music with all the saved info?

    I suggest you engage Wikipedia and Webopedia, do some export experimentation, and don't delete anything unless you have a few backups and are sure of the results.
    http://www.webopedia.com/
    Most commonly used backup methods
    You might be exporting the playlist itself; the songs are not included.
    You can apply tags and groupings to the song files themselves that get transferred with the song file, wherever it goes.
    You'll have to play around and learn yourself or read an online iTunes instruction manual.
    We are here to solve support-related issues, not engage in a huge educational endeavor.

  • Generating PDF-files from HTML-page saved as Unicode?

    I have followed this Quick Start on how to generate a PDF-file from HTML using web services in .NET: http://livedocs.adobe.com/livecycle/8.2/programLC/programmer/help/000093.html
    It works just fine when the HTML page is saved as ANSI, but when it's saved as UNICODE I get a problem. The code runs without errors, but the PDF looks really strange. Any suggestions on how to solve this? I really need to use UNICODE as my application needs to handle different languages (including, for example, Chinese).

    I found out that UTF-8 worked as well so the problem is solved. :-)

  • Saving html file with Unicode character(Japnese)

    Hello,
    I have some data in a table with Japanese characters.
    I want to save an HTML file that contains this data in table format and open the file in Excel.
    I am using
    new String(str.getBytes("ISO8859_1"),"UTF8");
    which gives me the desired output in the browser (using a servlet). But when I write the same output to a file using BufferedWriter,
    it gives me "????".
    Is there any other way to do it?

    don't you need "UTF16" as the encoding if you want extended Unicode?
    I think the Writer might strip off upper 8 bits if you specify UTF8.
    Just a thought, though I haven't tried it, so it may be a red herring.
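
    A likely explanation for the "????" is that the BufferedWriter wraps a writer using the platform default encoding, which cannot represent Japanese. The sketch below makes two assumptions that are not confirmed by the post: that the getBytes/new String step is repairing UTF-8 text that was decoded as ISO-8859-1 earlier, and that the file should be written as UTF-8. Class and method names are invented.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class HtmlExport {
    /** Undoes one specific kind of damage: UTF-8 bytes that were decoded as ISO-8859-1. */
    static String repair(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    /** Writes a small HTML page as UTF-8 and declares that charset in the page. */
    static void writeHtml(Path file, String body) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write("<html><head><meta http-equiv=\"Content-Type\" "
                    + "content=\"text/html; charset=utf-8\"></head><body>");
            out.write(body);
            out.write("</body></html>");
        }
    }
}
```

    UTF-8 can encode every Unicode character, so switching to UTF-16 is not required for Japanese; what matters is that the writer's charset is stated and matches what the page declares.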

  • Storing unicode (khmer) woes

    Hey all,
    I am working on an application that needs to accept the Khmer language in various text inputs and store it in an MSSQL database. I have gotten most of it working, but still have one bug. I can display Khmer characters if they are typed in. If I copy and paste Khmer text directly into my database and query for it, it comes out properly. The issue is that when I take Khmer text from a form field and insert it, it is transformed into a bunch of ?????.
    Here are the steps I've taken so far to enable unicode on my website
    - Configured the datasource to accept high ascii values and unicode
    - Configured the database table columns to be of type nvarchar
    - Added
         <cfscript>
            SetEncoding("form","utf-8");
            SetEncoding("url","utf-8");
           </cfscript>
           <cfcontent type="text/html; charset=utf-8">
    to my application.cfm file.
    -Added <META http-equiv="Content-Type" content="text/html; charset=utf-8"> in the head of my pages.
    -Added <cfprocessingdirective pageEncoding="utf-8"> on my page that attempts to update the database.
    It's weird. If I copy and paste Khmer directly into the DB and query for it, that works fine. If I hard-code some Khmer on a page, that displays fine too. If I type Khmer into a form and dump the form value back out, that works. It's only when a form value is saved to the database and pulled back out that it is mangled. You can see an example here of what I'm talking about.
    http://www.psasmart.com/test.cfm
    And here is the code that makes that page.
    <cfscript>
            SetEncoding("form","utf-8");
            SetEncoding("url","utf-8");
    </cfscript>
    <cfcontent type="text/html; charset=utf-8">
    <META http-equiv="Content-Type" content="text/html; charset=utf-8">
    <cfprocessingdirective pageEncoding="utf-8">
    If you have the Khmer language pack installed this:  <h2>ម៉ោងផ្សាយ-រលកធ</h2> should appear as cambodian text.
    <hr />
    <form name="submitForm" method="post" accept-charset="utf-8">
         Now enter some Khmer text to save to the database: <input name="text" type="text" value="ម៉ោងផ្សាយ-រលកធ">
        <br />
        <input type="submit" name="submit" value="submit" >
    </form>
    <cfoutput>
         <cfif isdefined("form.submit")>
              This is the same text as entered in the form: <h2>#form.text#</h2><br />
              <cfquery name="update" datasource="#application.dsn#" >
                   Update serverSettings
                   SET khmerReadWriteTest ='#form.text#'
              </cfquery>
         </cfif>
         <Cfquery name="getKhemer" datasource="#application.dsn#">
              select khmerReadTest, khmerReadWriteTest
              from serverSettings
         </Cfquery>
         This is the same text as entered in the form but saved to the db and queried for then displayed:  <h2>#getKhemer.khmerReadWriteTest#</h2>
         This is some Sample Khmer Text Inputed Directly in the database then queried for and displayed:  <h2>#getKhemer.khmerReadTest#</h2>
    </cfoutput>

    On 2/3/2011 12:17 AM, kenji776 said:
    > Hey all, I am working on an application that needs to accept the khmer
    > language in various text inputs and mssql database. I have gotten most of it
    > working, but still have one bug. I can display khmer characters if they are
    > typed in. If I copy and paste khmer text directly in my database, and query
    > for it, it comes out properly. The issue is when I take khmer text from a
    > form field and insert it, then it is transformed into a bunch of ?????.
    you should already know the answer to this. btw it's not just khmer, it's any
    unicode encoded text.
    first the usual suspects: what db driver? 100% sure you're using the correct dsn?
    then this caught my eye: SET khmerReadWriteTest ='#form.text#'
    uh either use cfqueryparam (good practice besides you turned on unicode in the
    dsn anyway) or unicode hinting.:
    SET khmerReadWriteTest=N'#form.text#'
    guess you didn't look close enough at my "greek test" code

  • Problem while saveing in Table Control.

    Hi,
    I am getting an error while saving data in a table control.
    This table control is in a pop-up window at the end of the screen.
    The user enters qty and UoM in the TC.
    The error I am getting is:
    Field symbol has not been assigned.
    Error analysis
    The system tried to access an unassigned field symbol (data segment
    number 32772).
    The field symbol is no longer assigned, because a Unicode program
    previously tried to set the field symbol using an ASSIGN statement with
    an offset/length declaration. The memory addressed in this offset/length
    declaration, however, no longer lay within the valid range.
    Information on where terminated
    The termination occurred in the ABAP program "SAPLOMCV" in
    "CONVERSION_EXIT_MATN1_INPUT".
    I have coded it like this:
    MODULE USER_COMMAND_0112 INPUT.
      CASE OKCODE.
        WHEN 'BACK' or 'CANCEL'.
          SET SCREEN 0.
    * ITI contains the data that is displayed in the TC
         when 'DISPLAY'.
              LOOP AT ITI.
              read table iti with key ingr_code = iti-ingr_code
                                      ingr_desc = iti-ingr_desc.
              wka1-ingr_code = iti-ingr_code.
              wka1-ingr_desc = iti-ingr_desc.
              wka1-conc = iti-conc.
              wka1-quantity = iti-quantity.
              wka1-uom = iti-uom.
            append wka1 to itf.
            ENDLOOP.
         WHEN 'SAV'.
          loop at itf where check = 'x' .
                  update zacg_ns
                    set ingr_code = itf-ingr_code
                    col_name = itf-ingr_desc
                    conc = itf-conc
                    quantity = itf-quantity
                      UOM =  itf-UOM
                      ru = itf-ru
                      where ingr_code = itf-ingr_code
                      and col_name = itf-ingr_desc.
        zacg_ns-ingr_code = itf-ingr_code.
       zacg_ns-col_name = itf-ingr_desc.
       zacg_ns-conc = itf-conc.
      zacg_ns-quantity = itf-quantity.
    zacg_ns-UOM = itf-UOM.
    update zacg_ns.
                     endloop.
                     leave program.
                  ENDCASE.
              ENDMODULE.                 " USER_COMMAND_0112  INPUT
    MODULE read_table_control INPUT.
      MODIFY itf  INDEX tc-current_line.
    ENDMODULE.                    "read_table_control INPUT
    *&  Include           ZACG_NS_2                                        *
    *&      Module  STATUS_0111  OUTPUT
          text
    MODULE STATUS_0111 OUTPUT.
      SET PF-STATUS 'ZNEWSHADE'.
      SET TITLEBAR 'ZNS'.
    ENDMODULE.                 " STATUS_0111  OUTPUT
    *&      Module  STATUS_0112  OUTPUT
          text
    MODULE STATUS_0112 OUTPUT.
      SET PF-STATUS 'ZTC'.
    SET TITLEBAR 'xxx'.
      DESCRIBE TABLE itf LINES lines.
      tc-lines = lines.
    ENDMODULE.                 " STATUS_0112  OUTPUt
    PROCESS BEFORE OUTPUT.
    MODULE STATUS_0112.
    LOOP at itf WITH CONTROL TC CURSOR tc-current_line.
        MODULE TC_PBO  .
      ENDLOOP.
    PROCESS AFTER INPUT.
      MODULE CANCEL AT EXIT-COMMAND.
    LOOP at itf .
        module read_table_control.
      ENDLOOP.
    MODULE USER_COMMAND_0112.
    Can any one help me..

    Hi,
    Did you add that field later, after creating the table control?
    Check in the element list whether you have an entry for that element.
    Regards,
    Nishant

  • How do i use an input file with Asian characters(Unicode)?

    /* Ardor
    * Illiteraminator.java
    * Version beta 1.0
    * December 7, 2007
    * Main interfacing class
    */
    import java.io.*;
    import java.util.*;
    public class Illiteraminator{
      public static void main (String [] args){
      //  ArrayList<Word> dictionary = new ArrayList<Word>();
        String fileName = "Mandrin.txt";
        String character= "",definition = "",inputLine;
        try{
          Scanner fileScan = new Scanner (new File (fileName));
          while (fileScan.hasNext()){
            inputLine = fileScan.nextLine();
            Scanner sc = new Scanner(inputLine);
            character = sc.next();
            while (sc.hasNext()){
              definition = definition + " " + sc.next();
            }//end while sc
           // dictionary.add(new Word(character, definition));
            //definition = "";
            //character = "";
          }//end while fileScan
        } catch (FileNotFoundException e){
          System.out.println("File not found, dig around for Mandrin.txt");
          System.exit(1);
        }//end catch
        System.out.println(character);
        System.out.println(definition);
      }//end main
    }//end class Illiteraminator
    Hi, I'm a first-time programmer. I never touched programming until I took a Java class at university last semester. I am currently attempting to write a program to help me move away from my illiteracy in Mandarin. That's my code above, and I am using Dr.Java while writing it. When I tested it, the output looked something like this:
    A p p l e
    M o n k e y C a t D o n k e y
    My input file is saved in Unicode. It contains letters that cannot be saved in ANSI. I tried UTF-8, but the interactions section showed no output...
    Is this just a problem with Dr.Java? Will i encounter a similar problem when i turn this into a GUI?
    The following is a copied and pasted version of the txt file I used as input. It is saved in the Unicode format.
    的[de] <grammatical particle marking genitive as well as simple and composed adjectives>; 我* wǒde my; 高* gāode high, tall; 是* sh�de that's it, that's right; 是...* sh�...de one who...; 他是说汉语*. Tā sh� shuō H�nyǔde. He is one who speaks Chinese. [d�] 目* m�d� goal [d�] true, real; *确 d�qu� certainly
    一(A壹)     [yī] one, a little; 第* d�-yī first, primary; 看*看 k�nyīk�n have a (quick) look at [y�] (used before tone #4); *个人 y� g� r�n one person; *定 y�d�ng certain; *样 y�y�ng same; *月y�yu� January [y�] (used before tones #2 and #3); *点儿 y�diǎnr a little; *些 y�xiē some {Compare with 幺(F么) yāo, which also means "one"}
    是     [sh�] to be, *不*? sh�bush�? is (it) or is (it) not?; *否 sh�fǒu whether or not, is (it) or is (it) not?

    Sorry, but i do not understand this post at all... Can anyone explain it to me? Is he saying my IDE is running on something other than Unicode?
    PS: I tried one of the Scanner constructors that takes a charset parameter. That fixed the odd output! However, every Chinese character has been replaced with a question mark. (It was a series of weird characters before i used the constructor with a charset parameter.)
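
    Output such as "A p p l e" is what UTF-16 text tends to look like when it is read as a single-byte encoding, and the question marks that appear after switching constructors may come from the console rather than from the file. A minimal sketch under two assumptions: the file was saved with Notepad's "Unicode" option (taken here to mean UTF-16 little-endian with a byte order mark), and the console can display UTF-8. The class name is hypothetical.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class ReadUtf16 {
    /** Returns the first whitespace-delimited token of the file, decoded as UTF-16. */
    static String firstToken(File file) throws FileNotFoundException {
        // The "UTF-16" decoder uses the byte order mark to pick the byte order.
        try (Scanner fileScan = new Scanner(file, "UTF-16")) {
            return fileScan.hasNext() ? fileScan.next() : "";
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("Mandrin", ".txt");
        // Simulate the input file: byte order mark FF FE, then UTF-16LE text.
        try (OutputStream out = new FileOutputStream(file)) {
            out.write(new byte[] {(byte) 0xFF, (byte) 0xFE});
            out.write("\u7684 de particle".getBytes(StandardCharsets.UTF_16LE));
        }
        String character = firstToken(file);
        // Print through a stream with an explicit encoding; whether the glyph
        // shows up still depends on the console and its font.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(character);
        file.delete();
    }
}
```

    If the string is correct inside the program (for example, character.length() is 1 and character.charAt(0) is 0x7684) but the console shows '?', the remaining problem is on the output side, and a Swing GUI with a font that covers Chinese may display it without further changes.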

  • Getting ÿþ as saved conversations from Lync in Outlook in Office 2013

    Hi,
    I've been trying to get to the bottom of this and have found similar posts, but no one seems to have an answer.
    When I IM someone using Lync 2013, they get a pop up notification but instead of the message they see ÿþ<.  Once they open the chat window, they can see my typed text.  Occasionally, certain people can't see the first line of my chat, but as long as they keep the chat window open, they can see everything new I type.
    All my conversations that are saved in outlook show ÿþ< for the text and are unreadable.  I've disabled the saving of conversations because they have become worthless.
    I believe it has to do with BOM but have not been able to find a way to fix this.
    If I copy a conversation from the chat window and paste it into Microsoft Word it shows ÿþ<, but if I paste it into notepad the conversation appears.
    (I had inserted a screenshot here, but am unable to because I am unable to figure out how to get my account "verified")
    I've tried changing the preferred encoding for outgoing messages: to Unicode (UTF-8) in Outlook, but this had no effect and I can't find a similar option in Lync 2013.
    (I had inserted a screenshot here, but am unable to because I am unable to figure out how to get my account "verified")
    I enabled logging for Lync and the event IDs that come up are 1, 11 and 12, to which I cannot find any information for at the moment.
    Any help and or suggestions would be appreciated.

    Hi,
    Did the issue happen only for you or for multiple users?
    Please try to delete Lync User Profile and information on Registry, then repair Office 2013.
    The path of Lync User Profile: %UserProfile%\AppData\Local\Microsoft\Office\15.0\Lync
    The path for information on Registry: HKCU\Software\Microsoft\Office\15.0\Lync\[email protected]
    Then test the issue again.
    Best Regards,
    Eason Huang
    TechNet Community Support

  • Is it possible to applescript the saving of a message to RTF in Mail.app?

    I'm fiddling with a script that takes selected messages in Mail.app and, among other things, saves the messages as RTF files. The "Save As..." option in the File menu has the ability to save as the RTF type, but I can't seem to get this done in applescript with the "save" command. I've tried passing it a window object from the message viewer, but it just barfs. Does anyone know if Mail.app can be scripted to save a message to an RTF file (with a file name/path supplied via the AS program)?
    Thanks in advance for your help.
    P.S. I've tried circumventing this problem by trying to assemble a text document from the email message parts (subject, sender, contents, etc.) using TextEdit, with no success with formatting, but I'd rather get Mail to do it because it already formats the saved email nicely if you do it interactively in the "Save As..." dialog.

    Just to make my question a little less abstract, here is the script as it stands so far:
    (* START SCRIPT *)
    tell application "Mail"
    activate
    set selectedMessages to the selected messages of front message viewer
    set saveFolder to choose folder with prompt "Please pick an empty folder for me to store the email information:"
    tell application "Finder"
    set parentFolder to folder saveFolder
    set parentFolderPath to parentFolder as Unicode text
    end tell
    repeat with aMessage in selectedMessages
    properties of aMessage
    set messageSender to sender of aMessage
    set messageSubject to subject of aMessage
    set messageSent to date sent of aMessage as string
    set messageContent to content of aMessage
    if (messageSender is "[email protected]") and (messageSubject begins with "manuscript #") then
    -- Get thingie number from message subject after '#'
    set oldDelimiter to my SwitchDelimiterTo({"#"})
    set manuscriptNumber to last text item of messageSubject
    my SwitchDelimiterTo(oldDelimiter)
    log "NOTE: Email for manuscript #" & manuscriptNumber
    -- Make the folder in which to store the stuff for this manuscript;
    -- variable "manuscriptFolder" will point to place to store files
    tell application "Finder"
    try --ignore problems with folder creation, like existing folders/files
    set manuscriptFolder to make new folder in parentFolder with properties {name:manuscriptNumber}
    on error errorName number errorNumber
    if errorNumber is -48 then -- Folder already exists
    log "NOTE: folder " & parentFolderPath & manuscriptNumber & " already exists."
    set manuscriptFolder to folder manuscriptNumber in parentFolder
    end if
    end try
    end tell
    set manuscriptFolderPath to manuscriptFolder as Unicode text
    set messageFileName to manuscriptFolderPath & "email.rtf"
    log "NOTE: Saving manuscript to file: " & messageFileName
    -- Use TextEdit to construct an RTF file of the message to print out later
    tell application "TextEdit"
    activate
    close every document saving no
    -- Assemble the document
    set docText to "Date:" & tab & messageSent & return
    set docText to docText & "Subject:" & tab & messageSubject & return & return
    set docText to docText & messageContent & return
    log "DOCTEXT: " & docText
    set aDocument to make new document at beginning of documents
    set text of the front document to docText
    -- Save the document as an RTF in the messageFileName
    tell front document
    save in messageFileName as "RTF "
    end tell
    end tell
    else
    log "Message from \"" & messageSender & "\" with subject \"" & messageSubject & "\" will not be processed."
    end if
    end repeat
    end tell
    -- Change the text item delimiter and return the existing ones
    on SwitchDelimiterTo(delimiterList)
    local x
    set x to (get AppleScript's text item delimiters)
    set AppleScript's text item delimiters to delimiterList
    return x
    end SwitchDelimiterTo
    (* END SCRIPT *)
    It successfully walks through the selected messages, picks out the "right" ones (from a particular address and with a particular subject), makes a folder to store some information about the email, has TextEdit assemble a document from the email's text and has TextEdit save the text file.
    It's my kludgey work-around for not being able to either 1) print to a PDF file in Mail.app or 2) have Mail.app save the message itself as an RTF. It would be the bee's knees to get either 1 or 2 working, but I'd settle for figuring out how to format the text in TextEdit more nicely. I'd like to get the strings "Subject:" and "Date:" bold-faced for starters, but I have no way of knowing how to tell TextEdit to do it via AS.
    My first attempt was to "make new text" with certain font and size properties at the end of the "texts", but that was a flop. I tried the same thing with "attribute runs" and "paragraphs" with no joy. It seems that the only way to get text into a document is to assign it to the "document"'s "text" property. But then I'm left to wonder how to select bits of it to change their appearance (fonts and sizes).
    I've been fiddling with AS and automating Mail.app and TextEdit.app for days now, and it's just as perplexing as when I started. Any insight from you folks monitoring this discussion group would be greatly appreciated. The flotsam of script examples I've dredged up with Google hasn't been enlightening. I've got a copy of O'Reilly's _AppleScript - The Definitive Guide_, and it's been a tiny bit of help, but I'm drifting.
    Thanks in advance for reading.

  • Need to read Unicode in a file

    Hi,
    My need to read Unicode from a file (on a Windows box) is due to the fact my software is used in different countries and on different keyboard naturally. All the users are not computer literate but, like me, they are all lazy and want to put their username and password in a config file my application reads. If their username or password contain Unicode characters I have a problem reading.
    They are simple users that I would like to advise them to open the config file using Windows Notepad, then type in their username and password, and save the file as Unicode. Notepad has four ways to save a file, ANSI, Unicode, Unicode big endian, and UTF-8 (I've tried them all except ANSI of course). Saving a file in a different format is as complicated as I would like it to get for them, some will have trouble even with this.
    I read the file like so:
    BufferedReader rdr =
        new BufferedReader(
            new InputStreamReader(new FileInputStream(file_name), "UTF-16"));
    String line;
    while ((line = rdr.readLine()) != null) {
        String[] pieces = line.split("[=:]");
        if (pieces.length == 2) {
            if (pieces[0].equals("PASSWORD")) {
                // Attempted fix: "various encodings" stands for the charset names I tried
                byte[] possibleUnicode = pieces[1].getBytes("various encodings");
                pieces[1] = new String(possibleUnicode, "various encodings");
            }
            propertyTable.setProperty(pieces[0], pieces[1]);
        }
    }
    All reading is perfect except for a username or password which can contain a real multi-byte character. I have used many variations of converting the string I get into a byte[] using string_in.getBytes("various encodings tried") and then back to a string but nothing has worked.
    I tried a regular FileReader to a BufferedReader and that didn't work. I tried a FileInputStream to a DataInputStream and that didn't work. I accomplished the most with what I described above, FileInputStream to InputStreamReader to BufferedReader.
    Does anyone know how to read Unicode in a file on a Windows file system?
    hopi

    > I have used the byte conversion technique before
    > successfully when I loaded a set of properties
    > from a URL openStream(). The properties load()
    > method takes an InputStream and assumes ISO-8859-1
    > so I converted the bytes from ISO-8859-1 to UTF-8.
    > Garbage characters were cleared up perfectly.
    I think you just got lucky that time. For characters up to U+007F, the UTF-8 encoding is the same as ISO-8859-1 (and most other encodings, for that matter). Characters in the range U+0080 to U+00FF will be encoded with one byte in ISO-8859-1, and with two bytes in UTF-8. In most cases, each of the two bytes in the UTF-8 representation will have values that are valid in ISO-8859-1. The decoded characters will be incorrect (and there will be too many of them), but they effectively preserve the original byte values, making it possible for you to re-encode the characters and then decode them correctly. But there's a big gap in the middle where the UTF-8 bytes produce garbage when decoded as ISO-8859-1. Run the included program to see what I mean.
    I don't know what's going wrong with your application, but I do know that changing the encoding retroactively is not the solution. I also think you're right about asking users to save files in a certain encoding. Considering how much trouble programmers have with this stuff, it's definitely too much to ask of users.
    import java.awt.Font;
    import javax.swing.*;
    public class Test
    {
      public static void main(String... args) throws Exception
      {
        JTextArea ta = new JTextArea();
        ta.setFont(new Font("monospaced", Font.PLAIN, 14));
        JFrame frame = new JFrame();
        frame.add(new JScrollPane(ta));
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        // Build a string of the characters U+00A0 through U+00FF
        StringBuilder sb = new StringBuilder();
        for (int i = 0xA0; i <= 0xFF; i++)
          sb.append((char)i);
        String str1 = sb.toString();
        // Encode as UTF-8 (two bytes per character), then decode as ISO-8859-1
        byte[] utfBytes = str1.getBytes("UTF-8");
        String str2 = new String(utfBytes, "ISO-8859-1");
        for (int i = 0, j = 0; i < str1.length(); i++, j += 2)
        {
          char ch = str1.charAt(i);
          byte b1 = utfBytes[j];
          byte b2 = utfBytes[j+1];
          String s1 = Integer.toBinaryString(b1 & 0xFF);
          String s2 = Integer.toBinaryString(b2 & 0xFF);
          char ch1 = str2.charAt(j);
          char ch2 = str2.charAt(j+1);
          // Original char, its two UTF-8 bytes (binary and hex), and the two mis-decoded chars
          ta.append(String.format("%2c%10s%10s%3x%3x%3c%3c\n",
              ch, s1, s2, b1, b2, ch1, ch2));
        }
        frame.setSize(400, 700);
        frame.setLocationRelativeTo(null);
        frame.setVisible(true);
      }
    }
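
    Rather than converting bytes after the fact, the key=value file can be decoded with the right charset while it is read. The sketch below is one possible approach, not the poster's code: it assumes the config file is saved as UTF-8 and uses Properties.load(Reader), which is available since Java 6. The class name is made up.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class Utf8Config {
    /** Loads key=value (or key:value) pairs from a stream that contains UTF-8 text. */
    static Properties load(InputStream in) throws IOException {
        Properties props = new Properties();
        // The Reader does the decoding, so load() never falls back to ISO-8859-1.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }
}
```

    One caveat: if the editor writes a byte order mark at the start of a UTF-8 file, Java's UTF-8 decoder passes it through as the character U+FEFF, which then becomes part of the first key. Starting the file with a blank or comment line, or stripping a leading U+FEFF, works around that.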

  • How do I tell if a File is ANSI, unicode or UTF8?

    I have a jumble of file types - they should all be the same, but they are not.
    How do I tell which type a file has been saved in?
    (and how do I tell a file to save in a certain type?)

    > > > "unicode or UTF-8" ?? UTF-8 is unicode !
    > > NO! UTF-8 is not UNICODE.
    > Yes it is !!
    No it is not.
    And to prove it I refer to your links.........
    > You simply cannot say "unicode or UTF-8" just because
    > UTF is Unicode Transformation Format.
    UTF is a transformation of UNICODE but it is not UNICODE. This is not playing with words. One of the big problems I see on these forums is people saying that Java uses UTF-8 to represent Strings, but it does not; it uses UNICODE point values.
    > You can say "UTF-8 or UTF16-BE or UTF-16LE" because
    > all three are different Unicode representations. But
    > all three are unicode.
    No! They are UNICODE transformations but not UNICODE.
    > So please don't play on words, I wanted to notify the
    > original poster that "unicode or UTF-8" is
    > meaningless, he/she would probably have said :
    > "unicode (as UTF-8 or UTF-16 or...)"
    You are playing with words, not me. UTF-8 is not UNICODE, it is a transformation of UNICODE to a multibyte representation - http://www.unicode.org/faq/utf_bom.html#14 .
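
    For the practical part of the original question, the only cheap and reliable hint about how a file was saved is a byte order mark at its start, when one is present. The sketch below checks for the UTF-8 and UTF-16 marks only; the class name is hypothetical. A file without a mark could be ANSI, UTF-8 without a mark, or something else entirely, and telling those apart needs heuristics that are not shown here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BomSniffer {
    /** Returns the encoding named by a leading byte order mark, or null if there is none. */
    static String detect(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE"; // note: a UTF-32LE mark starts with the same two bytes
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        return null; // no mark: the encoding cannot be told from the first bytes alone
    }

    /** Convenience overload; reads the whole file, which is fine for small files. */
    static String detect(Path file) throws IOException {
        return detect(Files.readAllBytes(file));
    }
}
```

    To save a file in a chosen encoding from Java, pass the charset explicitly to the writer, for example Files.newBufferedWriter(path, StandardCharsets.UTF_8).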
