How to preserve the encoding

Hi,
I need to convert a UTF-8 encoded string to an array of bytes. I have tried to do the conversion with getBytes("UTF-8") and getBytes(). The encoding is lost either way. What is the right way to preserve the encoding?
Thanks.

My problem is that email text is unreadable although
it is a utf-8 encoded string and processed with the
getBytes in utf-8.
Let's get one thing clear first. Strings do not have an encoding in Java; byte streams do. So what do you mean by a UTF-8 String? If it's a String created from a UTF-8 byte stream, you can get the byte array with
byte [] myBytes = myString.getBytes("utf-8");
If you then want to convert the bytes back to a String:
String newString = new String(myBytes, "utf-8");
It's important to note that "utf-8" in the String constructor is not the encoding of the String; it just tells the constructor the encoding of the byte array so that the bytes can be translated accordingly.
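To make the point above concrete, here is a minimal sketch of the round trip, assuming Java 7 or later for StandardCharsets. The class name Utf8RoundTrip and the sample text are made up for illustration. It also shows what happens when the bytes are decoded with the wrong charset, which is one possible cause of unreadable email text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of the original posts.
final class Utf8RoundTrip {
    /** Encodes the text as UTF-8 bytes. */
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes UTF-8 bytes back into a String. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Gr\u00fc\u00dfe"; // sample text with two non-ASCII characters
        byte[] utf8 = encode(original);
        System.out.println(Arrays.toString(utf8));

        // Decoding with the same charset restores the original text.
        System.out.println(decode(utf8).equals(original));

        // Decoding with a different charset yields different, garbled text.
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(original));
    }
}
```

If the bytes are correct at this point, the receiving side (for example the charset declared in the mail's Content-Type header) is the next place to look.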

Similar Messages

  • Numbers to CSV export script: how to specify the encoding?

    Hi,
    I'm using the following script to export a Numbers document to CSV:
    # Command-line tool to convert an iWork '09 Numbers
    # document to CSV.
    # Parameters:
    # - input: Numbers input file
    # - output: CSV output file
    # Attik System, Philippe Lang
    # Creation date: 31 mai 2012
    # Modification date:
    on run argv
      # We retrieve the path of the script
      set myPath to (path to me)
      tell application "Finder" to set myFolder to folder of myPath
      # We get the command line parameters
      set input_file to item 1 of argv
      set output_file to item 2 of argv
      # We retrieve the extension of the file
      set theInfo to (info for (input_file))
      set extname to name extension of (theInfo)
      # Paths
      set input_file_path to (myFolder as text) & input_file
      set output_file_path to (myFolder as text) & output_file
      if extname is equal to "numbers" then
        tell application "Numbers"
          open input_file_path
          save document 1 as "LSDocumentTypeCSV" in output_file_path
          close every window saving no
        end tell
      end if
    end run
    It works fine, except that I don't know how to specify the encoding of the text in the CSV file (Latin1, MacRoman, Unicode). This option is available in the export dialog of Numbers. Any hint on how to do that is welcome. (GUI Scripting?)
    Where can I find documentation on the available iWork "vocabulary"? Is there definitive documentation somewhere? I tried to record a manual export in the script editor, without success: the recorded script is more or less empty.
    Thanks!
    Philippe Lang

    A further note from Yvan. He's made some revisions to the script sent earlier.
    --{code}
    --[SCRIPT export to CSV with selected encoding]
    (*
    I added some features.
    (1) Defining the encoding through the preferences file applies only if
    the application is not in use, because the file is read only once per session.
    A test urges you to quit Numbers if it is running.
    (2) info for is deprecated, so Apple may remove it at any time.
    I no longer use it.
    (3) Just for fun, I added a piece of code that lets you select the encoding on the fly,
    thanks to the property chooseEncodingInScript. At this time the script uses Unicode (UTF-8).
    (4) I'm wondering which tool is used to launch this script;
    I don't know how to pass arguments when I run one.
    Yvan KOENIG (VALLAURIS, France)
    2012/06/13
    *)
    property chooseEncodingInScript : false
    -- true = the script will ask you to select the encoding
    -- false = the script uses the embedded encoding
    on run argv
      set input_file to (item 1 of argv) as text
      set output_file to (item 2 of argv) as text
      set myPath to (path to me) as text
      tell application "System Events"
        set theProcesses to name of every application process
        set myFolder to path of container of (disk item myPath)
        set input_file_path to myFolder & input_file
        set output_file_path to myFolder & output_file
        set extname to name extension of (disk item input_file)
      end tell
      if extname is "numbers" then
        if "Numbers" is in theProcesses then error "Please, quit “Numbers” before running this script !"
        if chooseEncodingInScript then
          set theList to {"Mac OS Roman", "Unicode (UTF-8)", "Windows Latin 1"}
          set maybe to choose from list theList with prompt "Choose the default encoding applying to export as CSV"
          if maybe is false then
            error number -128
          else if item 1 of maybe is item 1 of theList then
            30 -- Mac OS Roman
          else if item 1 of maybe is item 2 of theList then
            4 -- Unicode (UTF-8)
          else
            12 -- Windows Latin 1
          end if
        else
          4 -- Unicode (UTF-8)
        end if
        do shell script "defaults write com.apple.iWork.Numbers CSVExportEncoding  -int " & result
        tell application "Numbers"
          open input_file_path
          save document 1 as "LSDocumentTypeCSV" in output_file_path
          close every window saving no
        end tell
      end if
    end run
    --{code}
    Regards,
    Barry

  • How to check the encoding of the String?

    hi everybody
    Could anyone tell me how to find out from a String object which encoding it is written in? Is it UTF-8, UTF-16, etc.? Is there any class that has such a method? Another problem is that the solution must work under Java 1.3.
    Thx in advance.

    > Ok, but for example when you read from the file it
    > can be encoded in the UTF-8 or ANSI coding.
    Yes, so you have to tell Java which encoding to use, or it will use your default encoding. Some of the I/O classes have constructors or methods that let you specify which encoding to use.
    > Moreover when read from the request in a servlet it is usually
    > encoded in the UTF-8 coding.
    If that's specified in the HTTP headers, then I think the core API classes that grab it before handing it to you will do the conversion.
    > My question is how to determine the encoding. Maybe I wasn't clear enough
    > in my previous post, sorry for that.
    In general, you can't just look at a file and know the encoding. You have to know, or have an external means to find out.
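    As a rough illustration of "telling Java which encoding to use", the sketch below decodes the same bytes with two different charsets through InputStreamReader. The class name ExplicitDecoding and the sample data are invented for the example, and it assumes Java 8 or later; on Java 1.3 the InputStreamReader(InputStream, String) constructor that takes a charset name would be the equivalent.

    ```java
    import java.io.BufferedReader;
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.UncheckedIOException;
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    // Hypothetical helper class, not part of the original posts.
    final class ExplicitDecoding {
        /** Reads the first line of a byte stream, decoding it with the given charset. */
        static String firstLine(InputStream in, Charset charset) {
            try {
                BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
                return reader.readLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public static void main(String[] args) {
            // The "file" here is just an in-memory byte array holding UTF-8 text.
            byte[] data = "caf\u00e9\n".getBytes(StandardCharsets.UTF_8);

            // Correct charset: the text comes back as written.
            System.out.println(firstLine(new ByteArrayInputStream(data), StandardCharsets.UTF_8));

            // Wrong charset: the two UTF-8 bytes of the accented letter become two characters.
            System.out.println(firstLine(new ByteArrayInputStream(data), StandardCharsets.ISO_8859_1));
        }
    }
    ```

    The bytes are identical in both calls; only the charset passed to the reader decides which characters come out.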

  • How to change the Encoding type of a XML

    Hi all,
    I have an XML document (generated at run time) with UTF-8 encoding. When I try to parse it, I get an error saying "*Document root element is missing*".
    If I change the encoding to ANSI, it parses without error.
    How can I change the encoding type of a document?
    Any comment welcome.
    Kaushalya

    There's no such thing as the "encoding of a String". If you produced a String from a sequence of bytes using the wrong encoding, you may not be able to repair that problem by hacking about in your code. You're better off to produce the String using the correct encoding in the first place. Read this for more information about XML and encodings as you appear to be misunderstanding basic concepts:
    [http://skew.org/xml/tutorial/]

  • How to get the "encoding" of a XML file using JDOM

    In an XML file, <?xml version="1.0" encoding="UTF-8" ?> indicates the encoding of the file.
    When using JDOM to parse an XML file, how can I get the encoding type?
    Thanks!

    What my program does is get the encoding of XML files and convert them to UTF-8 encoded files, so I need the "encoding" information of the original XML document in order to convert it...
    After reading the specifications and the JDOM docs, the truth turned out to be disappointing: no function is provided to get this information in JDOM level 2 (the currently released one), although it is promised that this function will be provided in JDOM level API....
    Thanks, all, for your help and attention!

  • How to get the encoding of a XML file ...

    Hi,
    How do you get the encoding of a XML file?
    For example,
    <?xml version="1.0" encoding="SJIS"?>
    I am trying to retrieve the above encoding="SJIS", but I can't seem to locate the API for doing so.
    Thanks in advance for any help,
    Eric

    Hi ddossot,
    Thanks for your suggestion.
    However, the xerces.jar file that comes with my old tomcat server is an old version and thus, the getEncoding method is not even present in the DocumentImpl class. The option to update to a newer version of tomcat and xerces is not available. What a pity... :-(
    Well, I just have to try to find a way around. Worst case scenario, parse the first line in the xml file myself.
    Regards,
    Eric
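    For readers who are not tied to an old parser, here is one possible way to read the declared encoding with only the JDK. It is a sketch that assumes Java 6 or later, where the StAX API (javax.xml.stream) is available, so it would not help on the old Tomcat setup described above. The class name DeclaredXmlEncoding is made up for the example.

    ```java
    import java.io.ByteArrayInputStream;
    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.XMLStreamReader;

    // Hypothetical helper class, not part of the original posts.
    final class DeclaredXmlEncoding {
        /** Returns the encoding named in the XML declaration (per the StAX API, null if none is declared). */
        static String declaredEncoding(InputStream in) {
            try {
                XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
                try {
                    return reader.getCharacterEncodingScheme();
                } finally {
                    reader.close();
                }
            } catch (XMLStreamException e) {
                throw new IllegalStateException("Could not read the XML declaration", e);
            }
        }

        public static void main(String[] args) {
            byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root/>"
                    .getBytes(StandardCharsets.ISO_8859_1);
            System.out.println(declaredEncoding(new ByteArrayInputStream(xml)));
        }
    }
    ```

    The parser must support the declared charset in order to open the stream at all, so an exotic encoding name may still fail before the value can be read.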

  • ADF FACES: how to preserve the sort criteria for an af:table

    How can I preserve the sort criteria on an af:table across page invocations? I've searched all through the forum and I don't see anything on this topic.
    I simply want the sort criteria (from when the user clicks on a column header) to be remembered across multiple uses of the page. I know that the control handles this itself for multiple invocations of the same page (like when you page through the table). But I need to preserve the sort order so I can install it again when someone leaves the page and then returns to it.
    I've tried various attempts using a SortListener to record the sort criteria, but I can't figure out how to reinstall the criteria without generating exceptions from the table control.
    Any pointers on how to do this would be greatly appreciated.
    Thanks.
    Larry.

    Ok, I've solved the problems with the odd behavior by always creating a new model when the table data changes and copying the sort criteria into the new model, like this:
            // Construct our own CollectionModel from this result set
            if(_model == null) {
                // Construct the initial data model and set the starting sort criteria
                ListDataModel m = new ListDataModel(results);
                _model = new SortableModel(m);
                // Set the sort criteria to last name
                ArrayList criteria = new ArrayList();
                criteria.add(new SortCriterion("lastName", true));
                _model.setSortCriteria(criteria);
            } else {
                // Construct a new model so the table "sees" the change
                ListDataModel m = new ListDataModel(results);
                SortableModel sm = new SortableModel(m);
                sm.setSortCriteria(_model.getSortCriteria());
                _model = sm;
            }
    But I end up with one final thing that doesn't work. In the "then" clause above, I try to set the initial sort criteria for the table, and it has no effect. When the table is rendered, it is not sorted in any way.
    How can I specify an initial sort order for the table? Why is it ignoring the sort criteria on the model?
    Thanks.

  • How to change the encoding value in jdom?

    When I create a new xml file by using jdom, the encoding value of xml is "UTF-8"
    (<?xml version="1.0" encoding="UTF-8"?>).
    How can I change encoding value to "big5"??
    Thanks

    Use this method (JDom API)
    XMLOutputter.setEncoding(java.lang.String encoding)
    Regards,
    Darren

  • How to determine the encoding

    In my web application, we are capturing an Arabic name and storing it in an Oracle database.
    // while saving
    this.arabicDesc=new String(paymentTransactionTypeDetailVO.getArabicDesc().getBytes("ISO8859_1"),"UTF8") ;
    // while retrieving from database
    paymentTransactionTypeDetailVO.setArabicDesc(new String (arabicDesc.getBytes("UTF8"),"ISO8859_1"));
    In all the JSP pages, we are using
    <META HTTP-EQUIV="Content-type" CONTENT="text/html; charset=UTF-8">
    All these values come through correctly in our JSP pages. But when we try to convert this Arabic data to Cp864 in an applet for printing, we get junk. Here we are not explicitly converting the data but using the PrintStream class like this. Any idea why ????? is coming?
    FileOutputStream fos =  new FileOutputStream("LPT1");
    PrintStream pw =  new PrintStream(fos,true,"Cp864");
    I checked the database also:
    select * from nls_database_parameters
    where parameter='NLS_CHARACTERSET';
    NLS_CHARACTERSET     UTF8
    The data is coming like this:
    <object
      id="ReceiptPrinterApplet"
      classid="clsid:CAFEEFAC-0015-0000-0007-ABCDEFFEDCBB"
      width="0" height="0" >
       <param name="code" value="ReceiptPrinterApplet.class">  
       <param name="printermode" value="Broad">  
       <param name="Party Name1" value="KUANJIKOMBIL VARGHESE ALEXANDER (720216)- PO Box No:245">
       <param name="Party Name Arabic1" value="����� ������ ������ �������">
    </object>
    Where could the problem be? How can we resolve this issue? We need to convert to Cp864 since the printer supports only Cp864.

    Unfortunately, based on your description of how you acquired the data, I believe you may be correct in saying that there is no generic solution. If your database is encoded in UTF-8, and you can confirm that the data is correct in the database, then you should be in reasonable shape - then it is a matter of figuring out where in your code you have some issues.
    If, on the other hand, you determine that part of the data is corrupted in the database because it may have been incorrectly converted when it was stored, then it becomes much more difficult to figure out what to do with it and how to correct it (if that is possible at all: if one of the conversions was to a target encoding where the source code point does not exist, then the data is lost forever).
    Converting UTF-8 encoded data to 8859-1 is really only possible for a small subset of data (data actually in Latin-1 scripts), and there is no good reason to do it, unless it is part of an attempt to correct a previous incorrect conversion (and that should only be attempted with extreme caution, in my opinion).
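    The two cases described above can be sketched in a few lines, assuming Java 7 or later for StandardCharsets. The class LossyLatin1 and its method names are invented for the illustration. The first method shows the lossy direction (characters outside Latin-1 are replaced when encoding to ISO-8859-1); the second shows the repair of a String that was wrongly decoded as ISO-8859-1 from bytes that were really UTF-8, which is what the new String(x.getBytes("ISO8859_1"), "UTF8") idiom in the question does.

    ```java
    import java.nio.charset.StandardCharsets;

    // Hypothetical helper class, not part of the original posts.
    final class LossyLatin1 {
        /** Encodes to ISO-8859-1 and decodes again; unmappable characters are replaced. */
        static String throughLatin1(String text) {
            byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
            return new String(latin1, StandardCharsets.ISO_8859_1);
        }

        /** Undoes a mistaken ISO-8859-1 decode of bytes that were really UTF-8. */
        static String repairMisdecodedUtf8(String misdecoded) {
            byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
            return new String(originalBytes, StandardCharsets.UTF_8);
        }

        public static void main(String[] args) {
            String arabic = "\u0645\u0631"; // two Arabic letters, not representable in Latin-1

            // Lossy: the original letters cannot be recovered from the result.
            System.out.println(throughLatin1(arabic).equals(arabic));

            // Repairable: the bytes were never changed, only read with the wrong charset.
            String misdecoded = new String(arabic.getBytes(StandardCharsets.UTF_8),
                    StandardCharsets.ISO_8859_1);
            System.out.println(repairMisdecodedUtf8(misdecoded).equals(arabic));
        }
    }
    ```

    The repair only works while the original bytes are still intact; once a lossy conversion has happened anywhere in the chain, there is nothing left to recover.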

  • How to obtain the encoding scheme for an XML document

    How do you go about reading the encoding scheme for an XML document??
    More specifically how do I read the line:
    <?xml version="1.0" encoding="UTF-8"?>
    (Using Win32 C++ XML Parser 2.0.3 as SAX).

    I work mostly with the Java versions of the parser, so you'll have to make the translation to C++. As far as I know, you can't use the SAX API to access the encoding.
    You need to use the DOM along with Oracle's extension to the basic DOM functionality. Oracle's package, oracle.xml.parser.v2, defines a class called XMLDocument which implements the Document interface. This class has a method, getEncoding(), which returns the encoding. You would use the getDocument() method, which DOMParser inherits from the Parser base class, to retrieve the XMLDocument.
    Jeff

  • How to preserve the column headings from my playlist when I transfer the list as WAV files to a flash drive?

    I have digitized several old vinyls as WAV files and have them in my MacBook Pro in an iTunes playlist. I have arranged the list under the column headings of "Grouping", "Name", "Time", "Artist", "Album" and "Genre". When I copy the files to a flash drive by drag-and-drop, none of the headings except "Name" show up. How to preserve all of the column headings when copying the file to an external device?  Thanks - RC@gnv.

    I have the same issue when trying to sync from iTunes 11 on my MacBook to my iPhone 4 on iOS 6.1. I blindly did the iOS upgrade to 6.1 (more fool me) and since then have not been able to use my phone as an iPod.
    I have tried restoring it, and I've tried restoring it as a new phone, but it hasn't worked for me.
    The sync starts and the capacity bar in iTunes shows the capacity including any music, films, etc. that are to be included in the sync, and then iTunes stops the sync and nothing has been transferred. As in your experience, the music is shown in iTunes under Music, under Devices, greyed out with the syncing icon (dotted circle) to the left of it.
    When I try to include photos in the sync, I also get an error message stating that the connection to the iPhone has been lost. I am trying to do the sync via USB, so I don't know why I get this error.
    I can of course download any music I've bought from Apple from iCloud to my phone, but I want to be able to put any music I want onto it, like I could before on iOS 5.

  • How to generate the encoding attribute

    Hi,
    After reading a lot of posts on this forum and trying out suggestions on both 9.2 and 10G databases, my question is still unanswered. So I thought I'd ask my question again.
    The original post is here (dbms_xmlgen how to generate encoding attribute).
    I'm currently generating XML using the dbms_xml packages. But same problem exists when using sql/xml functions.
    I get this header :
    <?xml version="1.0"?>
    My database has the WE8MSWIN1252 characterset. So I want the header to be:
    <?xml version="1.0" encoding="windows-1252"?>
    My question is how can I generate the xml header automatically, depending on the database characterset?
    The code has to work on 9.2 and 10G. I see 2 possible solutions.
    1 The encoding attribute value is generated automatically.
    2 I generate it myself, but then I need to translate the database characterset name to the proper value for the encoding attribute.
    Rene

    You could write your own (option 2, as you mentioned), for instance as shown here:
    Re: Concatenation, Attributes, and Processing Instruction
    and/or use XMLROOT,
    but that said, you can't use XMLROOT in 9.2 yet.
    Given your requirements, this is not easy and is a lot of work, although I wonder what you are trying to achieve. Don't forget that if you pick the database characterset, you will probably overrule situations based on:
    (http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch3globenv.htm#i1006415)
    NLS_SESSION_PARAMETERS shows the NLS parameters and their values for the session that is querying the view. It does not show information about the character set.
    NLS_INSTANCE_PARAMETERS shows the current NLS instance parameters that have been explicitly set and the values of the NLS instance parameters.
    NLS_DATABASE_PARAMETERS shows the values of the NLS parameters for the database. The values are stored in the database.
    or in other words if NLS settings are manually changed within a session, instance or database context.
    Message was edited by:
    Marco Gralike

  • How to set the encoding of an XML-document

    I need to change the encoding of an xml-document.
    When I convert the document into a string, UTF-8
    is used, I want to use ISO-8859-1.

    use this in your identity transform:
    transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
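    For context, a fuller sketch of that identity transform using only the JDK's javax.xml classes might look like the following. The class Latin1Serializer, its method names and the sample document are made up for the example. It writes to a byte stream rather than a String, since the encoding only becomes real once characters are turned into bytes.

    ```java
    import java.io.ByteArrayOutputStream;
    import java.nio.charset.StandardCharsets;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    // Hypothetical helper class, not part of the original posts.
    final class Latin1Serializer {
        /** Serializes the document to bytes encoded as ISO-8859-1. */
        static byte[] toLatin1Bytes(Document doc) {
            try {
                // A Transformer created without a stylesheet performs the identity transform.
                Transformer transformer = TransformerFactory.newInstance().newTransformer();
                transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                transformer.transform(new DOMSource(doc), new StreamResult(out));
                return out.toByteArray();
            } catch (TransformerException e) {
                throw new IllegalStateException("Serialization failed", e);
            }
        }

        /** Builds a tiny in-memory document to serialize. */
        static Document sampleDocument() {
            try {
                Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
                Element root = doc.createElement("greeting");
                root.setTextContent("caf\u00e9");
                doc.appendChild(root);
                return doc;
            } catch (ParserConfigurationException e) {
                throw new IllegalStateException("No DOM parser available", e);
            }
        }

        public static void main(String[] args) {
            byte[] bytes = toLatin1Bytes(sampleDocument());
            System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));
        }
    }
    ```

    If the result is sent to a Writer or kept as a String instead, the declaration will still name ISO-8859-1, so whatever later writes those characters out needs to use the same charset.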

  • How to find the encoding of a file?

    hi,
    Is it possible to find out the encoding of a file programmatically?
    thanks,
    Mahi

    thanks tcristino for update,
    but this link contains the info about the BOM. Is it possible to read the BOM data programmatically?
    thanks,
    Mahender
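    Reading the BOM from code amounts to comparing the first few bytes of the file with the known byte order marks. Below is a minimal sketch covering the UTF-8 and UTF-16 marks only; the class BomSniffer and its method names are invented for the example. A file without a BOM returns null here, which says nothing about its actual encoding.

    ```java
    // Hypothetical helper class, not part of the original posts.
    final class BomSniffer {
        /** Returns the encoding implied by a leading byte order mark, or null if there is none. */
        static String encodingFromBom(byte[] head) {
            if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
                return "UTF-8";
            }
            if (startsWith(head, 0xFE, 0xFF)) {
                return "UTF-16BE";
            }
            // Note: a UTF-32LE mark (FF FE 00 00) also matches this check; UTF-32 is ignored here.
            if (startsWith(head, 0xFF, 0xFE)) {
                return "UTF-16LE";
            }
            return null;
        }

        /** Checks whether the data begins with the given unsigned byte values. */
        private static boolean startsWith(byte[] data, int... prefix) {
            if (data.length < prefix.length) {
                return false;
            }
            for (int i = 0; i < prefix.length; i++) {
                if ((data[i] & 0xFF) != prefix[i]) {
                    return false;
                }
            }
            return true;
        }

        public static void main(String[] args) {
            byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
            System.out.println(encodingFromBom(withBom));
            System.out.println(encodingFromBom(new byte[] {'h', 'i'}));
        }
    }
    ```

    In practice the first few bytes would be read from the file with an InputStream before calling the method.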

  • How to preserve the "return" char in the datafile imported

    Is there any way to preserve the "return" character in the data file when using SQL*Loader to import data? What I want is to keep some of the formatting of the original data. The sample data file is as follows (with "|" as the field delimiter):
    aaaaa|bbbbbb
    cccccc|jjjjj --this is just a physical record
    When the 2nd field's data is selected, it should be:
    bbbbbb
    cccccc
    Thanks in advance for any suggestions.
    Xiage

    You can use "str X'7c0a'" following the INFILE in your control file to specify that a record is terminated by a pipe character (7c) followed by a newline (0a), like this:
    LOAD DATA
    INFILE 'input_file_name.txt' "str X'7c0a'"
    REPLACE INTO TABLE table_name_to_load_into
    FIELDS TERMINATED BY '|'
    (column_name1,
    column_name2,
    column_name3)
