UTF-16, Unicode OK?

Can I change an XML file's encoding from Shift-JIS to Unicode with an XSLT transform?
Most XSLT processors don't clearly state which encodings they support, or else they generate badly encoded files.

I tried xml.exe, but it said "Failed to initialize XML parser, error 201".
However, when I tested the same XML file with Xerces and Xalan, I got the right result.
Either the Oracle parser has a bug, or it doesn't support every XML format.
Thanks, all.
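
For what it's worth, re-encoding during a transform usually comes down to the output encoding the processor is told to use (via xsl:output or the API). Here is a minimal sketch using the standard Java javax.xml.transform API with an identity transform; the file names are made up, and your processor still has to support both encodings:

import java.io.File;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ReEncode {
    public static void main(String[] args) throws Exception {
        // Identity transform: no stylesheet, just copy the document through.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        // Tell the serializer to write the result as UTF-16; the output's
        // XML declaration will then say encoding="UTF-16".
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
        // The parser picks up the input encoding (Shift_JIS) from the input
        // file's own XML declaration, so nothing else is needed on that side.
        t.transform(new StreamSource(new File("input-sjis.xml")),
                    new StreamResult(new File("output-utf16.xml")));
    }
}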

Similar Messages

  • UTF-8 Unicode in JEditorPane doesn't work

    I do hope this is the correct forum for this question so that some forum nazi doesn't give me grief...here goes.
    I have a JEditorPane with the contentType set to "text/html; charset=UTF-8"
    I do a .setText method with this text:
    <HTML><body><font face='Arial Unicode MS' size='3'>
    Followed by some text extracted from an RSS feed (just the contents of the <description> tag)
    and then </body></html> to finish it off.
    It displays fine apart from one Unicode character. It looks like one of those 'fancy' apostrophes, so in the word "We've" the apostrophe shows as an accented a and two squares, as shown in this screenshot: Screenshot
    So does that mean that 'Arial Unicode MS' cannot display Unicode as the name would suggest, or am I doing something else wrong?

    When you specify the charset in the contentType setting, you're telling the JEditorPane how to convert raw bytes that it reads from a URL into a Java string. That assumes you use one of the setPage() methods to populate the component, but you're using setText(), which takes a String. That means the text was corrupted before you put it in the JEditorPane; you need to look at how it's being brought in from the RSS feed. It's evidently encoded as UTF-8 but being decoded as if it were a single-byte encoding like ISO-8859-1 or windows-1252 (the default for English-locale Windows systems).
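
    To see that failure mode in isolation, here is a small sketch (not from the thread) that encodes a fancy apostrophe as UTF-8 and then decodes the bytes with the wrong single-byte charset; U+2019 becomes an accented a plus two unprintable characters, which matches the screenshot:

    import java.nio.charset.StandardCharsets;

    public class MojibakeDemo {
        public static void main(String[] args) {
            String original = "We\u2019ve"; // U+2019 RIGHT SINGLE QUOTATION MARK
            // UTF-8 encodes U+2019 as the three bytes 0xE2 0x80 0x99.
            byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
            // Correct: decode with the same charset the bytes were encoded with.
            System.out.println(new String(utf8, StandardCharsets.UTF_8));
            // Wrong: decode the UTF-8 bytes as ISO-8859-1; 0xE2 becomes "â"
            // and 0x80/0x99 become control characters (the "two squares").
            System.out.println(new String(utf8, StandardCharsets.ISO_8859_1));
        }
    }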

  • Convert UTF-8 (Unicode) Hex to Hex Byte Sequence while reading file

    Hi all,
    When Java reads a UTF-8 character, it does so in hex, e.g. the \x12AB format. How can we read the UTF-8 character as a corresponding byte stream? (e.g. \x0905 is hex for a Hindi character (an Indic language), and its corresponding byte sequence is \xE0\x45\x96)
    Can the method for reading a UTF-8 character's byte sequence be used to read any other character set's byte sequence (other than UTF-8, say some proprietary font)?

    First, there's no such thing as a "UTF-8 character". UTF-8 is a character encoding that can be used to encode any character in the Unicode database.
    If you want to read the raw bytes, use an InputStream. If you want to read text that's encoded as UTF-8, wrap the InputStream in an InputStreamReader and specify UTF-8 as the encoding. If the text is in some other encoding, specify that instead of UTF-8 when you construct the InputStreamReader.

    import java.io.*;

    public class Test {
        // DEVANAGARI LETTER A (U+0905) in UTF-8 encoding
        static final byte[] source = { (byte)0xE0, (byte)0xA4, (byte)0x85 };

        public static void main(String[] args) throws Exception {
            // print the raw bytes, one per loop pass
            InputStream is = new ByteArrayInputStream(source);
            int read = -1;
            while ((read = is.read()) != -1)
                System.out.printf("0x%02X ", read);
            System.out.println();
            is.reset(); // ByteArrayInputStream supports reset() back to the start

            // decode the same bytes as UTF-8 and print the result as a Unicode escape
            Reader r = new InputStreamReader(is, "UTF-8");
            while ((read = r.read()) != -1)
                System.out.printf("\\u%04X ", read);
            System.out.println();
            r.close();
        }
    }

    Does that answer your question?

  • UTF-8 Unicode output

    Hi all,
    I'm at the end of my resources (and those are very limited when speaking of character sets).
    Specifically, I'm having trouble with Unicode character U+2628 (http://www.fileformat.info/info/unicode/char/2628/index.htm)
    The SOAPMessage response I'm receiving is fine.
    SOAPMessage response = getResponse();
    response.writeTo(System.out); // Works fine. Outputs it correctly.
    Where I'm having trouble is reading the data from the attachment itself. I've tried a few hundred different things to no avail.
    Iterator attItr = response.getAttachments();
    AttachmentPart attPart = (AttachmentPart)attItr.next();                          
    String content = new String(attPart.getRawContentBytes(), "UTF-8"); // Doesn't work. Gibberish.
    String contentTo = (String) attPart.getContent(); // Doesn't work either. Gibberish as well.
    I've tried a few other things ... but I'm really stuck.
    Any help would be greatly appreciated.
    Thanks.

    You may be able to find a text editor that can do the conversion. Alternatively, I have converted from one encoding to another programmatically using Java as well.
    Tim Tow
    Applied OLAP, Inc
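
    In case it helps, the programmatic conversion Tim mentions is only a few lines in plain Java; a rough sketch (file names and encodings are placeholders):

    import java.io.*;

    public class Recode {
        public static void main(String[] args) throws IOException {
            // Read characters using the source encoding, write them back out
            // using the target encoding; the JDK converts in between.
            try (Reader in = new InputStreamReader(new FileInputStream("in.txt"), "UTF-16");
                 Writer out = new OutputStreamWriter(new FileOutputStream("out.txt"), "UTF-8")) {
                char[] buf = new char[4096];
                int n;
                while ((n = in.read(buf)) != -1)
                    out.write(buf, 0, n);
            }
        }
    }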

  • XDK support for UTF-16 Unicode

    Hi,
    Does the Oracle Java XDK, specifically the XSQL servlet and API,
    support UTF-16 Unicode?
    Presumably, .xsql files have to be stored, read, and queries
    executed in a Unicode-compliant format for this to work. Is this
    currently possible?
    Thanks,
    - Manish

    If you are using XDK 9.0.1 or later with JDK 1.3, that combination
    supports UTF-16. XSQL inherits the support "for free" in this case.

  • Conversion utf-16 unicode - ASCII

    Hello,
    I read a UTF-16 file. The text that is read in displays like this string:
    yp<#?#x#m#l# #v#e#r#s#i#o#n#=#"#1#.#0#"
    How can I convert the text? I haven't found functions to do that.

    Here's some QDAC (Quick & Dirty ABAP Code) to convert a UTF-16 file on the app server to a file in the SAP default codepage.
    Tested on a non-Unicode 640 system.
    data: gt_line type string occurs 0 with header line,
          gv_bin type xstring,
          conv_obj type ref to cl_abap_conv_in_ce.
    parameters: in_file type rlgrap-filename,
                out_file type rlgrap-filename.
    start-of-selection.
      check not in_file is initial.
      open dataset in_file for input in binary mode.
      do.
        read dataset in_file into gv_bin.
        if sy-subrc ne 0.
          close dataset in_file.
          exit.
        endif.
        if sy-index eq 1.
          perform create_conv_obj.
        endif.
        try.
            call method conv_obj->convert
              exporting
                input = gv_bin
                n     = -1
              importing
                data  = gt_line.
          catch cx_sy_conversion_codepage .
          catch cx_sy_codepage_converter_init .
          catch cx_parameter_invalid_type .
        endtry.
        append gt_line.
      enddo.
      check not out_file is initial.
      open dataset out_file for output in binary mode.
      loop at gt_line.
        transfer gt_line to out_file.
      endloop.
      close dataset out_file.
    *&      Form  create_conv_obj
    form create_conv_obj.
      data: lv_bom(2) type x,
            lv_encoding type abap_encod,
            lv_endian type abap_endia.
      lv_bom = gv_bin.
      if lv_bom eq 'FFFE'.
        lv_encoding = '4103'.          "code page for UTF-16LE
        lv_endian = 'L'.
        shift gv_bin left by 2 places in byte mode.
      elseif lv_bom eq 'FEFF'.
        lv_encoding = '4102'.          "code page for UTF-16BE
        lv_endian = 'B'.
        shift gv_bin left by 2 places in byte mode.
      else.
      message 'Byte order mark not found at the beginning of the file'
                 type 'E'.
      endif.
      try.
          call method cl_abap_conv_in_ce=>create
            exporting
              encoding    = lv_encoding
              endian      = lv_endian
              replacement = '#'
            receiving
              conv        = conv_obj.
        catch cx_parameter_invalid_range .
        catch cx_sy_codepage_converter_init .
      endtry.
    endform.                    "create_conv_obj
    Regards
    Sridhar
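
    For anyone needing the same BOM sniffing on the Java side, here is a rough equivalent of Sridhar's logic (not part of his post; the file name is made up). Note that the JDK's plain "UTF-16" charset already auto-detects a BOM on its own, so in Java the manual check is optional:

    import java.io.*;
    import java.nio.charset.Charset;

    public class Utf16Sniffer {
        public static void main(String[] args) throws IOException {
            try (InputStream is = new BufferedInputStream(new FileInputStream("in.txt"))) {
                // Read the first two bytes and compare them to the two BOM variants.
                int b0 = is.read(), b1 = is.read();
                Charset cs;
                if (b0 == 0xFF && b1 == 0xFE)
                    cs = Charset.forName("UTF-16LE"); // little-endian
                else if (b0 == 0xFE && b1 == 0xFF)
                    cs = Charset.forName("UTF-16BE"); // big-endian
                else
                    throw new IOException("Byte order mark not found at the beginning of the file");
                // The BOM is already consumed; decode the rest of the stream.
                BufferedReader r = new BufferedReader(new InputStreamReader(is, cs));
                for (String line; (line = r.readLine()) != null; )
                    System.out.println(line);
            }
        }
    }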

  • Regd:UTF 8 Unicode Version

    Hi Gurus,
    How do I find the Unicode (UTF-8) version in the database? I need to know whether it is using version 1.0 or 2.0, something like that.
    Regards,
    Simma...

    jeneesh wrote:
    Efficientoracle wrote:
    Hi,
    Yes, I need the Unicode version. How do I find it in our database? I am using 11.2.0.3.
    Regards,
    Simma.
    It cannot change across databases. As the documentation says, for 11g, it uses Unicode version 3.0.
    Per that same documentation: Oracle Database 11g, Release 1: 5.0.
    Cheers,
    Manik.

  • UTF-8, Unicode, XML and windows problems

    Hi there,
    I'm developing an application which uses a lot of Russian text.
    This Russian text is stored in XML and can be sent to a server remotely.
    I use the standard javax.xml libraries to parse the XML files and DBUnit's XMLWriter to generate XML strings.
    The XML returned by the DBUnit stuff is UTF-8, but it's inside a UTF-16 string.
    So when I generate XML and print it out, I get something that looks like the following:
    ��������� ������������ ���������?����
    That's OK, because I can stick that straight into a file when writing files, and it works.
    But the problem comes when sending the XML over the server.
    The sending implementation I use must be able to send Java-generated UTF-16 and XML read as UTF-8.
    So I convert the XML from UTF-8 to UTF-16, using the following:
    byte[] stringBytesISO = isoString.getBytes("ISO-8859-1");
    utf8String = new String(stringBytesISO, "UTF-8");
    And that works perfectly on my Linux system.
    However, when I run it on Windows, it only seems to convert some of the characters:
    &#1055;&#1088;&#1080;&#1074;&#1099;&#1095;&#1085;&#1099;&#1084; &#65533;?&#1085;&#1086;&#1084; &#1079;&#1072;&#65533;?&#1085;&#1091;&#1090; &#1076;&#1086;&#1088;&#1086;&#1075;&#1080; &#1076;&#1086; &#1074;&#1077;&#65533;?&#1085;&#1099;,
    Does anyone know what's going wrong here?

    jammers1987 wrote:
    I use the standard javax.xml libraries to parse the XML files and DBUnit's XMLWriter to generate XML strings.
    DbUnit is a testing tool; are you saying you're using it in a production system? Ideally, you should use the same library to write the XML as you do to read it, but you definitely shouldn't be using DbUnit in this context.
    jammers1987 wrote:
    The XML returned by the DBUnit stuff is UTF-8, but it's inside a UTF-16 string.
    That should never happen. XML is just text, and text can either be in the form of a Java string, or it can be stored externally using a text encoding like UTF-8. Never mind that Java strings use the UTF-16 encoding; you don't need to know or mention that. Encodings only come into play when you're communicating with something outside your program, like a file system or a database.
    When you generate the XML, you specify that the encoding is UTF-8. When you read the XML, you specify that the encoding is UTF-8. That's all.
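
    A minimal sketch of that advice using only the JDK (the element name and file name are made up): specify UTF-8 when serializing, and the parser will pick it up again from the XML declaration when reading.

    import java.io.File;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.w3c.dom.Document;

    public class XmlUtf8RoundTrip {
        public static void main(String[] args) throws Exception {
            // Build a tiny document holding some Russian text.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement("greeting"))
               .setTextContent("\u041f\u0440\u0438\u0432\u0435\u0442"); // "Privet"

            // Writing: declare the encoding once, on the serializer.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            t.transform(new DOMSource(doc), new StreamResult(new File("greeting.xml")));

            // Reading: the parser honors the XML declaration, so the text
            // comes back as a correct Java string with no manual recoding.
            Document back = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(new File("greeting.xml"));
            System.out.println(back.getDocumentElement().getTextContent());
        }
    }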

  • File/FTP adapter, outbound channel, content conversion, UTF-8 (Unicode)?

    We would like to send "delimited" files to another application (tab-delimited, CSV, ... - the other application does not support XML-based interfaces). Obviously we will have an outbound channel that uses the file/FTP adapter and the data will be subjected to "content conversion".
    The data contains names in many languages; not all of this can be represented in ISO Latin-1, much less in US-ASCII. I suppose UTF-8 would work. The question is: how is this handled by the FTP protocol? (considering that the FTP client is part of the SAP PI file/FTP adapter and the FTP server is something on the "other" machine)

    Hi Peter,
    You can maintain the file encoding in the outbound adapter. See [Configuring the Receiver File/FTP Adapter|http://help.sap.com/saphelp_nw2004s/helpdata/en/bc/bb79d6061007419a081e58cbeaaf28/content.htm]
    For your requirements, "utf-8" sounds like a good fit.
    Regards,
    Udo

  • How do I tell if a File is ANSI, unicode or UTF8?

    I have a jumble of file types - they should all be the same, but they are not.
    How do I tell which type a file has been saved in?
    (and how do I tell a file to save in a certain type?)

    "unicode or UTF-8" ?? UTF-8 is unicode !NO! UTF-8 is not UNICODE. Yes it is !!No it is not.
    And to prove it I refer to your links.........
    You simply cannot say "unicode or UTF-8" just because
    UTF is Unicode Transformation Format.UTF is a transfomation of UNICODE but it is not UNICODE. This is not playing with words. One of the big problems I see on these forums is people saying the Java uses UTF-8 to represent Strings but it does not, it uses UNICODE point values.
    You can say "UTF-8 or UTF16-BE or UTF-16LE" because
    all three are different Unicode representations. But
    all three are unicode.No! They are UNICODE transformations but not UNICODE.
    >
    So please don't play on words, I wanted to notify the
    original poster that "unicode or UTF-8" is
    meaningless, he/she would probably have said :
    "unicode (as UTF-8 or UTF-16 or...)"You are playing with words, not me. UTF-8 is not UNICODE, it is a transformation of UNICODE to a multibyte representation - http://www.unicode.org/faq/utf_bom.html#14 .

  • Create XML File with codepage (not UNICODE)

    Hi all ,
    I'm using the SDIXML_DATA_TO_DOM function module to create XML from an internal table; I then write the internal table, in XML format, to a file on the application server.
    The result is a file in XML format in UTF-8 (Unicode), but I have to convert the UTF-8 into another codepage because the file is sent to another system which is not familiar with Unicode.
    Does anyone know how to convert the codepage for the XML internal table?
    Thanks,
    Yaki

    Another option, rather than converting it after the fact, is to have it create the XML file in the encoding that you want to begin with. It looks like the function module you mention will either accept a pointer to an XML document object or create one for you. I would suggest that you go ahead and create a document yourself and pass it to the function module.
      ixml = cl_ixml=>create( ).
      document = ixml->create_document( ).
    You can then call the method SET_ENCODING of the interface IF_IXML_DOCUMENT. The encoding itself is a separate object, of type IF_IXML_ENCODING; you can set its character set using the method SET_CHARACTER_SET.

  • Byte Order Mark (BOM) not found in UTF-8 file download from XI

    Hi Guys,
    We are facing difficulty downloading a file from XI in UTF-8 format with a byte order mark.
    The receiver file adapter has been configured to download the file in UTF-8 format, but the byte order mark is missing. The same works well for UTF-16: I can see the byte order mark "FEFF" at the beginning of the file for UTF-16BE (Unicode big endian).
    As per SAP help, UTF-8 is supposed to be the default encoding for the TEXT file type; see "Configuring the Receiver File/FTP Adapter" in the SAP help link:
    http://help.sap.com/saphelp_nw04/helpdata/en/d2/bab440c97f3716e10000000a155106/frameset.htm
    Could you please advise on how to get a BOM into the UTF-8 file, as it is very important for the outbound file to load into our vendor system.
    Thanks.
    Best Regards
    Thiru

    Hi!

    Had the same problem. But here, we create a "CSV" file which must have the BOM, otherwise it will not be recognized as UTF-8.

    Therefore I've done the following:
    Created a simple destination structure which represents the CSV and did the mapping with the graphical mapper. The destination structure looks like:

    <?xml version="1.0" encoding="UTF-8"?>
    <ONLYLINES>
         <LINE>
              <ENTRY>Hello I'm line 1</ENTRY>
         </LINE>
         <LINE>
              <ENTRY>and I'm line 2</ENTRY>
         </LINE>
    </ONLYLINES>

    As you can see, the "ENTRY" element holds the data.

    Now I've created the following Java mapping and added that mapping within the interface mapping as a second step after the graphical mapping:

    ---cut---
    package sfs.biz.xi.global;

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.Map;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;

    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    import com.sap.aii.mapping.api.StreamTransformation;
    import com.sap.aii.mapping.api.StreamTransformationException;

    public class OnlyLineConvertAddingBOM implements StreamTransformation {

        public void execute(InputStream in, OutputStream out) throws StreamTransformationException {
            try {
                // The UTF-8 byte order mark: 0xEF 0xBB 0xBF
                byte[] BOM = new byte[3];
                BOM[0] = (byte) 0xEF;
                BOM[1] = (byte) 0xBB;
                BOM[2] = (byte) 0xBF;
                // Decoding the BOM bytes as UTF-8 yields U+FEFF, so the
                // result string starts with the BOM character.
                String retString = new String(BOM, "UTF-8");
                Element ServerElement;
                NodeList Server;

                DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
                DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
                Document doc = docBuilder.parse(in);
                doc.getDocumentElement().normalize();
                NodeList ConnectionList = doc.getElementsByTagName("ENTRY");
                int count = ConnectionList.getLength();
                for (int i = 0; i < count; i++) {
                    ServerElement = (Element) ConnectionList.item(i);
                    Server = ServerElement.getChildNodes();
                    retString += Server.item(0).getNodeValue().trim() + "\r\n";
                }

                out.write(retString.getBytes("UTF-8"));

            } catch (Throwable t) {
                throw new StreamTransformationException(t.toString());
            }
        }

        public void setParameter(Map arg0) {
            // no parameters needed for this mapping
        }

        /*
        public static void main(String[] args) {
            // local test harness
            File testfile = new File("c:\\instance.xml");
            File testout = new File("C:\\testout.txt");
            FileInputStream fis = null;
            FileOutputStream fos = null;
            OnlyLineConvertAddingBOM myFI = new OnlyLineConvertAddingBOM();
            try {
                fis = new FileInputStream(testfile);
                fos = new FileOutputStream(testout);
                myFI.setParameter(null);
                myFI.execute(fis, fos);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        */

    }
    ---cut---

    This mapping searches for all "ENTRY" tags within the XML structure and creates one big string which starts with the UTF-8 BOM and then appends each ENTRY element, separated by CR/LF.

    We use this as the payload for a mail adapter (sending via SMTP), but it should also work with the file adapter.

    Hope it helps.
    Rene

    Besides: could someone tell SAP that this is the worst editor I've ever seen? Maybe these guys should copy something from Wikipedia :-((
    Edited by: Rene Pilz on Oct 8, 2009 5:06 PM
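
    If all you need is the BOM itself, the essential trick is just three bytes written before the UTF-8 payload. Here is a stripped-down sketch with none of the XI specifics (file name and content are made up):

    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import java.nio.charset.StandardCharsets;

    public class BomWriter {
        public static void main(String[] args) throws Exception {
            try (OutputStream out = new FileOutputStream("with-bom.csv")) {
                // UTF-8 BOM: EF BB BF, written once at the very start of the file.
                out.write(new byte[] { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF });
                out.write("line 1\r\nline 2\r\n".getBytes(StandardCharsets.UTF_8));
            }
        }
    }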

  • Message mapping : UDF parameter string type versus default UTF-8 encoding

    Hi,
    I'm facing an issue with character encoding when using a UDF to transform into base64 encoding.
    While thinking about the subject, I'm not 100% sure it's possible to get it to work correctly.
    Given:
    - The input XML is encoded UTF-8 (with a special character)
    - The UDF is generated with the Java parameter type 'string' (= a sequence of 16-bit Unicode characters)
    Doubts:
    - What is supposed to happen when node content (of a message encoded in UTF-8) is used as input for the UDF's string parameter? Is the node content decoded/encoded correctly by PI automatically (at input/output, versus the internal 16-bit Unicode character string)?
    (I would assume yes.)
    - Is the default charset of the underlying JVM relevant? Or does PI always use explicit charsets when encoding/decoding?
    (I would assume it's not relevant.)
    The UDF Java code treats the string as an array of chars while processing them; it uses the methods .length() and .charAt() on the input string.
    The result is that I have an ISO-8859-1 encoded string! (after decoding it back from the base64)
    What could cause this?
    regards
    Dirk
    PS: If I simply use default functions (concat etc.) then the resulting XML stays correctly encoded...

    Hi,
    But that would mean that a UTF-8 encoded byte array is passed unconverted to the UTF-16 Unicode string parameter?
    Shouldn't that trigger an exception?
    I'm going to run some tests and see if that improves my (empirical) understanding.
    Keep you updated,
    thanks
    dirk
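
    For reference, this kind of UDF goes wrong exactly when the string is turned into bytes with the platform default charset. A sketch of the safe version in plain Java (the method name and shape are illustrative, not the PI UDF API):

    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class Base64Udf {
        // Illustrative UDF body: base64-encode the node content.
        public static String toBase64(String input) {
            // Pick the charset explicitly. A bare input.getBytes() uses the
            // JVM default charset (often ISO-8859-1 on such systems), which
            // would reproduce the wrongly encoded result from the thread.
            byte[] utf8 = input.getBytes(StandardCharsets.UTF_8);
            return Base64.getEncoder().encodeToString(utf8);
        }

        public static void main(String[] args) {
            System.out.println(toBase64("a\u00e9")); // "YcOp": bytes 0x61 0xC3 0xA9 in base64
        }
    }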

  • Converting Latin-1 to UTF-8

    Hi all,
    I'm looking for a way in Dreamweaver to run a search/replace function to convert all my current Latin-1 character set special characters to UTF-8 Unicode. For example, I'd like to replace all my &aacute; entities, which give me this symbol: á, with &#225;. This would have to work for á, é, í, ó, ú, among several other characters.
    As it stands right now, I just use the search and replace function in Dreamweaver to convert one character at a time (e.g. once for all occurrences of á, then again for all occurrences of é, etc.). So the question is, is there a way that I can load up all the characters and what I want to change them to and run it as a batch, perhaps using a saved search/replace or the Command function from the menu bar? Currently, I've used "Search and Replace" by Funduc (http://www.searchandreplace.com/search_replace.htm), which allows me to save all my queries and then run them as one batch. I was wondering if there was a way to do this in Dreamweaver by itself. This would be to replace all occurrences of these letters on an old site that has a large number of pages.
    Thanks a bunch.
    Thank you all!
    alfred.
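
    The batch replace is also easy to script outside Dreamweaver; a rough sketch in Java (the entity table and file name are illustrative, and you would run it over each page of the site):

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Map;

    public class EntityBatchReplace {
        // Named entity -> numeric character reference; extend as needed.
        static final Map<String, String> REPLACEMENTS = Map.of(
            "&aacute;", "&#225;",
            "&eacute;", "&#233;",
            "&iacute;", "&#237;",
            "&oacute;", "&#243;",
            "&uacute;", "&#250;");

        public static void main(String[] args) throws Exception {
            Path file = Path.of("page.html");
            String html = Files.readString(file, StandardCharsets.ISO_8859_1);
            for (Map.Entry<String, String> e : REPLACEMENTS.entrySet())
                html = html.replace(e.getKey(), e.getValue());
            Files.writeString(file, html, StandardCharsets.ISO_8859_1);
        }
    }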

  • VBScript doesn't work for UTF-8 text files

    I'm new to scripting. When I tried to run the sample VBScript on page 8 of the PDF doc, "Adobe Introduction to Scripting," I got an error message, "Invalid Character, Line 1, Column 1."
    I'm running CS3 Design Premium on Windows XP Pro.
    The problem turned out to involve the character encoding of the .jsx text file. I use UTF-8 UNICODE encoding for all of my text files, and the VBScript interpreter has not been updated to handle UTF-8. When I saved the file with the Windows 1252 Western European encoding, the script worked as advertised.
    I use UTF-8 now, and I'm planning to go to UTF-16. I want my Web sites and text files to display the way I wrote them anywhere in the world, and I don't want character-set restrictions on text that I insert into CS3 docs using a script.
    I don't know if the problem lies with Adobe or with Windows, but it represents a fault that needs to be fixed.

    Maybe the problem is that you're trying to write a VBScript in a .jsx file? You need to rename it to .vbs.
