Encoding issue - UTF-16LE with BOM "FF FE"

Hello Experts,
The scenario is as follows:
SAP sends IDocs -> SAP PI (collects the IDocs and creates IDoc XML) -> content conversion is done at the Receiver CC -> a pipe-delimited text file is placed in an FTP location.
Requirement:
Currently SAP R/3 is sending Balkan and Cyrillic characters to PI. Both SAP R/3 and PI are Unicode compliant. The SAP PI version being used is SAP PI 7.1. A BPM is used to collect the IDocs based on time.
When SAP PI converts the IDoc to IDoc XML, the header reads encoding="UTF-8".
The text file that gets created at the FTP location is an ANSI file. (If you open the text file with the EDITPLUS (v3) tool, you can check that the file type is ANSI.) We need to change this to UTF-16LE.
In the receiver CC, on the first Target tab, we have maintained the Transfer Mode as "Binary" in the FTP connection parameters.
On the Processing tab, we have maintained the File Type as Text and the Encoding as "UTF-16LE".
We also switched between Binary-Binary, Binary-Text and Text-Text in both tabs, but the file that gets put in the FTP location is still an ANSI file.
In PI all the characters come through correctly, but at the time the file is created, it is created as ANSI.
We need the file type to be UTF-16 with the BOM (Byte Order Mark) "FF FE". If you open the file in the EDITPLUS text editor, it should show as UTF-16.
If any of you experts have come across a solution for this issue, please let me know the steps. It is an issue in production and we need your help ASAP.
Points will be awarded to the best answer and to any answer that helps us solve the problem.
Thanks.
Deb.

From another discussion in the SDN forum, I have learned that PI does not add a BOM.
UTF-16LE and UTF-16BE do not get a BOM, as the byte order is already clear from the declaration.
So you have to add the BOM with an OS-level script.
When you set UTF-16LE in the receiver channel, the target file should be in UTF-16LE. If this does not work, check whether the UTF-16LE charset is installed on the server where PI is running; if it were missing, however, an error message would appear in the channel monitor.
You have to check the encoding of the file with a hex editor. You cannot verify this with Notepad or any other text editor.
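For completeness, here is a minimal sketch of such a post-processing step, written in Java rather than as a shell script (the file names are placeholders, not from this thread):

    import java.io.*;
    import java.nio.file.*;

    // Prepend the UTF-16LE BOM (FF FE) to a file that is already encoded
    // as UTF-16LE but was written without a BOM.
    public class AddBom {
        public static void main(String[] args) throws IOException {
            byte[] body = Files.readAllBytes(Paths.get("target.txt"));   // placeholder path
            try (OutputStream os = new FileOutputStream("target-bom.txt")) {
                os.write(0xFF);   // BOM bytes in little-endian order
                os.write(0xFE);
                os.write(body);
            }
        }
    }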

Similar Messages

  • Encoding Issue: Change a UTF-8 file with BOM to UTF-16LE without BOM

    I am trying to read a UTF-8 file with a BOM character. If a BOM mark is present, then I want to remove that BOM character
    and write the same file as UTF-16LE without a BOM character.
    Please suggest a solution for this.
    FileInputStream fis = new FileInputStream(file);
    long size = file.length();
    byte[] b = new byte[(int) size];
    int bytesRead = fis.read(b, 0, (int) size);
    if (bytesRead != size) {
        throw new IOException("cannot read file");
    }
    byte[] srcBytes = b;
    int b0 = srcBytes[0] & 0xff;
    int b1 = srcBytes[1] & 0xff;
    int b2 = srcBytes[2] & 0xff;
    if (b0 == 0xef && b1 == 0xbb && b2 == 0xbf) {
        System.out.println("Hint: the file starts with a UTF-8 BOM.");
    }
    String srcStr = new String(b, "UTF8");
    String encoding = "UnicodeLittle";      // Java alias for UTF-16LE *with* a BOM
    writeFile(filePath, srcStr, encoding);  // here the file is written as UTF-16LE
    But the file gets written with a BOM character.
    How do I remove this?
    Please suggest a solution for this.

    'uncle_alice' - in the OP's other thread on this topic I posted a decorated InputStream class that will strip off any BOM prefix (well, any I could find definitions of). Using this, it is almost trivial for the OP to convert a file from one encoding to another without worrying about the BOM. I showed him the water but I can't make him drink.
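    A minimal sketch of that kind of decorated stream (an illustration, not uncle_alice's original class):

        import java.io.*;

        // An InputStream wrapper that silently consumes a leading UTF-8 BOM.
        public class BomStrippingInputStream extends PushbackInputStream {
            public BomStrippingInputStream(InputStream in) throws IOException {
                super(in, 3);
                byte[] head = new byte[3];
                int n = read(head, 0, 3);
                if (n == 3 && (head[0] & 0xff) == 0xef
                           && (head[1] & 0xff) == 0xbb
                           && (head[2] & 0xff) == 0xbf) {
                    return;                  // BOM found: leave it consumed
                }
                if (n > 0) {
                    unread(head, 0, n);      // no BOM: push the bytes back
                }
            }
        }

    Reading through such a stream and then writing with the "UTF-16LE" charset (which, unlike "UnicodeLittle", emits no BOM) would give exactly the output the OP asked for.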

  • XML Encoding Issue - Format UTF-16 to ISO-8859-1

    Dear Groupmates,
    I have data in my internal table which I am converting to XML using a custom transformation.
    The data goes to a third party. The third-party system requires the data in ISO-8859-1 format, but SAP generates it in UTF-16 format. I have been able to change the format of the file from
    UTF-16 to ISO-8859-1, but after the conversion I am getting invalid tag information in the form of characters
    like &lt; and &gt; in my file.
    Here is the code I have used to set the encoding to ISO-8859-1:
    DATA: xmlout TYPE xstring.
    DATA: ixml          TYPE REF TO if_ixml,
          streamfactory TYPE REF TO if_ixml_stream_factory,
          encoding      TYPE REF TO if_ixml_encoding,
          ixml_ostream  TYPE REF TO if_ixml_ostream.
    ixml = cl_ixml=>create( ).
    streamfactory = ixml->create_stream_factory( ).
    ixml_ostream = streamfactory->create_ostream_xstring( xmlout ).
    encoding = ixml->create_encoding(
        character_set = 'ISO-8859-1' byte_order = 0 ).
    ixml_ostream->set_encoding( encoding = encoding ).
    Sample Output :-
    <?xml version="1.0" encoding="iso-8859-1"?>
    <AMS_DOC_XML_EXPORT_FILE><AMS_DOCUMENT AUTO_DOC_NUM="FALSE" DOC_CAT="CA" DOC_CD="CA" DOC_DEPT_CD="045" DOC_ID="XR10281060830400001" DOC_IMPORT_MODE="OE" DOC_TYP="CH" DOC_UNIT_CD ="NULL" DOC_VERS_NO="01">
    <CH_DOC_HDR AMSDataObject="Y">
    <DOC_CAT Attribute="Y">&lt;![CDATA[CA]]&gt;</DOC_CAT>
    <DOC_TYP Attribute="Y">&lt;![CDATA[CH]]&gt;</DOC_TYP>
    Please let me know if anyone has an idea how I can get rid of the invalid tag information.
    Thanks!
    With Regards,
    Darshan Mulmule

    Darshan,
    Did you get an answer to this question? We have the same requirement: to create an XML file in ISO-8859-1 format with Attributes set to "Y" and CDATA used for the data.
    Can you please let me know, if you still remember, how you achieved it?
    Satyen...

  • UTF encoding issues on file adapters and mappings

    Hi,
    We did some tests regarding UTF-8 and UTF-16 encoding using file adapters. Our conclusion so far (when using Windows OS) is:
    1. The inbound adapter can handle UTF-8 and UTF-16 correctly, but do not specify the encoding!
    2. XI mappings will set the XML encoding to UTF-8 correctly when sending a UTF-16 file to XI.
    3. The outbound adapter can only handle UTF-8 (and US-ASCII and ISO-8859-1) correctly.
    The exact test results are:
    >>Outbound file adapter bug.
    If no encoding is specified in the outbound file adapter, UTF-8 and UTF-16 are handled correctly. However if the encoding is set to UTF-16, XI mapping will fail with the error:
    During the application mapping com/sap/xi/tf/_CHRIS_OUTBOUND_TO_INBOUND_ a com.sap.aii.utilxi.misc.api.BaseRuntimeException was thrown: Fatal Error: com.sap.engine.lib.xml.parser.Parser~
    Part of the trace:
    com.sap.aii.ibrun.server.mapping.MappingRuntimeException: Runtime exception occurred during execution of application mapping program com/sap/xi/tf/_CHRIS_OUTBOUND_TO_INBOUND_: com.sap.aii.utilxi.misc.api.BaseRuntimeException; Fatal Error: com.sap.engine.lib.xml.parser.ParserException: XMLParser: No data allowed here: (hex) a0d, a0d, 6e3c(:main:, row:3, col:2) at com.sap.aii.ibrun.server.mapping.JavaMapping.executeStep(JavaMapping.java:72) at com.sap.aii.ibrun.server.mapping.Mapping.execute(Mapping.java:91) at com.sap.aii.ibrun.server.mapping.MappingHandler.run(MappingHandler.java:78) at com.sap.aii.ibrun.sbeans.mapping.MappingRequestHandler.handleMappingRequest
    >>Inbound file adapter bug.
    If the encoding of an inbound file adapter is set to UTF-16, everything works fine (except that the XML encoding is not set correctly, but this may be a mapping issue rather than an adapter issue). However, the default UTF-16 encoding seems to be UTF-16BE, where I would expect UTF-16LE since this is the most commonly used encoding.
    If the encoding is UTF-16LE or UTF-16BE, the character set used in the message is correct, except for the BOM of the file. The BOM is empty, which indicates a UTF-8 encoded file. Since the file is actually UTF-16BE or UTF-16LE encoded, this is wrong, and the correct BOM should be added by the adapter.
    Encodings like US-ASCII and ISO-8859-1 are handled correctly.
    >>Mapping bug
    When we send in a message encoded in UTF-8 and want to send it out as a UTF-16 encoded message, we need to set the XML encoding to UTF-16. Normally this is done by an XSLT mapping using the <xsl:output encoding="UTF-16"/> command.
    The UTF-8 message will get processed by the XSLT and any special character will be converted to its UTF-16 value. However, the output message is not UTF-16 encoded (1 byte instead of 2 bytes).
    When this 1-byte message is sent to the inbound adapter (encoding set to UTF-16), the message will be translated from 1 byte to 2 bytes (UTF-8 to UTF-16). The characters that were already converted from UTF-8 to UTF-16 will be read as single-byte characters and converted again. This results in an incorrect message with illegal characters.
    So characters are effectively converted to UTF-16 twice, which is incorrect.
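    A small self-contained illustration of the double conversion described above (the sample string is invented for this example):

        public class DoubleEncodingDemo {
            public static void main(String[] args) throws Exception {
                String original = "é";                           // one character
                byte[] utf8 = original.getBytes("UTF-8");        // two bytes: C3 A9
                // If the two UTF-8 bytes are then read as single-byte characters...
                String misread = new String(utf8, "ISO-8859-1"); // "Ã©" - two characters
                // ...and encoded to UTF-16 again, the result is garbage:
                byte[] utf16 = misread.getBytes("UTF-16LE");     // four bytes, wrong characters
                System.out.println(misread);                     // prints Ã© instead of é
            }
        }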
    Maybe someone can confirm this on another XI system (maybe different OS). If you need test files or mapping, please let me know.
    Kind regards,
    Christiaan Schaake.

    Update after carefully reading all the UTF related documents on the internet.
    For UTF-16 the BOM is required, and the adapter handles this correctly (encoding=UTF-16 will create the BOM).
    For UTF-16LE and UTF-16BE the BOM must not be set; the application should be able to handle the conversion. The adapter works correctly here as well.
    If the adapter is set to binary mode instead of text mode, the file will always be read correctly.
    About the mapping issue, I'm still experimenting with that one.
    Kind regards,
    Christiaan Schaake.
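    Incidentally, the same BOM convention can be observed in Java's own charset implementations (a small illustration, not from the original thread):

        import java.util.Arrays;

        public class BomConvention {
            public static void main(String[] args) throws Exception {
                // "UTF-16" prepends a BOM; "UTF-16LE"/"UTF-16BE" do not.
                System.out.println(Arrays.toString("X".getBytes("UTF-16")));   // [-2, -1, 0, 88] -> FE FF 00 58
                System.out.println(Arrays.toString("X".getBytes("UTF-16LE"))); // [88, 0]         -> 58 00
                System.out.println(Arrays.toString("X".getBytes("UTF-16BE"))); // [0, 88]         -> 00 58
            }
        }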

  • When creating a new table in a SQLite DB via Flex, it becomes encoded as "utf-16le"

    Hi Guys,
    I have an annoying problem with my AIR application.
    The application communicates with a local DB (SQLite).
    As part of the initial installation I check whether the DB exists.
    If it does not, then:
    I create one (file),
    create the relevant tables inside,
    and populate them.
    For some reason, during the table-creation step the SQLite DB becomes encoded as UTF-16LE instead of UTF-8.
    The question is how I can make the table-creation step leave the DB as UTF-8.
    Thanks in advance for your help.
    This is my creation code.
    The "connection" is of type flash.data.SQLConnection.
    The "file" contains the following information:
    <sql>
    <statement>
    CREATE TABLE IF NOT EXISTS MYTABLE (
        MYTABLE_VERSION     NUMBER NOT NULL,
        MYTABLE_INSERT_DATE DATE NOT NULL
    )
    </statement></sql>
    Below is the relevant code:
    var stream:FileStream = new FileStream();
    stream.open(file, FileMode.READ);
    var xml:XML = XML(stream.readUTFBytes(stream.bytesAvailable));
    stream.close();
    var statement:XML = null;
    try {
        connection.begin(lockType);
        for each (statement in xml.statement) {
            var stmt:SQLStatement = new SQLStatement();
            stmt.sqlConnection = connection;
            stmt.text = statement;
            stmt.execute();
        }
    } catch (err:Error) {
        connection.rollback();
        throw err;
    }
    connection.commit();

    It doesn't look like you're using the DBSequence domain for the OrderLinesId attribute. If you are, then you do not need to fill in the sequence as you've done in the create method.
    Getting back to the create issue: you may want to set the 'order' id (foreign key) values before calling super, and then call the getOrder() (or getXXX, where XXX is the order accessor in this entity) method to verify that the order with the given ID exists/is found in the cache.
    By the way, are you also using a similar create() in Order, with DBSequence as the type for the PK, and forcing a sequence value on top of it via setAttribute?
    Yes, this is the create method inside CrpOrderLinesImpl.java:
    protected void create(AttributeList attributeList) {
        super.create(attributeList);
        SequenceImpl s = new SequenceImpl("CRP_ORDER_LINES_ID_SEQ", getDBTransaction());
        setAttribute("OrderLinesId", s.getSequenceNumber());
    }
    Thanks,
    Brad

  • Why are newer versions of Firefox having problems with UTF-16LE (the Windows default Unicode set)?

    I have a website that has multiple languages on it, so I've been using UTF-16LE for it. Everything was working well on multiple browsers until the last few months, when Firefox alone stopped displaying it properly. I can force the page into UTF-16LE, but then some of my graphical links no longer work and I cannot navigate through the pages unless I force every single page to UTF-16LE EVERY SINGLE TIME. This problem is not unique to my computer, either, as it has happened on every computer I have tried in the last few months.

    As answered a few weeks back [[/questions/770955 *]]: the server sends the pages as UTF-8, and that is what Firefox uses to display the pages. You need to reconfigure the server and make it send the pages with the correct content type (UTF-16), or with no content type at all if you want Firefox to use the content type (BOM) in the file.
    A good place to ask questions and get advice about web development is the mozillaZine Web Development/Standards Evangelism forum.
    The helpers at that forum are more knowledgeable about web development issues.
    You need to register at the mozillaZine forum site in order to post at that forum.
    See http://forums.mozillazine.org/viewforum.php?f=25

  • Message Mapping Problem with UTF-16LE Encoded XML

    Hello,
    we have the following scenario:
    IDoc > BPM > HTTP Sync Call > BPM > IDoc
    The response message of the HTTP call is an XML file with a UTF-16LE processing instruction. This response should then be mapped to a SYSTAT IDoc. However, the message mapping fails with "...XML Parser: No data allowed here ...".
    So obviously the XML is not considered well-formed.
    When taking a look at SXMB_MONI, the following message appears: "Switch from current encoding to specific encoding not supported.....".
    The strange thing, however, is that if I save the response file as XML and use the same XML file in the test tab, the message mapping executes successfully.
    I also tried to use a Java mapping to switch encodings before executing the message mapping, but the error remains.
    Could the problem be that the codepage UTF-16LE is not installed on the PI system? Any idea on that?
    Thank you!
    Edited by: Florian Guppenberger on Feb 2, 2010 2:29 PM

    Hi,
    thank you for your answer.
    This is what I have tried to achieve: I apply the Java conversion mapping when receiving the response message. I tried converting the response to UTF-16 and to UTF-8, but neither helped solve the problem.
    I guess that using adapter modules is not an option either, as it would modify the request message but not the response, right?
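    For reference, a minimal sketch of the kind of Java conversion step described here (a hedged illustration; the class and method names are invented):

        import java.io.UnsupportedEncodingException;

        public class EncodingSwitch {
            // Decode a UTF-16LE payload and re-encode it as UTF-8,
            // fixing up the XML declaration to match.
            public static byte[] toUtf8(byte[] payload) throws UnsupportedEncodingException {
                String text = new String(payload, "UTF-16LE");
                if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
                    text = text.substring(1);   // drop a leading BOM character, if present
                }
                text = text.replaceFirst("encoding=\"[^\"]*\"", "encoding=\"UTF-8\"");
                return text.getBytes("UTF-8");
            }
        }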

  • Encoding Issue : JMS and Mapping : utf-8 iso8859-1

    Hi All,
    I am facing a problem with an encoding issue.
    Scenario: JMS --> SAP PI --> JMS
    Requirement: The input plain text file contains some special characters, "©®". Based on this condition, in a Java mapping
    we check the payload and change the 'encoding' tag to UTF-8 or ISO8859-1:
    <?xml version="1.0" encoding="UTF-8"?> in the target XML output.
    While testing in operation mapping, our Java mapping works fine: the encoding tag changes from
    UTF-8 to ISO8859-1 if a special character exists. But if I test the same in the Integration Directory (Test Configuration)
    or do end-to-end testing, the encoding tag doesn't change.
    For testing we had a set of plain text files with UTF-8 and ISO8859-1.
    I tried the option of using beans in adapter modules in the sender JMS channel:
    MessageTransformBean, TextCodepageConversionBean, XmlAnonymizerBean
    These docs & threads were also referred to: [How to Handle Encoding in PI|http://www.sdn.sap.com/irj/scn/index?rid=/library/uuid/502991a2-45d9-2910-d99f-8aba5d79fb42]
    Regards,
    Ashutosh R

    Hi
    public static boolean fixSpecialCharforWeb(String text) {
        // Returns true as soon as the text contains a character that
        // forces a switch of the target encoding (smart quotes, dashes,
        // currency signs, fractions, trademark, copyright, etc.).
        boolean encodingType = false;
        if ((text == null) || (text.trim().length() == 0)) {
            return encodingType;
        }
        // Code points tested by the original chain of if-statements;
        // the literal "?" comparisons in the original were garbled
        // characters and are covered by these code points.
        int[] specials = {
            39, 8217, 146, 145,        // single quotes
            8220, 8221, 147, 148,      // double quotes
            8226, 149,                 // bullet point
            732, 152,                  // tilde
            173,                       // soft hyphen
            8211, 150,                 // en-dash
            8212, 151,                 // em-dash
            8364, 128,                 // euro sign
            165,                       // yen sign
            163,                       // pound sign
            189, 188, 190,             // 1/2, 1/4 and 3/4 signs
            8224, 134,                 // sword/dagger
            8482, 153,                 // trademark
            38,                        // ampersand
            174,                       // registered mark
            169,                       // copyright mark
            63,                        // question mark
            233, 232,                  // accented characters é and è
            144
        };
        try {
            String trimmed = text.trim();
            for (int i = 0; i < trimmed.length(); i++) {
                int c = trimmed.charAt(i);
                for (int special : specials) {
                    if (c == special) {
                        encodingType = true;
                        return encodingType;
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return encodingType;
    }

  • Why did TB set the char encoding for a reply to charset=UTF-16LE?

    I got a message from Google AdWords in HTML format and wrote a reply. When I sent it, I got a timeout trying to send it. I use AVG antivirus to scan outgoing messages using a "local server" at address 127.0.0.1.
    Note that this was my second reply to such a message from AdWords. The only thing I could see that was different was a longer subject line (ending with three periods or dots) and a longer HTML message to which I was replying.
    I tried many things to fix the problem and I cannot remember them all, sorry.
    Finally, I had success after truncating the message and the long subject line. The reply was sent instantly, as usual, instead of timing out.
    But it wasn't really a success. When I looked at the sent message, the characters in my reply (only) were in Chinese. Looking at the raw (source) message, I see that the charset was set as follows: Content-Type: text/plain; charset=UTF-16LE; format=flowed.
    This seems like a strange charset; nowhere in my settings do I specify anything other than Western (ISO-8859-1).
    I finally was able to send the message successfully (I think) by using Options > Character Encoding > Western (ISO-8859-1), which seems to force the message to be sent using this standard encoding instead of little-endian Unicode.
    What caused this problem? Is there a TB overflow bug for long subject lines?
    I realize that TB is an old and unsupported product, but it seems to be the only good email client to use with Windows 8, so I'm just hoping someone knows something about this.

    Originally posted by: warren.tang.nospam.com
    Warren Tang wrote:
    > Warren Tang wrote:
    >> Warren Tang wrote:
    >>> Hi everybody,
    >>>
    >>> I've been trying to set the default encoding of new files as UTF-8.
    >>> Here are the two settings I've set:
    >>>
    >>> 1. Windows > Preferences > General > Content Types, set UTF-8 as the
    >>> default encoding for all content types.
    >>> 2. Windows > Preferences > General > Workspaces, set "Text file
    >>> encoding" to "Other : UTF-8".
    >>>
    >>> However, when I create a new text file, the encoding is always
    >>> ANSI/ISO-8859-1. What did I miss? Thanks.
    >>>
    >>> Regards,
    >>> Warren
    >>
    >> I've also tried
    >> Project Properties > Resource > Text file encoding = UTF-8
    >> However it doesn't work either.
    >>
    >> The only thing that works is changing the file's encoding property,
    >> but I don't want to change it every time I create a new file.
    >>
    >> Is it a bug?
    >
    > It turns out that there are other places I need to set up for HTML and
    > CSS files:
    >
    > Windows > Preferences > Web > CSS Files > Encoding = UTF-8
    > Windows > Preferences > Web > HTML Files > Encoding = UTF-8
    I'm going mad... The file (on disk) is still not encoded in UTF-8
    but in ANSI.

  • Manually adding a BOM to a UTF-16LE file?

    Hi.
    I have a bash script that needs to perform something on a string from standard input, save it in a file, and convert the file to UTF-16LE with a BOM for further processing by another application.
    I use iconv to convert the text file to UTF-16LE, but iconv actually creates a little-endian file WITHOUT the BOM (converting to UTF-16 creates a big-endian file WITH a BOM).
    I see no way of creating LE with a BOM using iconv, so I thought maybe I could simply add the byte-order mark (FF FE) to the beginning of the Unicode file. How can I do that?
    Many thanks in advance
    tench

    If you want to do everything from within the bash script, then you can use something like
    {code}
    #!/bin/bash
    # shopt requires bash; xpg_echo makes echo interpret backslash
    # escapes such as \xFF and the trailing \c (no newline). I think
    # it is ON by default in some shells, but just in case...
    shopt -s xpg_echo
    cat > infile
    # assume the input is in UTF-8
    (echo '\xFF\xFE\c'
    iconv -f UTF-8 -t UTF-16LE infile) > outfile
    {code}
    Of course use of infile can be omitted if you don't need it.

  • Need help to read and write using UTF-16LE

    Hello,
    I am in need of your help.
    In my application I am using UTF-16LE to export and import the data when doing it immediately.
    Sometimes I also need to do the import in a scheduled fashion, i.e. the export and import happen at a specified time.
    But in my application, when doing a scheduled import, the code uses the URL class to build the URL for the file and copies the data to one temp file, to process the event later.
    The file being imported is in UTF-16LE format and I need to write the code for that encoding.
    The problem is that for a scheduled import I need to copy the data of the file into one temp place before doing the import.
    When copying the data from one file to the temp file I cannot apply the UTF-16LE encoding via the URL. And if I get the path from the URL and create the reader and writer, it gives a FileNotFound exception.
    Here is the existing code:
    protected void copyFile(String rootURL, String fileName) {
        URL url = null;
        try {
            url = new URL(rootURL);
        } catch (java.net.MalformedURLException ex) {
        }
        if (url != null) {
            BufferedWriter out = null;
            BufferedReader in = null;
            try {
                // note: FileWriter and InputStreamReader fall back to the
                // platform default encoding here, not UTF-16LE
                out = new BufferedWriter(new FileWriter(fileName));
                in = new BufferedReader(new InputStreamReader(url.openStream()));
                String line;
                do {
                    line = in.readLine();
                    if (line != null) {
                        out.write(line, 0, line.length());
                        out.newLine();
                    }
                } while (line != null);
                in.close();
                out.close();
            } catch (Exception ex) {
            }
        }
    }
    Here String rootURL is the real file name from which I have to get the data, and it is in UTF-16LE format. And String fileName is the temp file name, which is a logical one.
    I think I have described the problem.
    Please, anyone, help me.
    Thanks in advance.

    Hello,
    thanks for your reply...
    I did as per your words using a StreamWriter, but the problem is that I need a temp file name to create the writer to write into.
    But it is a logical one, not a real file, so if I create the StreamWriter on it, it throws a FileNotFound exception.
    The other problem is that the existing code is built using URL, and I cannot change all the lines; it is very difficult because there is a vast amount of data.
    Is there any other way to solve this issue?
    Once again thanks.
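    For comparison, a minimal sketch of the copy loop with the encodings made explicit on both sides (a hedged illustration, not the poster's actual code):

        import java.io.*;
        import java.net.URL;

        public class Utf16Copy {
            // Copy a UTF-16LE resource to a local file, keeping UTF-16LE
            // on both sides instead of the platform default encoding.
            static void copyFile(String rootURL, String fileName) throws IOException {
                URL url = new URL(rootURL);
                try (BufferedReader in = new BufferedReader(
                         new InputStreamReader(url.openStream(), "UTF-16LE"));
                     BufferedWriter out = new BufferedWriter(
                         new OutputStreamWriter(new FileOutputStream(fileName), "UTF-16LE"))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        out.write(line);
                        out.newLine();
                    }
                }
            }
        }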

  • Translation - XLF file with the UTF-8 BOM at the head of the file

    Using APEX 3.2, I'm trying to get translations loaded for an application that I'm working on. We are going through the process of generating XLIFF files, sending them to a translator to translate, and then importing the data back in rather than manually editing the translations.
    This works perfectly when I use my text editor to enter the (very poor) translations in the file. Our translator, however, uses a tool that adds a byte-order mark (BOM) at the head of the XLIFF file when it saves it, in order to identify it as a UTF-8 encoded file. This appears to be a valid way for the tool to indicate the encoding being used, though, obviously, the encoding in the header overrides it and the BOM is not the preferred method of specifying the encoding of an XML file. However, when we import the file in APEX and hit the Apply XLIFF button, we get an error:
    ORA-31011: XML parsing failed ORA-19202: Error occurred in XML processing LPX-00210: expected '<' instead of '¿' Error at line 1
    The third byte of the UTF-8 BOM (0xEF, 0xBB, 0xBF), if interpreted as ISO-8859-1, would be the upside-down question mark character in the error message. So it appears that the APEX importer is interpreting the BOM using the ISO-8859-1 character set and throwing an error that the file is not valid. That is not my reading of proper XML parser behavior.
    Obviously, we can work around the problem either by asking our translator to use a different tool or by opening up the files we get back in a hex editor and removing the BOM. Is there anything we can do to resolve the issue that doesn't involve adding an additional manual step to the process or the translator changing the tool being used? Or is there something I'm missing that would indicate that a file with the BOM in the first three bytes should not be considered a valid XML file?
    Justin

    Hello Justin,
    >> Are you stating that that APEX translation engine more or less requires the database character set to be AL32UTF8 in order to work properly …
    Not exactly.
    The APEX translation engine uses the XML DB parser, and this one needs a legal XML file. The problem is that in your specific case, the encoding of the XLIFF file doesn’t match the database character set, hence a character set conversion process is involved in the middle. This conversion process – from UTF-8 to WE8MSWIN1252 embeds illegal XML characters in your stored XML file, right where the BOM is. If, for example, your XLIFF file were encoded in Windows-1252, no character conversion would be necessary and the final (database stored) version of the XLIFF would have been remained legal (assuming it started this way). In this case, the APEX translation mechanism will work just fine, even with a database character set of WE8MSWIN1252 (as you can see when you are using your own XLIFF files, which don’t include the (offending) BOM byte).
    >> … even if an alternate character set encodes all the characters we need?
    With your current database configuration, it seems that the alternate character set you are talking about can’t really encode all the characters that you need, as it can’t cope with the UTF-8 BOM conversion. It very well may be the only character it can’t handle, but that is enough.
    As we need to be practical, and a database character set conversion is most likely out of the question, the simple solution is probably asking your translator to save your XLIFF files without including the BOM. I’m not familiar with the professional XLIFF editors you mentioned, however, as UTF-8 encoding doesn’t really need a BOM, saving a UTF-8 file without it should be an option (just like Keith mentioned with regards to very simple editors like the windows’ Notepad, or Notepad++).
    Regards,
    Arie.
    ♦ Please remember to mark appropriate posts as correct/helpful. In the long run, it will benefit us all.
    ♦ Author of Oracle Application Express 3.2 – The Essentials and More

  • CONVERSION FROM ANSI ENCODED FILE TO UTF-8 ENCODED FILE

    Hi All,
    I have some issues with the conversion of an ANSI encoded file to a UTF-8 encoded file. Let me explain in detail.
    I have installed language support for the Thai language on my operating system.
    Now, when I open Notepad, add Thai characters to a file and save it with ANSI encoding, it saves perfectly and I am also able to see the characters on opening the file again.
    This file needs to be read by my application, stored in a database, and the Thai characters should be displayed on a JSP after fetching the data from the database. Currently it shows junk characters on the JSP, the reason being that my database (a UTF-8 compliant database) has junk data. It has junk data because my application is not able to read it correctly from the file.
    If I save the file with UTF-8 encoding it works fine, but my business requirement is such that the file is system generated and by default it is encoded in ANSI format. So I need to convert the encoding from ANSI to UTF-8. Can any of you guide me on how to do this conversion?
    Regards
    Gaurav Nigam

    Guessing the encoding of a text file by examining its contents is tricky at best, and should only be done as a last resort. If the file is auto-generated, I would first try reading it using the system default encoding. That's what you're doing whenever you read a file with a FileReader. If that doesn't work, try using an InputStreamReader and specifying a Thai encoding like TIS-620 or cp838 (I don't really know anything about Thai encodings; I just picked those out of a quick Google search). Once you've read the file correctly, you can write the text to a new file using an OutputStreamWriter and specifying UTF-8 as the encoding. It shouldn't really be necessary to transcode files like this, but without knowing a lot more about your situation, that's all I can suggest.
    As for native2ascii, it isn't for encoding conversions. All it does is replace each non-ASCII character with its six-character Unicode escape, so "voilá" becomes "voil\u00e1". In other words, it avoids the problem of character encodings by converting the file's contents to a form that can be stored as ASCII. It's mainly used for converting property or resource files to a form that can be read by the Properties and ResourceBundle classes.
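    A minimal sketch of the transcoding suggested above (hedged: TIS-620 is only a guess at the source encoding, and the file names are placeholders):

        import java.io.*;

        public class AnsiToUtf8 {
            public static void main(String[] args) throws IOException {
                // read assuming the Thai TIS-620 code page, write as UTF-8
                try (BufferedReader in = new BufferedReader(
                         new InputStreamReader(new FileInputStream("thai-ansi.txt"), "TIS-620"));
                     BufferedWriter out = new BufferedWriter(
                         new OutputStreamWriter(new FileOutputStream("thai-utf8.txt"), "UTF-8"))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        out.write(line);
                        out.newLine();
                    }
                }
            }
        }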

  • Receiver File Adapter - Encoding issue.

    Hi Everybody,
    The file format (encoding) is different from the format we generally used to get.
    Currently we get the flat files in DOS format. When we download the current file, we get it in UNIX or another format.
    For eg: 20 has been changed to 0D in the file.
    Can somebody help me with this?
    Thanks,
    Zabiulla

    Hi,
    Check this documentation for file adapters:
    Text
    Under File Encoding, specify a code page.
    The default setting is to use the system code page that is specific to the configuration of the installed operating system. The file content is converted to the UTF-8 code page before it is sent.
    Permitted values for the code page are the existing Charsets of the Java runtime. According to the Sun specification for the Java runtime, at least the following standard character sets must be supported:
    ■ US-ASCII
    Seven-bit ASCII, also known as ISO646-US, or the Basic Latin block of the Unicode character set
    ■ ISO-8859-1
    ISO character set for Western European languages (Latin Alphabet No. 1), also known as ISO-LATIN-1
    ■ UTF-8
    8-bit Unicode character format
    ■ UTF-16BE
    16-bit Unicode character format, big-endian byte order
    ■ UTF-16LE
    16-bit Unicode character format, little-endian byte order
    ■ UTF-16
    16-bit Unicode character format, byte order identified by an optional byte-order mark
    Regards
    Vijaya

  • Premiere and Media Encoder CC encoding issue

    Hi all,
    I am having an encoding issue with PP and ME CC. My video assets are fine, and on the timeline they appear as they should, but when I look at the rendered H.264 video there are encoding errors in it. I have attached two images: the black one is how it should look and the white one shows the error. The video plays fine and then it flickers between the images shown.
    It has done this on a few different videos I have rendered over the last few days and I don't know why. It also happens on a different machine running CC. Does anyone have any suggestions?

    Hi James,
    I've never seen this before. Can you give us more info? Answer all the questions on this FAQ: What information should I provide when asking a question on this forum?
    Thanks,
    Kevin
