Specifying Character encoding while parsing

Hi
I have an XML document that contains a particular Unicode character. After I parse it with the Xerces DOM parser, I find that the character has changed to some other character.
Any idea what could be the reason? Also, how do I overcome this problem?
Thanks in advance

You specify that at the top of your XML document,
for example:
<?xml version="1.0" encoding="UTF-8"?>
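To illustrate the answer above, here is a minimal sketch using the JAXP DOM parser that ships with the JDK (not a standalone Xerces build, which is an assumption on my part). The class and method names are made up for the example. It shows that when the parser is handed raw bytes, it reads the encoding from the declaration, so the bytes must really be in that encoding:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DeclaredEncodingDemo {
    // Parses raw bytes; the parser picks the decoder from the XML declaration.
    static String rootText(byte[] xmlBytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>";
        // The bytes are produced in the same encoding that the declaration names.
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(bytes).equals("caf\u00e9"));
    }
}
```

If the declaration and the actual bytes disagree (for example the file was saved as ISO-8859-1 but declares UTF-8), the character will come out changed or the parse will fail, which may be what happened in the question.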

Similar Messages

Why does Firefox 18 ignore the specified character encoding for websites?

We are developing a page on our website that will have the page crawled and a newsletter generated and sent out to a mailing list. Many email packages default to character encoding of iso-8859-1 so we have set our character encoding to this on the page via the standard meta tag.
We have a problem on the newsletters that we had until now been unsuccessful to replicate. Though now I know why.... I have just discovered that in Firefox 18, the specified character encoding is being completely ignored. It is rendering the page in UTF-8 even though we specified ISO-8859-1. Firefox 3.6 however, renders the page with the proper encoding (thank god for keeping an old version for testing).
Can anyone explain why the new Firefox is completely ignoring the meta tag? Both browsers are using the factory default (I even opened FF18 in safe mode)...

Thanks for letting me know that Firefox 18 ignores everything but the server headers... but it doesn't help me much. Our website is in UTF-8... but this page is a newsletter, one that is crawled and saved into an email and sent out to a mailing list (by a third party newsletter program), and many email readers use ISO-8859-1, which is why we want to have the page rendered in that encoding so that we can actually test the newsletter properly. We can't test through the third party software as our testing environment is behind a firewall, and you can't change the server headers for a single page... hence the meta tag.
If you explicitly choose to render a page in a specific encoding, that shouldn't be ignored by the browser. It's not a big deal, but now every time we make a code change in our test environment and reload the page we have to force the encoding manually in the browser which is a pain.
The problem is, the newsletter is already live and we have some users complaining because some characters aren't displaying properly in their email packages (Entourage for Mac is one of them), while all our testing (which is encoded using UTF-8) looks fine.

Character Encoding (C++ Parser)

How do I set the encoding to ISO-8859-2 when using the XMLParser for C++? I know there is a parameter in the function
XMLParser::xmlparse(oratext doc, oratext encoding, ub4 flags), but I don't know what value to pass for the encoding parameter (the default is 0).
Thanks
Dirk

Hi,
quote: Originally posted by Oracle XML Team:
Generally, the encoding is set in the XML file. For example,
<?xml version="1.0" encoding="ISO-8859-2"?>
To parse the XML file in a different encoding, use the xmlparse function. For example,
ecode = xmlparse((oratext *) DOCUMENT,
                 (oratext *) "ISO-8859-2", flags);
Currently, the C++ parser resolves encodings using the NLS data files from an Oracle database install. In the next release, we will also provide stand-alone NLS data files.

How do I specify character set encoding?

Hi,
I already defined sub-class META from DOCUMENT class thru xml.
Now I am trying to upload META files thru xml and its working fine.
But when I see properties from web, I don't see "Character Set" define.
As a result I can't use BufferedReader(doc.getContentReader())
Exception in thread "main" oracle.ifs.common.IfsException: IFS-32245: Cannot convert content to Reader - content does not have an associated character encoding.
at oracle.ifs.server.S_Media.convertInputStreamToReader(S_Media.java:1960)
at oracle.ifs.server.S_Media.getReaderFromObject(S_Media.java:1423)
at oracle.ifs.server.S_Media.getContentReader(S_Media.java:1408)
How can I specify the character set encoding from XML?
Or, if possible, can I specify it while defining the class (META in this case)?
Thanks,
-Niraj

Luis,
Each take encoding name arguments using the Java naming conventions.
If you want to do this automatically when you upload your XML file, you
will need to write a server-side representation of your "META" ClassObject
and provide an 'extendedPostInsert' implementation. In this method, you can
parse the XML content and pull out an encoding if you specify it within the
XML file.
Could you explain in a little more detail?
If you can provide me links to any resources, that would be great.
Thanks,
-Niraj

Issue while parsing the chinese character from Mime Message

Hi,
I have an issue with Chinese characters while parsing a MIME message (MimeBodyPart). In the MimeMessage the charset is given as "gb2312". I am using MimeBodyPart.getContent() to get the content. When the MIME type is HTML, the content is uploaded as a file to an FTP site (with the Apache Commons Net FTP client). When the uploaded file is viewed, the content is displayed as garbage text.
I tried the following, but it didn't work. I got the InputStream from the MimeBodyPart, created an InputStreamReader with the encoding "GB18030", read the content into a String, and stored it in a file. I just replaced "gb2312" with "UTF-8" in the HTML string. While creating the output file, I used the UTF-8 encoding. When this file is opened in IE, it displays the characters without any issues. I examined the file and its encoding is UTF-8, as expected.
But when I upload the file to the FTP site and view it, the text is not displayed correctly. It seems the file encoding is ANSI. I used Notepad++ to examine these files. Please note that we use the Apache Commons Net FTP client to upload the file.
Below are my questions:
Am I doing the right thing? It seems the MIME message was created using Outlook.
How do I upload a file to FTP with the file encoding set to "UTF-8" or some other encoding?
Below are a few references:
http://www.anyang-window.com.cn/tag/java-gb2312/
JavaMail: Chinese Simplified Character Problem

Thank you for the replies. I am using binary mode and it works fine for most of the files. I found that the issue here is not the upload but the content itself. The characters present in the MimeMessage do not match the declared charset, so I could not upload the content as it is. This happens only when the charset is GB2312 (Chinese). It seems that the MimeMessage contains characters which cannot be represented in GB2312 but can be represented in GB18030. Hence I converted the content from GB18030 to UTF-8 and created a file. I used setControlEncoding("utf-8") when uploading the file and it is working fine.
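The GB18030 to UTF-8 conversion described above could look roughly like the following. This is only a sketch: the class and method names are made up, and the in-memory streams stand in for the MimeBodyPart input stream and the output file from the post.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TranscodeDemo {
    // Decodes bytes with one charset and re-encodes them with another.
    static void transcode(InputStream in, Charset from, OutputStream out, Charset to)
            throws IOException {
        Reader reader = new InputStreamReader(in, from);
        Writer writer = new OutputStreamWriter(out, to);
        char[] buf = new char[4096];
        for (int n = reader.read(buf); n != -1; n = reader.read(buf)) {
            writer.write(buf, 0, n);
        }
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        Charset gb18030 = Charset.forName("GB18030");
        // Stand-in for the bytes read from the MIME body part.
        byte[] source = "\u4e2d\u6587".getBytes(gb18030);
        ByteArrayOutputStream utf8 = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(source), gb18030, utf8, StandardCharsets.UTF_8);
        System.out.println(utf8.size() + " bytes of UTF-8 written");
    }
}
```

Any charset named inside the HTML itself (the meta tag) still has to be changed separately, as the post describes, so that it matches the new bytes.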

"character conversion error" while parsing xml files

Hello,
I'm trying to parse MusicXML (Recordare) files, but I'm getting an exception.
I'm using the SAX parser (javax.xml.parsers.SAXParser).
Here is the code I use to instantiate it:
final javax.xml.parsers.SAXParserFactory saxParserFactory = javax.xml.parsers.SAXParserFactory.newInstance();
final javax.xml.parsers.SAXParser saxParser = saxParserFactory.newSAXParser();
final org.xml.sax.XMLReader parser = saxParser.getXMLReader();
I'm using my own handler, but I get the same exception even if I use org.xml.sax.helpers.DefaultHandler.
The error I get is:
Character conversion error: "Illegal ASCII character, 0xc2" (line number may be too low).
The first few lines of my xml files look like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE score-partwise
PUBLIC "-//Recordare//DTD MusicXML 0.6 Partwise//EN"
"http://www.musicxml.org/dtds/partwise.dtd">
<score-partwise>
[...etc...]
If I delete the <!DOCTYPE ...> line, then I don't get the exception anymore. But the MusicXML files I get (from some other program) always contain this line, and it would be quite some work to delete them from every file manually.
So does anyone know if there is a way to avoid deleting that line in every file, while still being able to parse the xml files without exceptions?
Or maybe does anyone know what the exact cause of the exception is? (because I don't know what exactly causes it)
Thank you in advance.
Greetz,
Jipo

So does anyone know if there is a way to avoid
deleting that line in every file, while still being
able to parse the xml files without exceptions?
OK, this is side-stepping the real problem, but I've used this code to filter out DTD references for other reasons:
   public static InputStream filterOutDTDRef(InputStream in) throws IOException {
      // ISO-8859-1 maps every byte to one char and back, so the content bytes
      // survive unchanged (assumes an ASCII-compatible encoding such as UTF-8)
      BufferedReader iniReader = new BufferedReader(new InputStreamReader(in, "ISO-8859-1"));
      StringBuffer newXML = new StringBuffer();
      for(String line = iniReader.readLine(); line!=null; line = iniReader.readLine())
         newXML.append(line+"\n");
      in.close();
      int s = newXML.indexOf("<!DOCTYPE ");
      if(s!=-1)
         newXML.replace(s,newXML.indexOf(">",s)+1,"");
      return new ByteArrayInputStream(newXML.toString().getBytes("ISO-8859-1"));
   }
It actually speeds up the parsing phase too (since the DTD references were on the web and the parser fetches them for every XML file it parses).
You can feed the result of the above into the InputSource constructor that takes an InputStream argument.
Now for the real problem... 0xc2 is "LATIN CAPITAL LETTER A WITH CIRCUMFLEX" according to a Unicode chart, which is not an ASCII character (as the error message correctly reports). As a raw byte, 0xC2 is also the first byte of a two-byte UTF-8 sequence, so it may simply be part of a UTF-8 encoded character. I'm not sure why the file is being parsed as ASCII though. You could try passing a FileReader to the InputSource and hope it picks up the default character encoding of your system, and that that character encoding matches the file. Or you could try passing in a Reader constructed with an explicit character encoding (e.g. an InputStreamReader with "UTF8") and see if that does the trick.
asjf
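The suggestion of handing the parser a Reader with an explicit encoding might be sketched like this. The class and method names are made up, and it uses the SAX parser bundled with the JDK. When an InputSource carries a character stream, the parser uses those characters directly and does not decode the bytes itself:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ExplicitReaderDemo {
    // Decodes the bytes ourselves, then lets SAX parse the resulting characters.
    static String parseText(byte[] xmlBytes, String charsetName) throws Exception {
        final StringBuilder text = new StringBuilder();
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        parser.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        Reader reader = new InputStreamReader(new ByteArrayInputStream(xmlBytes), charsetName);
        parser.parse(new InputSource(reader));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><note>\u00c2\u00e9</note>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        System.out.println(parseText(bytes, "UTF-8").length());
    }
}
```

This only helps if the encoding you pass really matches the file. It does nothing about the external DTD, which the parser may still try to fetch.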

XML parser not detecting character encoding

Hi,
I am using Jdeveloper 9.0.5 preview and the same problem is happening in our production AS 9.0.2 release.
The character encoding of an xml document is not correctly being detected by the oracle v2 parser even though the xml declaration correctly contains
<?xml version="1.0" encoding="ISO-8859-1" ?>
instead it treats the document as UTF8 encoding which is fine until a document comes along with an extended character which then causes a
java.io.UTFDataFormatException: Invalid UTF8 encoding.
at oracle.xml.parser.v2.XMLUTF8Reader.checkUTF8Byte(XMLUTF8Reader.java:160)
at oracle.xml.parser.v2.XMLUTF8Reader.readUTF8Char(XMLUTF8Reader.java:187)
at oracle.xml.parser.v2.XMLUTF8Reader.fillBuffer(XMLUTF8Reader.java:120)
at oracle.xml.parser.v2.XMLByteReader.saveBuffer(XMLByteReader.java:448)
at oracle.xml.parser.v2.XMLReader.fillBuffer(XMLReader.java:2023)
at oracle.xml.parser.v2.XMLReader.tryRead(XMLReader.java:972)
at oracle.xml.parser.v2.XMLReader.scanXMLDecl(XMLReader.java:2589)
at oracle.xml.parser.v2.XMLReader.pushXMLReader(XMLReader.java:485)
at oracle.xml.parser.v2.XMLReader.pushXMLReader(XMLReader.java:192)
at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:144)
as you can see it is explicitly casting the XMLUTF8Reader to perform the read.
I can get around this by hard coding the xml input stream to be processed by a reader
XMLSource = new StreamSource(new InputStreamReader(XMLInStream,"ISO-8859-1"));
however the manual documents that the character encoding is automatically picked up from the xml file and casting into a reader is not necessary, so I should be able to write
XMLSource = new StreamSource(XMLInStream)
Does anyone else experience this same problem?
having to hardcode the encoding causes my software to lose flexibility.
Jarrod Sharp.

An XML document should be created with 'ISO-8859-1' encoding to be parsed as 'ISO-8859-1' encoding.
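If hardcoding "ISO-8859-1" is the concern, one possible workaround is to peek at the XML declaration yourself and build the InputStreamReader from whatever it names. The following is a rough sketch, not how the Oracle parser works internally. The class name and the 200-byte window are arbitrary choices, and it assumes an ASCII-compatible encoding (it ignores byte order marks and UTF-16):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeclaredEncodingSniffer {
    // Matches the encoding pseudo-attribute inside a leading XML declaration.
    private static final Pattern ENCODING = Pattern.compile(
        "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Returns the declared charset, or UTF-8 when none is declared or supported.
    static Charset declaredCharset(byte[] xmlBytes) {
        int len = Math.min(xmlBytes.length, 200);
        // ISO-8859-1 never fails to decode, and the declaration itself is ASCII.
        String head = new String(xmlBytes, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(head);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?><a/>"
            .getBytes(StandardCharsets.US_ASCII);
        System.out.println(declaredCharset(doc));
    }
}
```

The returned charset could then be passed to new InputStreamReader(XMLInStream, charset) in place of the hardcoded name, provided the stream is buffered so the first bytes can be read twice.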

"Error while parsing SOAP XML payload: no element found" received when invoking Web Service

Running PB 12.1 Build 7000, using EasySoap. I receive the error "Error while parsing SOAP XML payload: no element found" when invoking the Web Service. This error does not appear to be coming from the application code. I noticed that there were some erroneous characters showing up within the header portion of the XML ("&quot;"). Not sure where these are coming from. When I do a find within the PB code for "&quot;", it is located within two objects, both of which reference a "temp_xml_letter". Not sure where temp_xml_letter resides or what it is. The developer of this is no longer with us and my exposure to WSDL and Web Services is rather limited. Need to get this resolved, please.
This is the result of the search. Notice the extraneous characters ("&quot;"):
dar1main.pbl(d_as400_mq_xml)
darlettr.pbl(d_email_xml)
---------- Search: Searching Target darwin for 'temp_xml'    (9:52:41 AM)
---------- 2 Matches Found On "temp_xml":
dar1main.pbl(d_as400_mq_xml).d_as400_mq_xml: export.xml(usetemplate="temp_xml_letter" headgroups="1" includewhitespace="0" metadatatype=0 savemetadata=0 template=(comment="" encoding="UTF-8" name="temp_xml_letter" xml="<?xml version=~"1.0~" encoding=~"UTF-16LE~" standalone=~"yes~"?><EmailServiceTransaction xmlns=~"http://xml.xxnamespace.com/Utility/Email/EmailService" ~" xmlns:imc=~"http://xml.xxnamespace.com/IMC~" xmlns:xsi=~"http://www.w3.org/2001/XMLSchema-instance~" xmlns:root=~"http://xml.xxnamespace.com/RootTypes~" xmlns:email=~"http://xml.xxnamespace.com/Utility/Email~" xsi:schemaLocation=~"http://xml.xxnamespace.com/Utility/Email/EmailService http://dev.xxnamespace.com/Utility/Email/EmailService/V10-TRX-EmailService.xsd~"><EmailServiceInformation><EmailServiceDetail __pbband=~"detail~"><ApplicationIdentifier> applicationidentifier </ApplicationIdentifier><AddresseeInformation><AddresseeDetail><Number> number </Number></AddresseeDetail></AddresseeInformation><EmailMessageInformation><Ema
darlettr.pbl(d_email_xml).d_email_xml: export.xml(usetemplate="temp_xml_letter" headgroups="1" includewhitespace="0" metadatatype=0 savemetadata=0 template=(comment="" encoding="UTF-8" name="temp_xml_letter" xml="<?xml version=~"1.0~" encoding=~"UTF-16LE~" standalone=~"yes~"?><EmailServiceTransaction xmlns=~"http://xml.xxnamespace.com/Utility/Email/EmailService" ~" xmlns:imc=~"http://xml.xxnamespace.com/IMC~" xmlns:xsi=~"http://www.w3.org/2001/XMLSchema-instance~" xmlns:root=~"http://xml.xxnamespace.com/RootTypes~" xmlns:email=~"http://xml.xxnamespace.com/Utility/Email~" xsi:schemaLocation=~"http://xml.xxnamespace.com/Utility/Email/EmailService http://dev.xxnamespace.com/Utility/Email/EmailService/V10-TRX-EmailService.xsd~"><EmailServiceInformation><EmailServiceDetail __pbband=~"detail~"><ApplicationIdentifier> applicationidentifier </ApplicationIdentifier><AddresseeInformation><AddresseeDetail><Number> imcnumber </Number></AddresseeDetail></AddresseeInformation><EmailMessageInformation><Ema
---------- Done 2 Matches Found On "temp_xml":
---------- Finished Searching Target darwin for 'temp_xml'    (9:52:41 AM)

Maybe "extraneous" is an incorrect term. Apparently, based upon the writeup within Wiki, the parser I am using does not interpret the "&quot;"? How do I find which parser is being utilized and how to control it?
<<<
If the document is read by an XML parser that does not or cannot read external entities, then only the five built-in XML character entities (see above) can safely be used, although other entities may be used if they are declared in the internal DTD subset.
If the document is read by an XML parser that does read external entities, then the five built-in XML character entities can safely be used. The other 248 HTML character entities can be used as long as the XHTML DTD is accessible to the parser at the time the document is read. Other entities may also be used if they are declared in the internal DTD subset.
>>>
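The quoted rule can be checked with a small experiment. This sketch uses the JDK's own DOM parser (not whatever parser PowerBuilder or EasySoap uses, which I don't know), and the class and method names are made up. A built-in entity such as &quot; is resolved without any DTD, while an HTML entity such as &nbsp; is rejected when nothing declares it:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class BuiltInEntityDemo {
    // Parses an XML string and returns the text content of the root element.
    static String parseText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement().getTextContent();
    }

    // Reports whether the document parses at all.
    static boolean parses(String xml) {
        try {
            parseText(xml);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // &quot; is one of the five built-in entities, so this parses.
        System.out.println(parseText("<a>say &quot;hi&quot;</a>"));
        // &nbsp; is not built in and no DTD declares it here.
        System.out.println(parses("<a>&nbsp;</a>"));
    }
}
```

So a literal &quot; inside element content is normally harmless to an XML parser. Whether it is harmless inside the PowerBuilder template string is a separate question that this sketch does not answer.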

XML validation error while parsing MXI Manifest

Hi,
I have created a hybrid extension for Photoshop. I want to upload my extension to Adobe Exchange. During the upload process I get an error,
"XML validation error while parsing MXI Manifest: Declarations can only occur in the doctype declaration. Line: 19 Position: 791 Last 80 unconsumed characters".
The error message suggests that the description in the MXI file is not valid. Below are the contents of my MXI file.
<macromedia-extension
           name="yyy"
           id="com.yyy"
           version="1.0.0"
           type="object"
           requires-restart="true">
          <author name="abcd" />
          <products>
          <product familyname="Photoshop" maxversion="" primary="true" version="12.0"/>
          </products>
<description>
          <![CDATA[
<p><font size="14" color="black"><b>abcd</b> qwertyuioipafgjhkjljljklkjl
<br><br>
Open Extension via: Photoshop top menu > Window > Extensions > abcd.
<br><br>
Online support at: <a href="http://www.abcd.com/help.php">http://www.abcd.com/help.php</a></font></p>
<br>]]>
</description>
<ui-access>
          </ui-access>
<license-agreement>
</license-agreement>
<files>
            <file destination="$ExtensionSpecificEMStore/com.abcd/html/abcd.html" products="" source="zxp-support/Description/abcd.html"/>
            <file destination="$ExtensionSpecificEMStore/com.abcd/html/abcd.png" products="" source="zxp-support/Description/abcd.png"/>
            <file destination="" file-type="CSXS" products="" source="abcd.zxp"/>
            <file destination="$automate" file-type="plugin" platform="mac" products="Photoshop" source="mac/abcd.plugin"/>
            <file destination="$automate" file-type="plugin" platform="win" products="Photoshop32" source="win32/abcd.8li"/>
            <file destination="$automate" file-type="plugin" platform="win" products="Photoshop64" source="win64/abcd.8li"/>
</files>
</macromedia-extension>
Can anyone please point out why am I getting the error?
Thanks

Hi CarlSun,
Thanks for the reply. I have made the changes suggested by you.
I have few queries:
1. Can we use the attribute "source" in the description tag?
   I have created a local html page and specified it in the source attribute, but Extension Manager CS6 did not render the local html page and displayed the following:
   No description avaliable. Click the following link for more details.
   "http://www.abcd.html". Is it possible to display a local html page in Extension Manager CS6?
2. Can I display an image (png) in CDATA under the description tag? If yes, can you please guide me on how to do so?
3. As suggested in the tech notes, the MXI file must include the UTF-8 encoding as a header (<?xml version="1.0" encoding="UTF-8"?>). The MXI I am using does not have this header. Do I need to include the header?
Thanks

Character encoding conversion for marshall/unmarshall?

Hello, Java Web Services gurus,
I am wondering if there is an easy/plugin-able way to do character encoding conversion transparently in the process of marshall/unmarshall.
Basically, my input/output will always be these UTF-8 XMLs. As the backend database is ISO encoded, I hope the result of unmarshall will give me ISO strings. And when it comes to marshall, the ISO strings can be transparently turned to UTF-8 XML response. Right now I'm using JAXB's annotations to parse XML into objects.
I understand there may be characters in the input that cannot be converted; if so, I'd expect an error/exception that flags the failure.
Hope I sound clear. This has been a headache for a while. Really hope someone may help out a bit. Thanks a million in advance

[Duplicate Post|http://forums.sun.com/thread.jspa?messageID=10971554&tstart=0#10971554]
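On the "error when a character cannot be converted" part of the question above: Java Strings have no ISO or UTF-8 form of their own, so after unmarshalling the conversion only happens when the String is turned into bytes. A strict check could be sketched as follows. The class and method names are made up, and where such a check would hook into JAXB or the database layer is left open:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictEncodeDemo {
    // Encodes the value, throwing instead of silently substituting '?'.
    static byte[] encodeStrict(String value, Charset target) throws CharacterCodingException {
        CharsetEncoder encoder = target.newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(value));
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        try {
            // U+4E2D is a Chinese character with no ISO-8859-1 mapping.
            encodeStrict("\u4e2d", StandardCharsets.ISO_8859_1);
            System.out.println("encoded");
        } catch (CharacterCodingException e) {
            System.out.println("cannot be represented: " + e);
        }
    }
}
```

By contrast, String.getBytes(charset) replaces unmappable characters with a substitute byte and raises no error, which is why the explicit encoder is used here.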

Character Encoding in XML

Hello All,
I am not clear about solving the problem.
We have a Java application on NT that is supposed to communicate with the same application on MVS mainframe through XML.
We have a character encoding for these XML commands we send for communication.
The problem is that on MVS the parser does not understand the US-ASCII character encoding, so we are getting the infamous "illegal character" error.
The main frame file.encoding=CP1047 and
NT's file.encoding = us-ascii.
Is there any character encoding that is common to these two machines, mainframe and NT?
If it is Unicode, what is the correct notation for it?
Or is there any way of telling the parsers which character encoding should be used?
thanks,
Sridhar

On the mainframe end maybe something like:
FileInputStream fris = new FileInputStream("C:\\whatever.xml");
InputStreamReader is= new InputStreamReader(fris, "ASCII");//or maybe "us-ascii" "US-ASCII"
BufferedReader brin = new BufferedReader(is);
Or give the InputStreamReader/BufferedReader to whatever application you are using to parse the XML. The InputStreamReader should allow you to set the encoding even if it is not the system's native encoding. It depends, though, on which JVM you are using; JDK 1.2 at least supports the encodings listed on this page: http://as400bks.rochester.ibm.com/pubs/html/as400/v4r4/ic2924/info/java/rzaha/javaapi/intl/encoding.doc.html
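As a small illustration of why naming the encoding on both ends matters, here is a sketch in which both sides agree on UTF-8 regardless of file.encoding. The class and method names and the sample command are made up. The IBM1047 part only runs if the JVM ships that charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetExchange {
    // Sender side: always encode with an agreed charset, never the platform default.
    static byte[] send(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // Receiver side: decode with the same agreed charset.
    static String receive(byte[] wire) {
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String command = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><cmd>status</cmd>";
        byte[] wire = send(command);
        System.out.println("explicit UTF-8 round trip ok: " + receive(wire).equals(command));
        if (Charset.isSupported("IBM1047")) {
            // Decoding the same bytes with the EBCDIC code page gives different text.
            String misread = new String(wire, Charset.forName("IBM1047"));
            System.out.println("same text when read as IBM1047: " + misread.equals(command));
        }
    }
}
```

The same idea applies with InputStreamReader and OutputStreamWriter when the data is streamed rather than held in a byte array.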

Error occurred while parsing: Start of root element expected.

This is the code I use to try inserting XML data:
v_BFile := BFILENAME('XML_DIR',v_xmlfile);
DBMS_LOB.createTemporary(v_clob, cache=>false);
DBMS_LOB.open(v_bFile, DBMS_LOB.lob_readonly);
DBMS_LOB.loadFromFile(v_clob,v_BFile,DBMS_LOB.getLength(v_bfile));
DBMS_LOB.close(v_bfile);
v_parser := XMLPARSER.newParser;
XMLPARSER.parseClob(v_parser,v_clob);
On the parseClob call I get the error message: Error occurred while parsing: Start of root element expected.
My XML is as follows:
<?xml version="1.0" ?>
<EMPLOYEES>
<EMP>
<EMPNO>8000</EMPNO>
<ENAME>JONES</ENAME>
<JOB>DBA</JOB>
<MGR>7839</MGR>
<HIREDATE>07-MAY-2002</HIREDATE>
<SAL>100</SAL>
<COMM>10</COMM>
<DEPTNO>10</DEPTNO>
</EMP>
<EMP>
<EMPNO>8001</EMPNO>
<ENAME>SMITH</ENAME>
<JOB>PROG</JOB>
<MGR />
<HIREDATE>01-MAY-2002</HIREDATE>
<SAL>200</SAL>
<COMM>10</COMM>
<DEPTNO>10</DEPTNO>
</EMP>
</EMPLOYEES>
What am I doing wrong? Can someone help me?

Kurt
Can you answer the following questions:
Which release of the database are you using?
What is the database character set?
What is the character set encoding of the source document?
Remember that the loadFromFile procedure is designed to load binary data, and does not convert data into the database character set. If the database character set is UTF8, then the CLOB data has to be UCS2.
Can you dump the contents of the CLOB?
If you are using 9iR2 then you can use the new procedure GetCLOBFromFile, which will perform the correct conversions.
Also in 9iR2 we would recommend the use of the DBMS_XMLPARSER package, rather than the XMLPARSER package. The reason for this is that DBMS_XMLPARSER makes use of a 'C' based parser, running as native compiled code, while XMLPARSER still uses the Java based version of the parser. Performance with DBMS_XMLPARSER is much better.
Hope this helps

Character encoding in Netweaver Developer Studio

Hi all!
I've migrated an EP5E project to P6 and it worked fine. But now I use another workstation, and while trying to open a Java file of the migrated project I got
"Error Encoding Problem, this file is unreading using the UTF-8 character encoding".
The java-file contains german characters like "ä".
I'm using SAP NetWeaver Developer Studio Version: 2.0.5
Build id: 200404200353
Does anyone know how to set the character encoding in NetWeaver Developer Studio?
Thank You

I've found the solution:
Changing the encoding used to show the source
To change the encoding used by the Java editor to display source files:
With the Java editor open, select Edit > Encoding from the menu bar
Select an encoding from the menu or select Others and, in the dialog that appears, type in the encoding's name.
Note: this setting affects only the way the source is presented.
To change the encoding that the Java editor uses when saving files, specify a text file encoding preference on Window > Preferences > Workbench > Editors.

Error while parsing or executing XML-SQL document

Friends,
My scenario is file to JDBC. I am facing an error in the receiver CC in RWB.
The error states that '
Error while parsing or executing XML-SQL document: Error processing request in sax parser: Error when executing statement for table/stored proc. 'MATMAS' (structure 'STATEMENT'): java.sql.SQLException: [Microsoft][SQLServer 2000 Driver for JDBC][SQLServer]String or binary data would be truncated.'
My SOAP xml message is
- <SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
- <SOAP:Header>
- <sap:Main xmlns:sap="http://sap.com/xi/XI/Message/30" versionMajor="3" versionMinor="0" SOAP:mustUnderstand="1" xmlns:wsu="http://www.docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" wsu:Id="wsuid-main-92ABE13F5C59AB7FE10000000A1551F7">
<sap:MessageClass>ApplicationMessage</sap:MessageClass>
<sap:ProcessingMode>asynchronous</sap:ProcessingMode>
<sap:MessageId>18f17dd0-d503-11dc-cb4d-001635b02bfd</sap:MessageId>
<sap:TimeSent>2008-02-06T22:30:21Z</sap:TimeSent>
- <sap:Sender>
<sap:Party agency="http://sap.com/xi/XI" scheme="XIParty" />
<sap:Service>ECC</sap:Service>
</sap:Sender>
- <sap:Receiver>
<sap:Party agency="http://sap.com/xi/XI" scheme="XIParty" />
<sap:Service>BS_JDBC</sap:Service>
</sap:Receiver>
<sap:Interface namespace="http://file_to_jdbc">MI_JDBC_RECEIVER</sap:Interface>
</sap:Main>
- <sap:ReliableMessaging xmlns:sap="http://sap.com/xi/XI/Message/30" SOAP:mustUnderstand="1">
<sap:QualityOfService>ExactlyOnce</sap:QualityOfService>
</sap:ReliableMessaging>
- <sap:DynamicConfiguration xmlns:sap="http://sap.com/xi/XI/Message/30" SOAP:mustUnderstand="1">
<sap:Record namespace="http://sap.com/xi/XI/System/File" name="Directory">
sapecc50\sapmnt\trans</sap:Record>
<sap:Record namespace="http://sap.com/xi/XI/System/File" name="FileEncoding">UTF-8</sap:Record>
<sap:Record namespace="http://sap.com/xi/XI/System/File" name="FileType">txt</sap:Record>
<sap:Record namespace="http://sap.com/xi/XI/System/File" name="FileName">matmas1.txt</sap:Record>
</sap:DynamicConfiguration>
- <sap:HopList xmlns:sap="http://sap.com/xi/XI/Message/30" SOAP:mustUnderstand="1">
- <sap:Hop timeStamp="2008-02-06T22:30:21Z" wasRead="false">
<sap:Engine type="AE">af.e6e.sapecc6eval</sap:Engine>
<sap:Adapter namespace="http://sap.com/xi/XI/System">XIRA</sap:Adapter>
<sap:MessageId>18f17dd0-d503-11dc-cb4d-001635b02bfd</sap:MessageId>
<sap:Info />
</sap:Hop>
- <sap:Hop timeStamp="2008-02-06T22:30:21Z" wasRead="false">
<sap:Engine type="IS">is.01.sapecc6eval</sap:Engine>
<sap:Adapter namespace="http://sap.com/xi/XI/System">XI</sap:Adapter>
<sap:MessageId>18f17dd0-d503-11dc-cb4d-001635b02bfd</sap:MessageId>
<sap:Info>3.0</sap:Info>
</sap:Hop>
- <sap:Hop timeStamp="2008-02-06T22:30:22Z" wasRead="false">
<sap:Engine type="AE">af.e6e.sapecc6eval</sap:Engine>
<sap:Adapter namespace="http://sap.com/xi/XI/System">XIRA</sap:Adapter>
<sap:MessageId>18f17dd0-d503-11dc-cb4d-001635b02bfd</sap:MessageId>
</sap:Hop>
</sap:HopList>
- <sap:Diagnostic xmlns:sap="http://sap.com/xi/XI/Message/30" SOAP:mustUnderstand="1">
<sap:TraceLevel>Information</sap:TraceLevel>
<sap:Logging>Off</sap:Logging>
</sap:Diagnostic>
</SOAP:Header>
- <SOAP:Body>
- <sap:Manifest xmlns:sap="http://sap.com/xi/XI/Message/30" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:wsu="http://www.docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" wsu:Id="wsuid-manifest-5CABE13F5C59AB7FE10000000A1551F7">
- <sap:Payload xlink:type="simple" xlink:href="cid:[email protected]">
<sap:Name>MainDocument</sap:Name>
<sap:Description />
<sap:Type>Application</sap:Type>
</sap:Payload>
</sap:Manifest>
</SOAP:Body>
</SOAP:Envelope>
and payload message is
<?xml version="1.0" encoding="UTF-8" ?>
- <ns0:MT_JDBC_RECEIVER xmlns:ns0="http://file_to_jdbc">
- <STATEMENT>
- <ROW action="INSERT">
<TABLE>MATMAS</TABLE>
- <access>
<MATNR>38</MATNR>
<MTART>HALB</MTART>
<MATKL>00107</MATKL>
<MEINS>pc</MEINS>
<ERSDA>2008.04.05</ERSDA>
<BRGEW>10</BRGEW>
<NTGEW>12</NTGEW>
<GEWEI>KG</GEWEI>
</access>
- <access>
<MATNR>88</MATNR>
<MTART>FERT</MTART>
<MATKL>02004</MATKL>
<MEINS>PC</MEINS>
<ERSDA>2008.04.05</ERSDA>
<BRGEW>12</BRGEW>
<NTGEW>13</NTGEW>
<GEWEI>KG</GEWEI>
</access>
- <access>
<MATNR>89</MATNR>
<MTART>FERT</MTART>
<MATKL>02004</MATKL>
<MEINS>PC</MEINS>
<ERSDA>2008.03.02</ERSDA>
<BRGEW>12</BRGEW>
<NTGEW>14</NTGEW>
<GEWEI>KG</GEWEI>
</access>
- <access>
<MATNR>98</MATNR>
<MTART>HALB</MTART>
<MATKL>2</MATKL>
<MEINS>PC</MEINS>
<ERSDA>2006.09.01</ERSDA>
<BRGEW>12</BRGEW>
<NTGEW>12</NTGEW>
<GEWEI>KG</GEWEI>
</access>
- <access>
<MATNR>170</MATNR>
<MTART>NLAG</MTART>
<MATKL>4</MATKL>
<MEINS>PC</MEINS>
<ERSDA>2005.03.02</ERSDA>
<BRGEW>2</BRGEW>
<NTGEW>3</NTGEW>
<GEWEI>KG</GEWEI>
</access>
- <access>
<MATNR>178</MATNR>
<MTART>NLAG</MTART>
<MATKL>4</MATKL>
<MEINS>PC</MEINS>
<ERSDA>2007.03.06</ERSDA>
<BRGEW>3</BRGEW>
<NTGEW>4</NTGEW>
<GEWEI>KG</GEWEI>
</access>
- <access>
<MATNR>188</MATNR>
<MTART>NLAG</MTART>
<MATKL>5</MATKL>
<MEINS>PC</MEINS>
<ERSDA>2007.05.02</ERSDA>
<BRGEW>2</BRGEW>
<NTGEW>3</NTGEW>
<GEWEI>KG</GEWEI>
</access>
- <access>
<MATNR>288</MATNR>
<MTART>HALB</MTART>
<MATKL>101</MATKL>
<MEINS>PC</MEINS>
<ERSDA>2006.02.11</ERSDA>
<BRGEW>5</BRGEW>
<NTGEW>4</NTGEW>
<GEWEI>KG</GEWEI>
</access>
- <access>
<MATNR>358</MATNR>
<MTART>HAWA</MTART>
<MATKL>2</MATKL>
<MEINS>PC</MEINS>
<ERSDA>2007.09.09</ERSDA>
<BRGEW>500</BRGEW>
<NTGEW>500</NTGEW>
<GEWEI>G</GEWEI>
</access>
- <access>
<MATNR>359</MATNR>
<MTART>HAWA</MTART>
<MATKL>2</MATKL>
<MEINS>PC</MEINS>
<ERSDA>2007.08.01</ERSDA>
<BRGEW>20</BRGEW>
<NTGEW>10</NTGEW>
<GEWEI>G</GEWEI>
</access>
</ROW>
</STATEMENT>
</ns0:MT_JDBC_RECEIVER>
Could anybody help me sort out this issue? Thanks in advance.

hi,
Your structure is badly defined.
If you want to do an insert, the DT should be
<ns0:MT_JDBC_RECEIVER xmlns:ns0="http://file_to_jdbc">
___<StatementName>
______<dbTableName action="INSERT">
_____<table>MATMAS</table>
_______ <access>
___________<MATNR>38</MATNR>
___________<MTART>HALB</MTART>
___________<MATKL>00107</MATKL>
___________<MEINS>pc</MEINS>
___________<ERSDA>2008.04.05</ERSDA>
___________<BRGEW>10</BRGEW>
___________<NTGEW>12</NTGEW>
___________<GEWEI>KG</GEWEI>
______</access>
_____</dbTableName>
__ </StatementName>
</ns0:MT_JDBC_RECEIVER>
The ROW field is used when you expect to receive data from the DB, for example when you execute an SQL query from a sender communication channel: "SELECT name FROM TABLE Names"
So the result of this query would be, for example:
<row>
____<name>joge</name>
</row>
<row>
____<name>pepe</name>
</row>
<row>
____<name>nicola</name>
</row>
See this link
http://help.sap.com/saphelp_nw04/helpdata/en/2e/96fd3f2d14e869e10000000a155106/frameset.htm
Thanks
Rodrigo
Edited by: Rodrigo Pertierra on Feb 8, 2008 8:40 AM
Edited by: Rodrigo Pertierra on Feb 8, 2008 8:42 AM

What every developer should know about character encoding

This was originally posted (with better formatting) at [moderator edit: link removed]. I'm posting it here because lots of people trip over this.
If you write code that touches a text file, you probably need this.
Let's start off with two key items:
1. Unicode does not solve this issue for us (yet).
2. Every text file is encoded. There is no such thing as an unencoded file or a "general" encoding.
And let's add a codicil to this: most Americans can get by without having to take this into account, most of the time. That is because the first 128 byte values (0 to 127) map to the same set of characters in the vast majority of encoding schemes. And because we only use A-Z without any other characters, accents, etc., we're good to go. But the second you apply those same assumptions to an HTML or XML file that has characters outside that range, the trouble starts.
The computer industry started with disk space and memory at a premium. Anyone who suggested using 2 bytes for each character instead of one would have been laughed at. In fact we're lucky that the byte worked best as 8 bits, or we might have had fewer than 256 values for each character. There were of course numerous character sets (or code pages) developed early on. But we ended up with almost everyone using a standard set of code pages where the first 128 values were identical in all of them and the second 128 were unique to each set. There were sets for America/Western Europe, Central Europe, Russia, etc.
And then for Asia, because 256 characters were not enough, some of the range 128 – 255 was used for what was called DBCS (double byte character sets). For each value of a first byte (in these higher ranges), the second byte then identified one of up to 256 characters. This gave a total of up to 128 * 256 additional characters. It was a hack, but it kept memory use to a minimum. Chinese, Japanese, and Korean each have their own DBCS codepage.
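To make the DBCS idea concrete, here is a minimal sketch in Python (my choice of language for illustration, not something from the original post). It assumes Shift_JIS as a representative DBCS codepage and uses an arbitrary sample kanji.

```python
# Shift_JIS is used here as one example of a DBCS codepage (an assumption
# for illustration; other Asian codepages follow the same general pattern).
ascii_bytes = "A".encode("shift_jis")   # plain ASCII letter
kanji_bytes = "日".encode("shift_jis")  # a common kanji

# The ASCII letter stays a single byte; the kanji needs a lead byte from
# the upper half of the byte range plus a second byte.
print(len(ascii_bytes), ascii_bytes.hex())
print(len(kanji_bytes), kanji_bytes.hex())
```

The lead byte being 128 or higher is what tells the decoder "another byte follows".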
And for a while this worked well. Operating systems, applications, etc. mostly were set to use a specified code page. But then the internet came along. A website in America using an XML file from Greece to display data to a user browsing in Russia, where each is entering data based on their own country – that broke the paradigm.
Fast forward to today. The two file formats where we can explain this the best, and where everyone trips over it, are HTML and XML. Every HTML and XML file can optionally have the character encoding set in its header metadata. If it's not set, then most programs assume it is UTF-8. The XML specification does make UTF-8 (or UTF-16, when a byte order mark is present) the default, but for HTML the fallback has varied between browsers and locales, and not every program follows the rules. If the encoding is not specified and the program reading the file guesses wrong, the file will be misread.
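A small sketch of what the declaration buys you, using Python's standard xml.etree.ElementTree parser (an assumption on my part; the thread itself is about Xerces, which I am not showing here). The element name and sample text are made up for the example.

```python
import xml.etree.ElementTree as ET

# The same ISO-8859-1 bytes, once with and once without a declaration.
latin1_body = "<name>José</name>".encode("iso-8859-1")
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_body

# With the declaration, the parser decodes the bytes correctly.
name_with_declaration = ET.fromstring(declared).text

# Without it, the parser falls back to UTF-8 and the lone 0xE9 byte
# is not valid UTF-8, so parsing fails.
try:
    ET.fromstring(latin1_body)
    undeclared_parsed = True
except ET.ParseError:
    undeclared_parsed = False

print(name_with_declaration, undeclared_parsed)
```

In this case the failure is loud. With other byte combinations the undeclared file can parse "successfully" and simply contain the wrong characters, which is harder to notice.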
Point 1 – Never treat specifying the encoding as optional when writing a file. Always write it to the file. Always. Even if you are willing to swear that the file will never have characters outside the range 0 – 127.
Now let's look at UTF-8, because as the de facto standard, and because of the way it works, it gets people into a lot of trouble. UTF-8 became popular for two reasons. First, it matched the standard codepages for the first 128 characters, so most existing HTML and XML would already match it. Second, it was designed to use as few bytes as possible, which mattered a lot back when it was designed and many people were still using dial-up modems.
UTF-8 borrowed from the DBCS designs of the Asian codepages. The first 128 byte values are all single-byte representations of characters. Then for the next most common set, it uses a block in the second 128 values as the first byte of a double-byte sequence, giving us more characters. But wait, there's more. For the less common characters there's a first byte which leads to a series of second bytes. Those then each lead to a third byte, and those three bytes define the character. The original design went up to 6-byte sequences; the current definition (RFC 3629) stops at 4 bytes, which is enough for every Unicode character. Using this MBCS (multi-byte character set) you can write the equivalent of every Unicode character. And assuming what you are writing is not a list of seldom-used Chinese characters, you do it in fewer bytes than a fixed-width encoding would need.
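The variable-length scheme described above can be seen directly by encoding a few characters. This is only an illustrative Python sketch; the four sample characters are my own picks.

```python
# One sample character from each UTF-8 length class.
samples = ["A", "ß", "€", "😀"]

# Map each character to the number of bytes UTF-8 needs for it.
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch in samples:
    print(ch, lengths[ch], ch.encode("utf-8").hex())
```

Plain ASCII stays at one byte, accented Latin letters take two, most other everyday characters take three, and characters outside the Basic Multilingual Plane (such as emoji) take four.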
But here is what everyone trips over – they have an HTML or XML file, it works fine, and they open it up in a text editor. Then, using the codepage for their region, they insert a character like ß and save the file. Of course it must be correct – their text editor shows it correctly. But feed it to any program that reads according to the declared encoding, and that byte is now the first byte of a 2-byte sequence. You either get a different character or, if the second byte is not a legal value for that first byte, an error.
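Both failure modes can be reproduced in a few lines. This Python sketch assumes the editor saved in Windows-1252 (cp1252) and the reader expects UTF-8; the sample words are invented for the example.

```python
# Case 1: the editor saves "Straße" in cp1252, so ß becomes the single
# byte 0xDF. In UTF-8 that byte starts a 2-byte sequence, and the next
# byte ("e") is not a legal continuation byte.
saved = "Straße".encode("cp1252")
try:
    saved.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Case 2: two cp1252 characters whose bytes happen to form a legal
# UTF-8 sequence are silently read as one different character.
silently_wrong = "Ã©".encode("cp1252").decode("utf-8")

print(decode_failed, silently_wrong)
```

The second case is the nastier one, because nothing reports an error.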
Point 2 – Always create HTML and XML in a program that writes it out correctly using the declared encoding. If you must create it with a text editor, then view the final file in a browser.
Now, what about when the code you are writing will read or write a file? We are not talking about binary/data files where you write it out in your own format, but files that are considered text files. Java, .NET, etc. all have character encoders. The purpose of these encoders is to translate between a sequence of bytes (the file) and the characters they represent. Let's take what is actually a very difficult example – your source code, be it C#, Java, etc. These are still by and large "plain old text files" with no encoding hints. So how do programs handle them? Many assume they use the local code page. Many others assume that all characters will be in the range 0 – 127 and will choke on anything else.
Here's a key point about these text files – every program is still using an encoding. It may not be setting it in code, but by definition an encoding is being used.
Point 3 – Always set the encoding when you read and write text files. Not just for HTML & XML, but even for files like source code. It's fine if you set it to use the default codepage, but set the encoding.
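As a sketch of Point 3 in Python (the file name and sample text are hypothetical), the encoding is passed explicitly on every open call instead of relying on the platform default:

```python
import os
import tempfile

text = "Größe: 5 €"
path = os.path.join(tempfile.mkdtemp(), "note.txt")  # hypothetical file

# Write and read with the encoding stated explicitly.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, "r", encoding="utf-8") as f:
    round_trip = f.read()

# Reading the same bytes with a different codepage "works" but yields
# garbage, which is what happens when the default is left to chance.
with open(path, "r", encoding="cp1252") as f:
    wrong = f.read()

print(round_trip)
print(wrong)
```

Stating the default codepage explicitly is still fine; the point is that the choice is visible in the code.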
Point 4 – Use the most complete encoder possible. You can write your own XML as a text file encoded as UTF-8. But if you write it using an XML encoder, then it will include the encoding in the metadata and you can't get it wrong. (Depending on the encoding, it may also add the byte order mark to the file.)
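A hedged sketch of Point 4, again with Python's standard xml.etree.ElementTree as a stand-in for "an XML encoder" (the post does not name a specific library). The element name and text are made up.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("name")
root.text = "José"

# Let the XML writer produce the bytes: it encodes the text and writes
# a matching encoding declaration in one step.
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="ISO-8859-1", xml_declaration=True)
data = buf.getvalue()

# Any conforming parser can now read the file back without guessing.
parsed_back = ET.fromstring(data).text
print(data)
```

Because the declaration and the byte encoding come from the same call, they cannot drift apart the way they can when the header is typed by hand.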
OK, you're reading & writing files correctly, but what about inside your code? This is where it's easy – Unicode. That's what those encoders created in the Java & .NET runtime are designed to do. You read in and get Unicode. You write Unicode and get an encoded file. That's why the char type is 16 bits and is a unique core type that is for characters. This you probably have right because languages today don't give you much choice in the matter.
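The same decode-at-the-edge, Unicode-inside pattern, sketched in Python rather than Java or .NET (the input bytes and encodings are assumptions for the example):

```python
# Bytes arrive from outside in some known encoding (assumed ISO-8859-1).
raw = b"caf\xe9"

# Decode once at the boundary; from here on the program only sees text.
text = raw.decode("iso-8859-1")

# All processing happens on Unicode strings, with no byte juggling.
shouted = text.upper()

# Encode once on the way out, in whatever encoding the target expects.
out = shouted.encode("utf-8")
print(shouted, out)
```

The internal string never carries an encoding; only the two boundary calls do.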
Point 5 – (For developers on languages that have been around a while) – Always use Unicode internally. In C++ this is called wide chars (or something similar). Don't get clever to save a couple of bytes; memory is cheap and you have more important things to do.
Wrapping it up
I think there are two key items to keep in mind here. First, make sure you are taking the encoding into account on text files. Second, this is actually all very easy and straightforward. People rarely screw up how to use an encoding; it's when they ignore the issue that they get into trouble.
Edited by: Darryl Burke -- link removed

DavidThi808 wrote:
This was originally posted (with better formatting) at Moderator edit: link removed/what-every-developer-should-know-about-character-encoding.html. I'm posting because lots of people trip over this.
If you write code that touches a text file, you probably need this.
Let's start off with two key items:
1. Unicode does not solve this issue for us (yet).
2. Every text file is encoded. There is no such thing as an unencoded file or a "general" encoding.
And let's add a codicil to this – most Americans can get by without having to take this into account, most of the time. That's because the first 128 byte values (0 – 127) map to the same set of characters in the vast majority of encoding schemes. And because we mostly use A-Z without any other characters, accents, etc., we're good to go. But the second you carry those same assumptions over to an HTML or XML file that has characters outside that range, the trouble starts.
Pretty sure most Americans do not use character sets that only have a range of 0-127. I don't think I have ever used a desktop OS that did. I might have used some big iron boxes before that, but at that time I wasn't even aware that character sets existed.
They might only use that range, but that is a different issue, especially since that range is exactly the same in UTF-8 anyway.
The computer industry started with disk space and memory at a premium. Anyone who suggested using 2 bytes for each character instead of one would have been laughed at. In fact we're lucky that the byte worked best as 8 bits, or we might have had fewer than 256 values for each character. There were of course numerous character sets (or codepages) developed early on. But we ended up with almost everyone using a standard set of codepages where the first 128 values were identical on all of them and the second 128 were unique to each set. There were sets for America/Western Europe, Central Europe, Russia, etc.
And then for Asia, because 256 characters were not enough, some of the range 128 – 255 was used for what was called DBCS (double byte character sets). For each value of a first byte (in these higher ranges), the second byte then identified one of up to 256 characters. This gave a total of up to 128 * 256 additional characters. It was a hack, but it kept memory use to a minimum. Chinese, Japanese, and Korean each have their own DBCS codepage.
And for a while this worked well. Operating systems, applications, etc. mostly were set to use a specified code page. But then the internet came along. A website in America using an XML file from Greece to display data to a user browsing in Russia, where each is entering data based on their own country – that broke the paradigm.
The above is only true for small volume sets. If I am targeting a processing rate of 2000 txns/sec with a requirement to hold data active for seven years then a column with a size of 8 bytes is significantly different than one with 16 bytes.
Fast forward to today. The two file formats where we can explain this the best, and where everyone trips over it, are HTML and XML. Every HTML and XML file can optionally have the character encoding set in its header metadata. If it's not set, then most programs assume it is UTF-8. The XML specification does make UTF-8 (or UTF-16, when a byte order mark is present) the default, but for HTML the fallback has varied between browsers and locales, and not every program follows the rules. If the encoding is not specified and the program reading the file guesses wrong, the file will be misread.
The above is out of place. It would be best to address this as part of Point 1.
Point 1 – Never treat specifying the encoding as optional when writing a file. Always write it to the file. Always. Even if you are willing to swear that the file will never have characters outside the range 0 – 127.
Now let's look at UTF-8, because as the de facto standard, and because of the way it works, it gets people into a lot of trouble. UTF-8 became popular for two reasons. First, it matched the standard codepages for the first 128 characters, so most existing HTML and XML would already match it. Second, it was designed to use as few bytes as possible, which mattered a lot back when it was designed and many people were still using dial-up modems.
UTF-8 borrowed from the DBCS designs of the Asian codepages. The first 128 byte values are all single-byte representations of characters. Then for the next most common set, it uses a block in the second 128 values as the first byte of a double-byte sequence, giving us more characters. But wait, there's more. For the less common characters there's a first byte which leads to a series of second bytes. Those then each lead to a third byte, and those three bytes define the character. The original design went up to 6-byte sequences; the current definition (RFC 3629) stops at 4 bytes, which is enough for every Unicode character. Using this MBCS (multi-byte character set) you can write the equivalent of every Unicode character. And assuming what you are writing is not a list of seldom-used Chinese characters, you do it in fewer bytes than a fixed-width encoding would need.
The first part of that paragraph is odd. The first 128 characters of Unicode, all Unicode, are based on ASCII. The representational format of UTF-8 is required to implement Unicode, thus it must represent those characters. It uses the idiom supported by variable-width encodings to do that.
But here is what everyone trips over – they have an HTML or XML file, it works fine, and they open it up in a text editor. Then, using the codepage for their region, they insert a character like ß and save the file. Of course it must be correct – their text editor shows it correctly. But feed it to any program that reads according to the declared encoding, and that byte is now the first byte of a 2-byte sequence. You either get a different character or, if the second byte is not a legal value for that first byte, an error.
Not sure what you are saying here. If a file is supposed to be in one encoding and you insert invalid characters into it, then it is invalid. End of story. It has nothing to do with HTML/XML.
Point 2 – Always create HTML and XML in a program that writes it out correctly using the declared encoding. If you must create it with a text editor, then view the final file in a browser.
The browser still needs to support the encoding.
Now, what about when the code you are writing will read or write a file? We are not talking about binary/data files where you write it out in your own format, but files that are considered text files. Java, .NET, etc. all have character encoders. The purpose of these encoders is to translate between a sequence of bytes (the file) and the characters they represent. Let's take what is actually a very difficult example – your source code, be it C#, Java, etc. These are still by and large "plain old text files" with no encoding hints. So how do programs handle them? Many assume they use the local code page. Many others assume that all characters will be in the range 0 – 127 and will choke on anything else.
I know Java files have a default encoding - the specification defines it. And I am certain C# does as well.
Point 3 – Always set the encoding when you read and write text files. Not just for HTML & XML, but even for files like source code. It's fine if you set it to use the default codepage, but set the encoding.
It is important to define it. Whether you set it is another matter.
Point 4 – Use the most complete encoder possible. You can write your own XML as a text file encoded as UTF-8. But if you write it using an XML encoder, then it will include the encoding in the metadata and you can't get it wrong. (Depending on the encoding, it may also add the byte order mark to the file.)
OK, you're reading & writing files correctly, but what about inside your code? This is where it's easy – Unicode. That's what those encoders created in the Java & .NET runtime are designed to do. You read in and get Unicode. You write Unicode and get an encoded file. That's why the char type is 16 bits and is a unique core type that is for characters. This you probably have right because languages today don't give you much choice in the matter.
Unicode character escapes are replaced prior to actual code compilation. Thus it is possible to create strings in Java with escaped Unicode characters which will fail to compile.
Point 5 – (For developers on languages that have been around a while) – Always use Unicode internally. In C++ this is called wide chars (or something similar). Don't get clever to save a couple of bytes; memory is cheap and you have more important things to do.
No. A developer should understand the problem domain represented by the requirements and the business and create solutions that are appropriate to that. Thus there is absolutely no point in someone who is creating an inventory system for a stand-alone store crafting a solution that supports multiple languages.
Another example: in high-volume systems, moving and storing bytes is relevant. As such, one must carefully consider whether each text element is customer consumable or internally consumable. Saving bytes in such cases will impact the total load of the system. In such systems incremental savings impact operating costs and the marketing advantage that comes with speed.
