Encoding Type XML Document

Hi,
I have an XML document which is essentially data pulled from the Craigslist website. It contains a lot of ampersands, pound signs and other characters which, I have been told, do not follow the UTF-8 encoding that I declared in the tag at the top of the document. Could someone tell me what the correct encoding would be, so that I can parse the file using the Java DOM parser without it throwing errors such as:
Invalid byte 1 of 1-byte UTF-8 sequence.
Thanks a lot,
Richard.

Sounds like you don't know too much about XML and encoding. Try reading this tutorial, it's long but it's thorough:
http://skew.org/xml/tutorial/
That should take care of the XML part. Now you need to know the Java part: the simple rule is to not create XML documents if you don't know what encoding they will have. Your best bet is to write them using UTF-8 in the first place. I don't know how you're writing the documents but my bet is you're using plain old Java I/O. In that case use an OutputStreamWriter which specifies UTF-8 for the encoding. Here's a link to the Java I/O tutorial if you don't know how to do that:
http://java.sun.com/docs/books/tutorial/essential/io/
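
A minimal sketch of the advice above, assuming the XML is produced with plain Java I/O. The class name, the helper toUtf8Xml and the sample element are made up for illustration. It writes through an OutputStreamWriter that is explicitly set to UTF-8, so the bytes match the encoding named in the declaration. Note that a literal ampersand in the data still has to be escaped as &amp; regardless of the encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class Utf8XmlWriter {

    // Hypothetical helper: encodes an XML body as UTF-8 bytes, preceded by a
    // declaration that names the same encoding the writer actually uses.
    static byte[] toUtf8Xml(String body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            w.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // The pound sign is given as a Unicode escape so this source file does
        // not depend on its own encoding; the ampersand is escaped as an entity.
        byte[] bytes = toUtf8Xml("<price>\u00A35 &amp; up</price>");
        System.out.println(bytes.length + " bytes written");
    }
}
```

For a real file, the ByteArrayOutputStream would be replaced by a FileOutputStream; the important part is that the charset is named explicitly rather than left to the platform default.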

Similar Messages

  • Problem with encoding of xml document

    While parsing an XML document with a SAX parser, I found that the encoding of the XML document received as an input stream is "ISO-8859-1". After parsing, certain fields have to be stored in a MySQL table whose character set is "utf8". What I found is that certain characters in the original XML document are stored as a question mark (?) in the database.
    1. I am using MySQL 4.1.7 with the system variable character_set_database set to "utf8", so all my tables have the charset "utf8".
    2. I am parsing some XML files as an input stream using the SAX parser API (org.apache.xerces.parsers.SAXParser) with encoding "iso-8859-1". After parsing, certain fields have to be stored in the MySQL database.
    3. Some XML files contain an "iso-8859-1" character with character code 146, which looks like an apostrophe but is actually ’, and the problem is that words like can’t are shown as can?t by the database.
    4. I noticed that parsing goes well and the character code is 146 while parsing. But when I retrieve it from the database using JDBC, it shows the character code as 63.
    5. I am using a JDBC prepared statement to insert the parsed XML into the database. It seems that some problem occurs while inserting, but I don't know what it is.
    6. I tried to convert ISO-8859-1 to UTF-8 before storing into the database, by using
    utfString = new String(isoString.getBytes("ISO-8859-1"),"UTF-8");
    But when I retrieve it from the database it still shows the character code as 63.
    7. I also tried to retrieve it using description = new String(rs.getBytes(1),"UTF-8");
    But that also shows that description contains a character with code 63 instead of 146, and it also shows can’t as can?t.
    Please help me find where the problem is: in parsing, or while storing and retrieving from the database. Sorry for any spelling mistakes.

    duggal.ashish wrote:
    3. Some XML files contain a "iso-8859-1" character with character code 146 which appears like apostrophes but actually it is : - ’ and the problem is that words like can’t are shown as can?t by database.
    http://en.wikipedia.org/wiki/ISO8859-1
    Scroll down in that page and you'll see that the character code 146 -- which would be 92 in hexadecimal -- is in the "unused" area of ISO8859-1. I don't know where you got the idea that it represents some kind of apostrophe-like character but it doesn't.
    Edit: Actually, I do know where you got that idea. You got it from Windows-1252:
    http://en.wikipedia.org/wiki/Windows-1252
    Not the same charset at all.
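
To make the difference concrete, here is a small sketch (the class and method names are made up) that decodes byte 146 under both charsets using the JDK's built-in converters. It also shows one plausible source of the code 63 reported in the question: when a character has no representation in the target charset, String.getBytes substitutes a question mark, which is byte 63.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Byte146Demo {

    // Decodes one byte with the given charset and returns the resulting code point.
    static int decodeByte(int value, Charset charset) {
        return new String(new byte[] {(byte) value}, charset).codePointAt(0);
    }

    // Encodes one character with the given charset and returns its first byte (0-255).
    static int encodeChar(char c, Charset charset) {
        return String.valueOf(c).getBytes(charset)[0] & 0xFF;
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // Under ISO-8859-1 the byte maps to the control character U+0092.
        System.out.printf("146 as ISO-8859-1   -> U+%04X%n",
                decodeByte(146, StandardCharsets.ISO_8859_1));
        // Under windows-1252 it maps to the right single quotation mark U+2019.
        System.out.printf("146 as windows-1252 -> U+%04X%n", decodeByte(146, cp1252));
        // U+2019 has no ISO-8859-1 byte, so the encoder substitutes '?' (63).
        System.out.println("U+2019 to ISO-8859-1 -> "
                + encodeChar('\u2019', StandardCharsets.ISO_8859_1));
    }
}
```

If the source files really are Windows-1252, the likely fix is to decode them as windows-1252 in the first place, not to re-encode the resulting String afterwards.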

  • Set encoding in xml document

    Hi, I create an org.dom4j.Document in Java and I need to set its encoding to
    windows-1250, but I don't know how to do it. I always get the default encoding, UTF-8.
    Document document = DocumentHelper.createDocument();
    Element zakazky = document.addElement(ZAKAZKY);
    <?xml version="1.0" encoding="UTF-8"?>
    <zakazky><zakazka><znacka_form>40708050001</znacka_form>......
    Thanks.

    This reply is a bit late (better late than never), but if anyone is stuck on this issue hopefully you can find this answer:
    You set the encoding when you use the Transformer class (when you output the XML):
    File file = new File("file.xml");
    // The Writer's charset determines the bytes actually written, so keep it in step with the "encoding" output property.
    OutputStreamWriter osw = new OutputStreamWriter(new FileOutputStream(file), "windows-1250");
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer output = tf.newTransformer();
    StreamResult sr = new StreamResult(osw);
    output.setOutputProperty("encoding", "windows-1250");
    output.transform(ds, sr); // ds is the Source wrapping the document
    By the way, why do people say that Document has a method called getEncoding (and hence setEncoding) when it clearly doesn't? Read the documentation first before you start talking utter nonsense.
    Benjamin Black

  • How to change the Encoding type of a XML

    Hi all,
    I have an XML document (generated at run time) with UTF-8 encoding. When I try to parse it, I get an error saying "*Document root element is missing*".
    If I change the encoding to ANSI, it parses without error.
    How can I change the encoding type of a document?
    Any comments welcome.
    Kaushalya

    There's no such thing as the "encoding of a String". If you produced a String from a sequence of bytes using the wrong encoding, you may not be able to repair that problem by hacking about in your code. You're better off to produce the String using the correct encoding in the first place. Read this for more information about XML and encodings as you appear to be misunderstanding basic concepts:
    http://skew.org/xml/tutorial/
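
The point about Strings having no encoding can be illustrated with a short sketch (hypothetical class and method names). The same UTF-8 bytes are decoded with the right charset and with two wrong ones. One wrong choice happens to be reversible and the other silently destroys data, which is why the reply says the damage may not be repairable afterwards.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DecodeOnce {

    // A String is a sequence of characters; a charset only matters at the
    // moment bytes are turned into characters or back.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // True if decoding with the wrong charset and re-encoding with that same
    // charset gives back the original bytes, i.e. the mistake can be undone.
    static boolean survivesRoundTrip(byte[] original, Charset wrongCharset) {
        String text = decode(original, wrongCharset);
        return Arrays.equals(original, text.getBytes(wrongCharset));
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00E9".getBytes(StandardCharsets.UTF_8);
        System.out.println("decoded as UTF-8:  " + decode(utf8, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 mistake reversible: "
                + survivesRoundTrip(utf8, StandardCharsets.ISO_8859_1));
        System.out.println("US-ASCII mistake reversible:   "
                + survivesRoundTrip(utf8, StandardCharsets.US_ASCII));
    }
}
```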

  • XML encoding for type XML B/L transaction property

    I need to change the XML encoding type from UTF-8 to UTF-16 for XML documents being sent to an external system via the web service action block. Is there any way to change this encoding in XMII?

    Musarrat,
    Of course it's possible....
    Tim,
    Instead of using the WebService action, use the Post action and set the body of the post to be the WebService SOAP XML. Since a WebService call and a POST are the same thing behind the scenes, this will work without a problem.
    Hope this helps.
    Sam

  • Problem to validate XML document if the type of root element is abstract

    I have the following XML document:
    <?xml version="1.0" encoding="UTF-8"?>
    <ct013/>
    It corresponds to the following XSD Schema:
    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
         <xs:element name="ct013" type="foo"/>
         <xs:complexType abstract="true" name="foo"/>
         <xs:complexType name="fixedType">
              <xs:complexContent>
                   <xs:restriction base="foo"/>
              </xs:complexContent>
         </xs:complexType>
    </xs:schema>
    Please note that the type of the root element in that schema is abstract.
    XML Schema provides a mechanism to force substitution for a particular element or type. When an element or type is declared to be "abstract", it cannot be used in an instance document. When an element is declared to be abstract, a member of that element's substitution group must appear in the instance document. When an element's corresponding type definition is declared as abstract, all instances of that element must use xsi:type to indicate a derived type that is not abstract.
    Declaring an element as abstract requires the use of a substitution group. Declaring a type as abstract simply requires the use of a type derived from it (and identified by the xsi:type attribute) in the instance document.
    For more information on using abstract types, please see section 4.7, Abstract Elements and Types, of XML Schema Part 0: Primer Second Edition (http://www.w3.org/TR/xmlschema-0/#abstract).
    In this case there seems to be an Oracle bug when I try to validate this XML document using the Oracle XDK:
    String validate(String xml, String schema)
            throws XSDException, XMLParseException, SAXException, IOException {
        System.setProperty("oracle.xml.parser.debugmode", "true");
        XSDValidator xsdValidator = new XSDValidator();
        XMLError xmlError = new XMLError();
        xmlError.setErrorHandler(new DocErrorHandler());
        XMLDocument xmlDocument = parseXMLDocument(xml);
        XMLDocument schemaXMLDocument = parseXMLDocument(schema);
        XMLSchema xmlSchema = (XMLSchema) new XSDBuilder().build(schemaXMLDocument, null);
        xsdValidator.setError(xmlError);
        xsdValidator.setSchema(xmlSchema);
        xsdValidator.validate(xmlDocument);
        return getValidationError(xsdValidator);
    }
    I get the following error:
    Can't find resource for bundle oracle.xml.mesg.XMLResourceBundle, key XSD-2046.
    I tried to validate this XML document using two other tools: XSD Schema Validator (http://apps.gotdotnet.com/xmltools/xsdvalidator/Default.aspx) and xsdvalid-29 (http://www.w3.org/XML/Schema#XSDValid). Both reported the error that the type of the root element is abstract and cannot be used for validation.
    I think that Oracle should return an explanatory message instead of throwing an exception.
    Am I right? Is this really an Oracle bug, or am I missing something?
    Any help, hints or advice would be gratefully appreciated.

    Define Element1 as follows:
    <xs:element name="Element1">
    <xs:complexType>
    <xs:complexContent>
    <xs:restriction base="xs:string"/>
    </xs:complexContent>
    </xs:complexType>
    </xs:element>
    Does the XML document get validated if the element is specified as
    <Element1></Element1>
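
For comparison with the Oracle XDK behaviour described in the question, the following sketch runs the same schema through the standard javax.xml.validation API that ships with the JDK (not the Oracle XDK). The class name and the validate helper are made up for the example. It checks the bare root element and then a variant that names the concrete derived type with xsi:type, as the quoted primer text requires.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class AbstractTypeDemo {

    // The schema from the question, inlined as a string.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='ct013' type='foo'/>"
        + "<xs:complexType abstract='true' name='foo'/>"
        + "<xs:complexType name='fixedType'><xs:complexContent>"
        + "<xs:restriction base='foo'/>"
        + "</xs:complexContent></xs:complexType>"
        + "</xs:schema>";

    // Returns null when the instance is valid, otherwise the validator's message.
    static String validate(String xml) {
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The declared type is abstract, so the bare element is rejected.
        System.out.println("bare element:  " + validate("<ct013/>"));
        // Naming the concrete derived type via xsi:type makes the instance valid.
        System.out.println("with xsi:type: " + validate(
                "<ct013 xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
                + " xsi:type='fixedType'/>"));
    }
}
```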

  • How to set the encoding of an XML-document

    I need to change the encoding of an xml-document.
    When I convert the document into a string, UTF-8
    is used, I want to use ISO-8859-1.

    use this in your identity transform:
    transformer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
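
A fuller sketch of that identity transform, with an invented class name and sample document. It serializes to a byte stream, because the encoding only takes effect when characters are turned into bytes. If the result goes to a StringWriter instead, the declaration changes but the String itself still holds plain characters.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class Latin1Serializer {

    // Builds a tiny document; the element name and text are made up for the example.
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("name");
            root.setTextContent("Jos\u00E9");
            doc.appendChild(root);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the document to bytes in the requested encoding.
    static byte[] serialize(Document doc, String encoding) {
        try {
            // newTransformer() without a stylesheet gives the identity transform.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.ENCODING, encoding);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(sampleDocument(), "ISO-8859-1");
        // Decode with the same charset purely to display the result.
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));
    }
}
```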

  • Help - Inserting an XML document into a Oracle 8i Column (CLOB Type)

    Hi JavaGurus,
    I am looking for a simple java code which will take my XML document as input and insert the same into a Oracle 8i database's column which is of type CLOB.
    Insert statement won't work and I can not use SQL Loader.
    Any one?
    JK

    Maybe you can adapt some of the code in Oracle's "LOB Datatype" example, which is a complete working program that stores and retrieves BLOBs and CLOBs.
    http://www.oracle.com/technology/sample_code/tech/java/sqlj_jdbc/files/advanced/advanced.html

  • Japanese characters alone are not passing correctly (passing like ??? or some unreadable characters) to Adobe application when we create input variable as XML data type. The same solution works fine if we change input variable data type to document type a

    Dear Team,
    Japanese characters alone are not passing correctly (passing like ??? or some unreadable characters) to Adobe application when we create input variable as XML data type. The same solution works fine if we change input variable data type to document type. Could you please do needful. Thank you

    Hello,
    most recent patches for IGS and kernel installed. Now it works.

  • How to obtain the encoding scheme for an XML document

    How do you go about reading the encoding scheme for an XML document??
    More specifically how do I read the line:
    <?xml version="1.0" encoding="UTF-8"?>
    (Using Win32 C++ XML Parser 2.0.3 as SAX).

    I work mostly with the Java versions of the parser, so you'll have to make the translation to C++. As far as I know, you can't use the SAX API to access the encoding.
    You need to use the DOM along with Oracle's extension to the basic DOM functionality. Oracle's package, oracle.xml.parser.v2, defines a class called XMLDocument which implements the Document interface. This class has a method, getEncoding(), which returns the encoding. You would use the getDocument() method in the Parser base class, inherited by DOMParser, to retrieve the XMLDocument.
    Jeff
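
For readers using the standard JAXP parser in the JDK and not Oracle's classes, a comparable sketch is possible with the DOM Level 3 method Document.getXmlEncoding(). This is a substitute for the Oracle-specific getEncoding() described above, not the same API, and the class and helper names are invented.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DeclaredEncoding {

    // Parses the bytes and returns the encoding named in the XML declaration.
    // The DOM API documents the value as null when no encoding was specified.
    static String declaredEncoding(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return doc.getXmlEncoding();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(declaredEncoding(xml));
    }
}
```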

  • Identifying an XML document type

    I am reading XML message off a queue using JMS and I want to determine the XML document type so I can use the proper schema for validation. Since different XML documents will be coming in off the same queue I need to determine the document type so I can use the proper schema for validation.
    Is there a good way to do this? So far I have been doing the following but I am looking for a better solution:
    * Use the JMS headers to store the XML message type. Not always possible.
    * Read the first few lines of the XML as a file and parse for the root node or .xsd.

    What I am referring to is a situation where I have one queue that has multiple incoming XML documents.
    For example:
    PurchaseOrder.xml needs to be read off a queue and validated via PurchaseOrder.xsd.
    PartsOrder.xml needs to be read off the same queue and validated with PartsOrder.xsd. I am trying to find an elegant solution to identify whether I have a PartsOrder or a PurchaseOrder, so I can set the proper schema for validation when parsing the XML doc.
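
One way to do the "peek at the root node" approach without parsing the whole message is a StAX reader that stops at the first start tag. The sketch below assumes the root element names are PurchaseOrder and PartsOrder, which the thread does not actually state, and that the message body is available as a String; the class and method names are made up.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class RootElementSniffer {

    // Assumed mapping from root element name to schema file.
    static final Map<String, String> SCHEMAS = Map.of(
            "PurchaseOrder", "PurchaseOrder.xsd",
            "PartsOrder", "PartsOrder.xsd");

    // Returns the local name of the first start tag, or null if there is none.
    static String rootName(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                // Skip the prolog (comments, processing instructions, DOCTYPE).
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Looks up the schema for a message, or null if the root element is unknown.
    static String schemaFor(String xml) {
        String root = rootName(xml);
        return root == null ? null : SCHEMAS.get(root);
    }

    public static void main(String[] args) {
        String message = "<?xml version=\"1.0\"?><PartsOrder><part/></PartsOrder>";
        System.out.println(schemaFor(message));
    }
}
```

If the documents use namespaces, keying the map on reader.getName() (the qualified name) would be a more robust choice than the local name alone.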

  • Character encode HTML in XML document...

    The user will input some text in a text area, which could include HTML tags. An XML document will be created and inserted into the database.
    How would I encode the HTML so that it works correctly? What would I need to do when I retrieve the XML from the database?

    enclose the text in <![CDATA[ ... ]]>, e.g.
    <![CDATA[ <b>bold</b> ]]>
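
When the XML is built through the DOM instead of by string concatenation, either a CDATA section or an ordinary text node will carry the markup safely, since the serializer escapes a text node for you. The sketch below (invented class, method and element names) writes the user's text both ways and reads it back. One known limitation of the CDATA route is that text containing the sequence ]]> cannot sit inside a single CDATA section.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class HtmlInXml {

    // Wraps the user's text in a <comment> element, as CDATA or as escaped text.
    static String toXml(String userText, boolean useCdata) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("comment");
            root.appendChild(useCdata
                    ? doc.createCDATASection(userText)
                    : doc.createTextNode(userText));
            doc.appendChild(root);
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses the stored XML and returns the text exactly as the user typed it.
    static String readBack(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<b>bold</b> & more";
        System.out.println(toXml(html, true));
        System.out.println(toXml(html, false));
        System.out.println(readBack(toXml(html, false)));
    }
}
```

On retrieval nothing special is needed: the parser hands back the original characters in both cases.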

  • XMLTYPE as CLOB storage "inserting large xml document in xml type column"

    Hi All,
    I have a table containing a column of the XML datatype (non-schema-based).
    I would like to insert a large XML document into it,
    but an exception is thrown: "string literal too long".
    I tried using bind variables as a solution (prepared statements, as I am writing in Java),
    but it didn't work because the XML document is large.
    When I changed the column type to CLOB it worked, but without XML validation.
    Although the XML type is mapped to a CLOB in storage, the XML type column couldn't take the document.
    If anyone has a solution please tell me, I need it urgently.
    Thanks in advance :-)

    Thanks, it was very useful :-)
    But I am not having any success getting the following statement to work with a JDBC connection pool instead of a hard-coded URL connection:
    tempClob = CLOB.createTemporary(conn, true, CLOB.DURATION_SESSION);
    It works with:
    "jdbc:oracle:thin:@server:port:dbname"
    It does NOT work with:
    datasource.getConnection()
    If anyone could help...

  • How to ftp XML document into a XML type which is not created by itself.

    Hi,
    1.
    I have a table called SNPLEX_DESIGN which is created automatically when I register the snplex_design.xsd XML schema (Oracle creates it through the xdb:defaultTable="SNPLEX_DESIGN" attribute). It is created using the SNPLEX user account.
    2.
    I also created a folder (resource) called /home/SNPLEX/Orders, which is used to hold all the incoming XML documents.
    3.
    I created another user account called SNPLEX_APP, which is the only user account allowed to FTP XML documents into the /home/SNPLEX/Orders folder.
    Issue:
    If I log in as the SNPLEX user, I can FTP an XML document into the folder and the table (the file size = 0). But if I log in as the SNPLEX_APP user, I can only FTP the XML document into the folder; Oracle doesn't store the document in the table (because the file size shows a number).
    I have granted all the ACL privileges on the /home/SNPEX/Orders folder to SNPLEX_APP through OEM.
    Am I missing anything? Any help will be greatly appreciated. Resolving this issue is very important to us, since we are at the stage of rolling the system into production.
    Regards,
    Jinsen

    In order for a registered schema to be available to other users, the schema must be registered as a GLOBAL schema and not a LOCAL one. This is controlled by the third argument passed to registerSchema, and the default is local. Note that you will also need to explicitly grant appropriate permissions on any tables created by the schema registration process to other users who will be loading or reading data from these tables.

  • Very urgent help needed- Error while passing XML document to Oracle stored

    Hi !
    I have been struggling a lot to call Oracle 9i stored procedure passing Stringbuilder object type from ASP.NET
    I am using Visual Studio 2008 Professional, OS: Windows XP and Oracle: 9.2.0.1.0
    Following is the procedure:
    CREATE or REPLACE PROCEDURE loadCompanyInfo (clobxml IN clob) IS
    -- Declare a CLOB variable
    ciXML clob;
    BEGIN
    -- Store the Purchase Order XML in the CLOB variable
    ciXML := clobxml;
    -- Insert the Purchase Order XML into an XMLType column
    INSERT INTO companyinfotbl (companyinfo) VALUES (XMLTYPE(ciXML));
    commit;
    --Handle the exceptions
    EXCEPTION
    WHEN OTHERS THEN
    raise_application_error(-20101, 'Exception occurred in loadCompanyInfo procedure :'||SQLERRM);
    END loadCompanyInfo ;
    And following is the ASP.net code:
    StringBuilder b = new StringBuilder();
    b.Append("<?xml version=\"1.0\" encoding=\"utf-8\" ?>");
    b.Append("<item>");
    b.Append("<price>500</price>");
    b.Append("<description>some item</description>");
    b.Append("<quantity>5</quantity>");
    b.Append("</item>");
    //Here you'll have the Xml as a string
    string myXmlString1 = b.ToString();
    //string result;
    using (OracleConnection objConn = new OracleConnection("Data Source=testdb; User ID=testuser; Password=pwd1"))
    {
        OracleCommand objCmd = new OracleCommand();
        objCmd.Connection = objConn;
        objCmd.CommandText = "loadCompanyInfo";
        objCmd.CommandType = CommandType.StoredProcedure;
        //OracleParameter pmyXmlString1 = new OracleParameter("pmyXmlString1", new OracleString(myXmlString1));
        objCmd.Parameters.Add("myXmlString1", OracleType.clob);
        objCmd.Parameters.Add(myXmlString1).Direction = ParameterDirection.Input;
        //objCmd.Parameters.Add("result", OracleType.VarChar).Direction = ParameterDirection.Output;
        try
        {
            objConn.Open();
            objCmd.ExecuteNonQuery();
        }
        catch (Exception ex)
        {
            Label1.Text = "Exception: {0}" + ex.ToString();
        }
        objConn.Close();
    }
    When I am trying to execute it, I am getting the following error:
    Exception: {0}System.Data.OracleClient.OracleException: ORA-06550: line 1, column 7: PLS-00306: wrong number or types of arguments in call to 'LOADCOMPANYINFO' ORA-06550: line 1, column 7: PL/SQL: Statement ignored at System.Data.OracleClient.OracleConnection.CheckError(OciErrorHandle errorHandle, Int32 rc) at System.Data.OracleClient.OracleCommand.Execute(OciStatementHandle statementHandle, CommandBehavior behavior, Boolean needRowid, OciRowidDescriptor& rowidDescriptor, ArrayList& resultParameterOrdinals) at System.Data.OracleClient.OracleCommand.ExecuteNonQueryInternal(Boolean needRowid, OciRowidDescriptor& rowidDescriptor) at System.Data.OracleClient.OracleCommand.ExecuteNonQuery() at Default.Button1Click(Object sender, EventArgs e)
    I understand from this that the .net type is not the correct one, but I am not sure how to correct it. I could not find any proper example in any documentation that I came across. Most of the examples give information on how to read but not how to insert XML into Oracle table by calling Stored Procedure.
    Can you please help me to solve this problem? I hope that you can help solve this.
    Also, can you please give me an example of passing XML document XMLdocument to Oracle Stored procedure.
    In both the cases, if you can provide the working code then it would be of great help.
    Thanks,

    Hi ,
    In addition to the above error details, my BPEL code looks like this:
    <process name="BPELProcess1"
    targetNamespace="http://xmlns.oracle.com/Application10/Project10/BPELProcess1"
    xmlns="http://schemas.xmlsoap.org/ws/2003/03/business-process/"
    xmlns:client="http://xmlns.oracle.com/Application10/Project10/BPELProcess1"
    xmlns:ora="http://schemas.oracle.com/xpath/extension"
    xmlns:bpelx="http://schemas.oracle.com/bpel/extension"
    xmlns:bpws="http://schemas.xmlsoap.org/ws/2003/03/business-process/">
    <partnerLinks>
    <partnerLink name="bpelprocess1_client" partnerLinkType="client:BPELProcess1" myRole="BPELProcess1Provider" partnerRole="BPELProcess1Requester"/>
    </partnerLinks>
    <variables>
    <variable name="inputVariable" messageType="client:BPELProcess1RequestMessage"/>
    <variable name="outputVariable" messageType="client:BPELProcess1ResponseMessage"/>
    </variables>
    <sequence name="main">
    <receive name="receiveInput" partnerLink="bpelprocess1_client" portType="client:BPELProcess1" operation="process" variable="inputVariable" createInstance="yes"/>
    <invoke name="callbackClient" partnerLink="bpelprocess1_client" portType="client:BPELProcess1Callback" operation="processResponse" inputVariable="outputVariable"/>
    </sequence>
    </process>
    Kindly help if anyone has faced this Issue before.
    Regards,
    Rakshitha
