Processing XML files that contain Special Characters

Hello:
Before I explain my problem I think I should briefly explain what I am trying to do. I have a JSP page that invokes a Java method (the code is attached). This java method takes in an XML file and an XSLT file. It parses the XSLT and also the XML file. If the parsing went through fine, it then processes the XML file and applies the XSLT to the XML file and returns a XMLDocumentFragment Object back to JSP and the JSP renders it.
This mechanism works well. However off late I have encountered a few XML files containing characters such as &Ecirc (Capital E with circumflex accent). Whenever my Java method tries to parse/process this .xml file it gives me the following error.
ORG.oclc.da.utilities.ifs.ReportException: An Error Occured While Parsing the Report: Missing entity 'Ecirc'.     at ORG.oclc.da.archive.userinterface.ReportHelper.retrieveReport(Unknown Source)     at /ViewReport.jsp._jspService(/ViewReport.jsp.java:87) (JSP page line 65)     at com.orionserver[Oracle9iAS (1.0.2.2) Containers for J2EE].http.OrionHttpJspPage.service(OrionHttpJspPage.java:54)     at com.evermind[Oracle9iAS (1.0.2.2) Containers for J2EE].server.http.HttpApplication.serviceJSP(HttpApplication.java:5458)     at com.evermind[Oracle9iAS (1.0.2.2) Containers for J2EE].server.http.JSPServlet.service(JSPServlet.java:31)     at com.evermind[Oracle9iAS (1.0.2.2) Containers for J2EE].server.http.ServletRequestDispatcher.invoke(ServletRequestDispatcher.java:501)     at com.evermind[Oracle9iAS (1.0.2.2) Containers for J2EE].server.http.ServletRequestDispatcher.forwardInternal(ServletRequestDispatcher.java:170)     at com.evermind[Oracle9iAS (1.0.2.2) Containers for J2EE].server.http.HttpRequestHandler.processRequest(HttpRequestHandler.java:576)     at com.evermind[Oracle9iAS (1.0.2.2) Containers for J2EE].server.http.HttpRequestHandler.run(HttpRequestHandler.java:189)     at com.evermind[Oracle9iAS (1.0.2.2) Containers for J2EE].util.ThreadPoolThread.run(ThreadPoolThread.java:62)
It seems like the Oracle Parser/XSLT Processor (oracle.xml.parser.v2.DOMParser) I am using is not able to handle special characters such &Ecirc. I was wondering if there is anyway around this problem.
Attached is the Java Method that handles both the parsing and processing of the XML file.
/** The method parses the Report Data and applies the Style Sheet to this data
* @param The InputStream (Report Contents - .xml file), Name of the StyleSheet that needs to be applied
* @return A sub-section of the report data (DOM DocumentFragment is returned)
private XMLDocumentFragment parseReport(InputStream reportStream,String strStyleSheet) throws Exception
DOMParser parser;
XMLDocument xml, xsldoc, out;
URL urlStyleSheet;
//Get the URL for the Style Sheet
urlStyleSheet = new URL(strStyleSheet);
//Create an instance of the Dom Parser
parser = new DOMParser();
parser.setPreserveWhitespace(true);
//Parse the XSL document and create a DOM Object
parser.parse(urlStyleSheet);
xsldoc = parser.getDocument();
//Parse the report document (a .xml) and create a DOM Object
parser.parse(reportStream);
xml = parser.getDocument();
// instantiate a stylesheet
XSLStylesheet xsl = new XSLStylesheet(xsldoc, urlStyleSheet);
XSLProcessor processor = new XSLProcessor();
// display any warnings that may occur
processor.showWarnings(true);
// processor.setErrorStream(System.err);
// Process XSL
XMLDocumentFragment result = processor.processXSL(xsl, xml);
return result;
If you have any suggestions please let me know. If you need more information I will be to furnish it.
thanks
Mathangi

Hello,
I just had the same problem, you need to include the approprate entity sets so that the xsl parser will recognize them (and you won't
get the "missing entity" error:
if you already haven't you need to add a DOCTYPE processing instruction for your dtd, to the top of your xml files to be parsed, for ex.:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "file:///mydir/mydtd.dtd">
then in "mydtd.dtd", add references to these 3 entity sets (if you have a dtd - if not then create one just with these entries):
<!ENTITY % HTMLlat1 PUBLIC
"-//W3C//ENTITIES Latin 1 for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
%HTMLlat1;
<!ENTITY % HTMLspecial PUBLIC
"-//W3C//ENTITIES Special for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent">
%HTMLspecial;
<!ENTITY % HTMLsymbol PUBLIC
"-//W3C//ENTITIES Symbols for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">
%HTMLsymbol;
Or, grab the ".ent" files from the www.w3.org site and put them on your server in the dtd dir, and change the "http:... reference, to
"file:...", it will be faster to parse (that's what I did). FYI, "Ecirc" is in xhtml-lat1.ent.
Also, after I did this,I developed another problem where my xsl parser and xmlmarkup tag-converting function converts certain
html entities to their octal counterparts, and I don't want this and don't know how to stop it (I have a posting out for this also).
Additionally, thanks for posting your parseReport method, it just so happens that I was looking for a way to do something like that,
it should be helpful to me.
-JK
Hello:
Before I explain my problem I think I should briefly explain what I am trying to do. I have a JSP page that invokes a Java method (the code is attached). This java method takes in an XML file and an XSLT file. It parses the XSLT and also the XML file. If the parsing went through fine, it then processes the XML file and applies the XSLT to the XML file and returns a XMLDocumentFragment Object back to JSP and the JSP renders it.
This mechanism works well. However off late I have encountered a few XML files containing characters such as J (Capital E with circumflex accent). Whenever my Java method tries to parse/process this .xml file it gives me the following error.
ORG.oclc.da.utilities.ifs.ReportException: An Error Occured While Parsing the Report: Missing entity 'Ecirc'.     at ORG.oclc.da.archive.userinterface.ReportHelper.retrieveReport(Unknown Source)     at /ViewReport.jsp._jspService(/ViewReport.jsp.java:87) (JSP page line 65)     at com.orionserver[Oracle9iAS (1.0.2.2) Containers for J2EE].http.OrionHttpJspPage.service(OrionHttpJspPage.java:54)     at com.evermind[Oracle9iAS (1.0.2.2) Containers for J2EE].server.http.HttpApplication.serviceJSP(HttpApplication.java:5458)     at com.evermind[Oracle9iAS (1.0.2.2) Containers for J2EE].server.http.JSPServlet.service(JSPServlet.java:31)     at com.evermind[Oracle9iAS (1.0.2.2) Containers for J2EE].server.http.ServletRequestDispatcher.invoke(ServletRequestDispatcher.java:501)     at com.evermind[Oracle9iAS (1.0.2.2) Containers for J2EE].server.http.ServletRequestDispatcher.forwardInternal(ServletRequestDispatcher.java:170)     at com.evermind[Oracle9iAS (1.0.2.2) Containers for J2EE].server.http.HttpRequestHandler.processRequest(HttpRequestHandler.java:576)     at com.evermind[Oracle9iAS (1.0.2.2) Containers for J2EE].server.http.HttpRequestHandler.run(HttpRequestHandler.java:189)     at com.evermind[Oracle9iAS (1.0.2.2) Containers for J2EE].util.ThreadPoolThread.run(ThreadPoolThread.java:62)
It seems like the Oracle Parser/XSLT Processor (oracle.xml.parser.v2.DOMParser) I am using is not able to handle special characters such J. I was wondering if there is anyway around this problem.
Attached is the Java Method that handles both the parsing and processing of the XML file.
/** The method parses the Report Data and applies the Style Sheet to this data
* @param The InputStream (Report Contents - .xml file), Name of the StyleSheet that needs to be applied
* @return A sub-section of the report data (DOM DocumentFragment is returned)
private XMLDocumentFragment parseReport(InputStream reportStream,String strStyleSheet) throws Exception
DOMParser parser;
XMLDocument xml, xsldoc, out;
URL urlStyleSheet;
//Get the URL for the Style Sheet
urlStyleSheet = new URL(strStyleSheet);
//Create an instance of the Dom Parser
parser = new DOMParser();
parser.setPreserveWhitespace(true);
//Parse the XSL document and create a DOM Object
parser.parse(urlStyleSheet);
xsldoc = parser.getDocument();
//Parse the report document (a .xml) and create a DOM Object
parser.parse(reportStream);
xml = parser.getDocument();
// instantiate a stylesheet
XSLStylesheet xsl = new XSLStylesheet(xsldoc, urlStyleSheet);
XSLProcessor processor = new XSLProcessor();
// display any warnings that may occur
processor.showWarnings(true);
// processor.setErrorStream(System.err);
// Process XSL
XMLDocumentFragment result = processor.processXSL(xsl, xml);
return result;
If you have any suggestions please let me know. If you need more information I will be to furnish it.
thanks
Mathangi

Similar Messages

  • How to validate string that contains special characters.

    Hi,
    I am new to Data Services.I took excel sheet as a data source .Now i have do some validation functions like vendorgroup should not contain special characters ,name cleansing operations etc.. So,please anyone help me redarding this.
    Thanks in advance

    Anonymous,
    No this is not a bug. Working with multi-select lists or any item which stores colon delimited values can be tricky to work with in APEX. There are a few work arounds for this. One is to replace the colons with something else like you described and then replace the colons on the landing page. Another solution is to send the primary key for that record and have an automatic row fetch or a computation pull the colon delimited value.
    Cheers,
    Tyson Jouglet

  • Problem retrieving data that contains special characters in Collections

    Hi fellow Apexers,
    If I load strings that contain characters such as ":" or "," into a collection and then read the data back off the collection into page items via URL e.g.
    f?p=&APP_ID.:4:&SESSION.::&DEBUG.::P4_INJURED_FIRST_NAME,P4_INJURED_OTHER_NAME,P4_INJURED_SURNAME,P4_INJURED_SHOP_NO,P4_INJURED_BUILDING,: #SEQ_ID#,#C001#,#C002#,#C003#,#C004#,#C005#
    the data after the ":" or "," gets screwed up and ends up in the wrong fields. How do I escape these characters?
    I have tried using a replace statement when inserting into the collection e.g.
    BEGIN
    IF htmldb_collection.collection_exists('INJURED_PERSON') = FALSE THEN
    htmldb_collection.create_collection( p_collection_name => 'INJURED_PERSON');
    htmldb_collection.add_member(p_collection_name => 'INJURED_PERSON',
    p_C001 => :P4_INJURED_FIRST_NAME,
    p_C002 => :P4_INJURED_OTHER_NAME,
    p_C003 => :P4_INJURED_SURNAME,
    p_C004 => :P4_INJURED_SHOP_NO,
    p_C005 => replace(:P4_INJURED_BUILDING,':','')
    etc.........
    but I just don't want to have nested replace statements. Also I'll loose "," and ":" when I retrieve the back off the collection
    Does anyone know of a better workaround?
    Regards
    Paul P

    Hi Paul,
    Apex uses colons in the url to separate the parameters. You could replace these with another character (for example, ~) and then have a Before Header page process on the second page that converts that back again.
    However, as a collection is available until you clear it or log out, you could just pass the SEQ_ID and get the second page to retrieve the values directly into the page items
    Andy

  • Help on querying data that contains special characters (#, &, @).

    Hi there,
    I am using forms 6i to query on a table whose primary key starts with a # (pound or hash sign). If I do the query trough sqlplus it comes back fine, but if I do trough forms 6i it gives me the following error messages: FRM-40831: Value too long for the field and then ORA-00920: Invalid relation operator.
    Well I have followed the instructions on the note 1007499.6 published by metalink. They explaind that when in insert the # into a field forms is thinking that you are actually going to insert a relation operator like WHERE, BETWEEN and etc. So query that is run is somewhere like this:
    select rowid
    from emp
    where (dname research)
    In that case dname would be the field that start with the #.
    They recommend as a workaround to write a pre-query trigger on the detail block to substr the # and add it the # againg with substr of the rest of characters, on note 1064159.6 by metalink.
    I have written one with no success, the trigger seems to never be fired the errors fire first.
    Can anyone help with that one ???
    Thanks
    gleisson silva

    Try below in the pre-query, would also need a none DB block and variable :blocka.keyv:
    IF substr(:dname,1,1) IN ('#','&','@')
    THEN
    :blocka.keyv:= substr(:dname,1,1)||:dname;
    :dname:=NULL;
    Set_block_property('tablename',DEFAULT_WHERE,'dname=:blocka.keyv');
    ELSE
    Set_block_property('tablename',DEFAULT_WHERE,'');
    END IF;

  • Parsing xml that contains special character

    this is my xml file.
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <grid_assessment>
         <chart>
              <range>5</range>
              <sharing_charging>
                   <score>3</score>
                   <label><![CDATA[SHARING &#38; CHARGING POLICIES]]></label>
    this is my java code.
              SAXParser parser = new SAXParser();
              parser.setContentHandler(this);
              parser.setErrorHandler(this);
              try
                   fis = new FileInputStream(xmlFile);
              catch(IOException ioe)
                   System.out.println( ioe.getMessage() );
              InputSource inputSource = new InputSource((InputStream)fis);
              try
                   parser.parse(inputSource);
              catch(SAXException saxe)
                   System.out.println( saxe.getMessage() );
              catch(IOException ioe)
                   System.out.println(ioe.getMessage());
    when i parse the xml file, i got an error message like this.
    org.xml.sax.SAXParseException: The entity name must immediately follow the '&' in the entity reference.Stopping after fatal error: The entity name must immediately follow the '&' in the entity reference.
    is there anybody who knows how to parse the xml file that contais special character?
    i tried "&#38;" instead of "&". but it didn't work...
    please help...
    thanks in advance...

    this is my java code.
              SAXParser parser = new SAXParser();If that is javax.xml.parsers.SAXParser then your code won't compile; i suspect you might be using some non-standard parser?
              parser.setContentHandler(this);
              parser.setErrorHandler(this);These methods are deprecated.
    when i parse the xml file, i got an error message like this.
    org.xml.sax.SAXParseException: The entity name must
    immediately follow the '&' in the entity
    reference.Stopping after fatal error: The entity name
    must immediately follow the '&' in the entity
    reference.I don't; maybe it's your parser. What parser are you using, and what version?
    is there anybody who knows how to parse the xml file
    that contais special character?There is nothing more you need to do; the codeimport javax.xml.parsers.*;
    import org.xml.sax.helpers.DefaultHandler;
    import org.xml.sax.*;
    import java.io.*;
    public class CeasarKim1Parser extends DefaultHandler {
      public static void main (String[] args) {
        try {
          SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
          FileInputStream fis = null;
          fis = new FileInputStream(args[0]);
          InputSource inputSource = new InputSource((InputStream)fis);
          parser.parse(inputSource, new CeasarKim1Parser () );
        } catch(SAXException saxe) {
          saxe.printStackTrace();
        } catch(IOException ioe) {
          ioe.printStackTrace();
        } catch (ParserConfigurationException pce) {
          pce.printStackTrace();
      public void characters(char[] ch, int start, int length) {
        System.out.println(new String(ch, start, length));
    // similar implementations of the other content handler methods
    }which is basically a compilable and non-deprecated version of yours, with the file<?xml version="1.0" encoding="ISO-8859-1" ?>
    <grid_assessment>
      <chart>
        <range>5</range>
        <sharing_charging>
          <score>3</score>
          <label><![CDATA[SHARING & CHARGING POLICIES]]></label>
        </sharing_charging>
      </chart>
    </grid_assessment>does not generate any exceptions and parses the CDATA section correctly. That is using 1.4.1 and the default org.apache.crimson.jaxp.SAXParserImpl parser.
    Pete

  • Unable to call report from jsp - password contains special characters

    Hi
    I used the following url to call my oracle report from my JSP webpage but got the error mentioned below. It seems that this error occurs when i use the login id with password that contains special characters only. How can I overcome this problem?
    Any help appreciated. Thx.
    Regards,
    Siti
    URL used: -
    "http://pc-325:8889/reports/rwservlet?server=pc-325&report=prodeff80120i&P_JDBCPDS="+vlogin1+"&destype=cache&desformat=pdf&paramform=no&p_type="+p_type;
    Error encountered: -
    REP-163: Invalid value for keyword DESTYPE.
    Valid options are FILE, PRINTER, MAIL, INTEROFFICE, or CACHE.

    Hi Stefan,
    Many of the customers are located in hungary and they have created the userid using their keyboard. Hence for now I already have a userid with that hungarian characters, in the SAP system.
    Only I would request for the help on how to interface these characters in SAP Business connector to call RFC.
    Thanks,

  • Need help in setting file name with special characters in attachment

    Hi
    We have a requirement where we need to set the file name that contains special characters (like Russian) and send mauil using Java mail.
    If we set the file name as such, the attachment in the email contains garbled filename
    Can you pl let me know how to resolve this?
    We should use the file name as attachment name and this will have say special characters. The receiver who gets the mail should get with the correct attachment name
    One important point.. the attachments are opened from MS outlook.
    Thanks and regards
    Ram
    Edited by: 884910 on 13 Sep, 2011 5:00 AM

    Read the FAQ carefully. You don't need to call encodeText unless you're using a really
    old version of JavaMail.
    And, it depends on whether the mail reader you're using is handling encoded parameters
    according to the (new) MIME standard, or according to the (old) non-standard hack.
    Sadly, without knowing what mail reader the recipient is using, it's impossible to use
    encoded filenames that will work everywhere.

  • How do I find file names containing special character

    hi all
    I need your help with a code that deals with finding file names containing special characters like * / \ : ? " |
    the reason is I am doing a project that transfer files and folders from Mac to Windows. But names containing above characters cant be moved to windows.
    the part of the code I write is like this
    set illegal_syntax to paragraphs of (do shell script "find " & quoted form of POSIX path of oneFolder & " -name '*'")
    if (illegal_syntax is not equal to {""}) then
    repeat with each_record in illegal_syntax
    do shell script "/usr/bin/ditto -c -k -rsrc --keepParent " & quoted form of each_record & space & quoted form of (each_record & ".zip")
    end repeat
    end if
    First of all that '' will gives me an error.
    Second, what i tried to do is once the file/folder is found, zip it at current location. But the problem is how can I zip it without the special character? or should I just give it a name, like "originalfile.zip";?
    Last, or maybe you have better idea to deal with this situation? BTW, I can not rename or delete the file/folder because customer won't allow me to do it.

    The backslash is used to escape characters in AppleScript, so if you want to use it in a string you need to escape the escape character, for example "\\*".
    You could set the name of the archive by replacing the illegal character with another one, such as an underscore:
    <pre title="this text can be pasted into the Script Editor" style="font-family: Monaco, 'Courier New', Courier, monospace; font-size: 10px; padding: 5px; border: 1px solid #000000; width: 720px; color: #000000; background-color: #FFDDFF; overflow: auto">set each_record to "some*file*name" -- example
    set {TempTID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, "*"}
    set the ItemList to text items of each_record
    set AppleScript's text item delimiters to "_"
    set each_record to the ItemList as text
    set AppleScript's text item delimiters to TempTID
    log each_record --> some_file_name
    </pre>

  • Why does text on certain portions of websites, usually when adjacent text contains special characters, become jumbled into seemingly random sets of characters that are not in any way jumbled when viewing the source of the webpage? How can I fix this?

    When viewing most text on most websites, it displays properly. However, there are two instances where text will either tend to, or consistently, become jumbled into a mess of seemingly random characters. Oddly enough, these seemingly random characters are not, in fact, random. The same weird character will be used to replace the same regular English text character consistently across the entire area that has been jumbled.
    The two instances where this tends to occur most often, or consistently in some cases, are, first, when a paragraph or particular section of formatted text contains special characters, such as Chinese or Japanese characters, or accented letters. When this happens, usually the paragraph that contains the special characters is completely jumbled, while the rest of the text on the page will have intermittent jumbling on a word or two. Most often, the word "the" is jumbled in this case.
    The second instance where this happens is when a website uses specially formatted text in some form or another. I, not being an expert at web development, am not sure what kind of formatting causes it, but I can provide consistent examples in lieu of my experience:
    - Example 1:
    [http://img408.imageshack.us/img408/9564/firefoxcharencodingissu.jpg]
    Example 1 shows a portion of a screen-shot of the website "Joystiq.com". Every single article title on the front page of this blog is consistently jumbled, while the text of the article itself remains untouched. Please note that when this jumbled text is highlighted, it is visible un-jumbled in the right-click menu as well as in the source code of the page. Other consistent instances can be found within many search fields on various websites. For instance, the search bar located at the top right of "Kotaku.com" consistently displays jumbled characters both on its default text of "Search" and on any text that is typed into the search box itself.
    - Example 2:
    [http://img822.imageshack.us/img822/9564/firefoxcharencodingissu.jpg]
    Example 2 shows both the jumbling of the paragraph containing the character "☆" as well as the subsequent peppering of the rest of the article's text with small jumbled words. Below this is the DOM Source of the selected text which shows how the text itself is being rendered properly within the site's source. Additionally, for convenience, I have edited on to the bottom of the image a small snippet of what the search bar on the same page looks like. Notice how the grayed-out text that normally would read "Search" is instead jumbled.
    This issue has been plaguing my browser for the past year or so, and I had hoped that it would go away with subsequent Firefox updates. It has not gone away.
    Thank you for reading! Please help!

    This issue can be caused by an old bitmap version of the Helvetica or Geneva font or (bitmap) fonts that Firefox can't display in that size.
    Firefox can't display some old bitmap fonts in a larger size and displays gibberish instead.
    You can test that by zooming out (View > Zoom > Zoom Out, Ctrl -) to make the text smaller.
    Uninstall (remove) all variants of that not working font to make Firefox use another font or see if you can find a True type version that doesn't show the problem.
    There have also been fonts with a Chinese name reported that identify themselves as Helvetica, so check that as well.
    Use this test to see if the Helvetica font is causing it (Copy & Paste the code in the location bar and press Enter):
    <pre><nowiki>data:text/html,
    Helvetica<br><font face="Helvetica" size="25">abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ</font><br>
    Helvetica Neue<br><font face="Helvetica Neue" size="25">abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ</font>
    </nowiki></pre>
    You should reset the network.http prefs that show user set on the about:config page.<br />
    Not all websites support http pipelining and if they do not then you can have issues with images or other problems.
    See also http://kb.mozillazine.org/Images_or_animations_do_not_load#First_steps

  • XML Parsing Base64 Content - Special Characters Umlaut etc

    Hello!
    I am currently parsing an XML file which has it's content encoded in Base64. My application parses the XML file (from a provider) and pass on the content to a GUI where the user can change the value of the nodes and store the same back in a new XML (which is then to be sent back to the provider). The Application uses only "UTF-8" . I am currently using the Base64 class provided at http://iharder.net/base64
    The problem I have is that the special characters like �, � etc, are being depicted as (?) i.e. unreadable. Only when I control the decoding process i.e. String resoolt = new String (result.getBytes(), "UTF-8") do I get the correct content, and the same can be posted on the GUI.
    The application produces an XML file (The application in question is a translation app), which when re-opened in the application for retranslation has the same problem... it is shown with teh special characters with (?).
    After a lot of debugging I found that the XML produced by the application does not need the forcing of content into UTF-8 using the code above, it is read automatically correctly (with all the content correct), but due to the coding I have done... String resoolt = new String (result.getBytes(), "UTF-8") the nodes are decoded with the earlier problem umlauts et al missing.
    Now I could use a test to differntiate between the two cases XML from the provider or the XML which is produced by my own app, and doesn"t work on the Provider and treat them differently, but I don't want this solution since the XML which my app produces is not accepted as input at the providers.
    So my query.. what is the best way to go about it now... I am quite stymied >(
    Thanks!
    Tim

    Amazingly enough some other user has the identical problem to yours and has expressed it in exactly the same words. Has the same name as you too.
    http://forum.java.sun.com/thread.jspa?threadID=606929&tstart=0
    (Cross-posting is frowned on here.)

  • Word 2007: Error when saving file that contains equations

    Using Microsoft Word 2007, version 12.0.6425.1000 with Service Pack 2
    I am a High School Math teacher, and when working on a document that contains a number of math equations. When I try to save the document, Word refuses to save. It displays the following error: "A file error has occurred. <filename>. OK",
    and I am unable to save the file. 
    Reading the forum, I have found this to be a very common issue going back to 2009, but I have not seen a definitive solution or patch. 
    I have found a workaround:
    Creat a new document
    Copy and paste from the corrupted doc each page, saving the new document after each page
    Until I hit the error, then I know the page that has the issue, and can usually narrow it down to a specific math equation. Nothing fancy just some basic math. 
    Manually rewrite the equation in the new document and it will successfully save.
    But the issue has been occurring more and more frequently now and it is becoming very frustrating. Any help would be greatly appreciated! 

    There has been a long time known issue with the equation editor.  it causes document corruption by "losing" XML tags.
    Apparently fixes for some causes of these problems have been rolled out in Windows updates. But the problem still exists.  You've got it easy. A lot of people running into this problem are students working on documents, like a thesis, that is on a short
    deadline ...
    One identified cause of these problems is editing equations
    after you created them.  DON'T DO THAT! 
    If you have to change an existing equation, create a new correct one and delete the old one.
    Simpler fix. Don't use Word (or Office) for creating equations.  Find some other product, like a pencil.
    Over in the "Answers" forum there are a couple volunteers, Tony Jollans and Jeeped
    , who have been manually fixing this type of error if you post a new question or add a reply to one of the ongoing discussions on this issue.
    Sorry.
    If you want to invest the time to figure out what your error looks like you may be able to come up with a fix other than copy /paste method.
    Can't open Word File because of end tag/start tag mismatch error... XML Tag 
    - XML Error – Fix It tool - “The name in the end tag of the element must match the element type in the start tag”
    This error is caused when Word either “forgets” to write an XML tag, or writes them in the wrong order.
    Tony Jolans was the first person that I heard of with home made tool to fix the problem. Now MS has released a Fix It for one specific variation of the problem.
    If the tools don’t fix your problem, the file will have to be fixed manually, repairing the tag order.
    The Fix It article notes that the document is still in a fragile state. You have to do some addition fixing to avoid repeats of the problem.
    https://blogs.technet.com/b/wordonenotesupport/archive/2011/03/24/error-when-opening-a-word-2007-or-2010-document.aspx
    http://support.microsoft.com/kb/2528942- FIX IT
    This fix it will work for one specific tag error where there are equations and graphics in the same paragraph AND Office 2010 SP1 has not been applied.
    Preventative suggestions
    <snip Jeeped>
    I don't think that anyone can completely stop editing equations, but pre-planning them should avoid unnecessary edits.
    While I have no concrete, reproducible evidence that editing equations is a cause, I have made these empirical observations:
    I cannot state precisely what many of the DOCX XML tags do, but basic XML syntax rules would suggest that code like:                   <m:oMath><m:e><m:ctrlPr/></m:e></m:oMath>
    ... does nothing at all since it closes everything it opens and offers no content. It looks to me that this is a result of deleting one or more characters from an equation.
    While Word 2010 reports these as a problem at:                  Line: 2  Column: 0 ... Word 2007 will still report the actual position, e.g.:                
     Line: 2  Column: 2726981 I keep a copy of Word 2007 side-by-side Word 2010 for no other purpose. It's the only Office 2007 program on my computer. This seems like it is actually a step backwards from a resolution since Word 2010 no longer seems
    to be able to parse its own error.
    Whether the syntax is truly useless and non-effective for any intent or purpose is actually beside the point. The syntax passes conformity tests and the DOCX
    should launch. Why an 85 page document is 'broken' due to a few empty XML formatting tags while retaining legal syntax structure is beyond me.
    I try to pass along the area that the problem was in and several times people have remarked that the area I report was the place they were last making modifications/deletions to equations (not additions) when the no-load corruption
    occurred.
    When no indication from the OP is offered on what was worked on last, the Line: 2 Column: 0 corruption often comes within a formula at the very end of the unfinished Word/Document.xml file and it may be inferred
    that this was the last place being worked upon.
    The corrupt DOCX files I've worked on are very commonly at the last stages of development with a large complex document. While there couldn't be a worse time for a corruption to appear, it would seem that small edits to existing content are causing
    it and not large scale new content generation.
    Not one of these points is a definitive 'smoking gun', but put together they seem to indicate
    formula editing and not formula writing as a cause for the Line: 2 Column: 0 corruption. If it looks like a duck and quacks like a duck, it is most often a duck.
    </snip>
    Copy “True Autosave Macros for Office” to this place in reply
    Let me fix it myself
    If you are familiar with editing XML, you can try to fix the problem yourself by correcting the sequence of the mismatched oMath tags in the document. See the following example:
    Incorrect tags:
    <mc:AlternateContent>
    <mc:Choice Requires=”wps”>
    <m:oMath>
    </mc:AlternateContent>
    </m:oMath>
    Correct tags:
    <m:oMath>
    <mc:AlternateContent>
    <mc:Choice Requires=”wps”>
    </mc:AlternateContent>
    </m:oMath>
    Note: You will have to use an application such as Notepad to edit the XML.
    Manual Technique
    <snip>  A DOCX document is actually a .ZIP file that contains many internal components. There is an internal folder called word which will always contain a document.xml file. This file is the basis of the document's layout
    and content.
    Assuming that the DOCX archive structure has not been corrupted, it can be opened in an archive utility. I use
    WinRAR for this. Once opened in
    WinRAR, you can drill down into the
    word folder and see the document.xml file. I use
    Notepad++ as a text editing tool and have the .xml file extension associated with this program so i can simply double-click
    document.xml in WinRAR to open an editing session.
    Notepad++ has cursor positioning in the right-hand side of the status bar and finding the position of the error (supplied by a failed open in Word) is pretty straightforward. I look for empty formatting
    tags first and only remove content if removing empty tags does not allow the document to be opened. I try to note existing content in the area that I make modifications in order that I can supply position reference information to the owner of the DOCX.
    When editing, I assume the philosophy that kess is more. My target is to get the document to open in Word with as little modification as possible and let the original owner of the DOCX make any necessary adjustments.
    I should mention that after making a change to word/document.xml you need to save it in
    Notepad++ then go to
    WinRAR and acknowledge that you want to update the file in the DOCX archive. Once the archive is updated, you can attempt to open the DOCX in Word to see if your efforts are successful.
    WinRAR                    
    Notepad++                 
    </snip>
    Further Fixes
    The Fix it solution in this article should let you recover your Word document. However, the symptoms will reappear when you make any further edits to the document unless the core problem in the structure of the document is resolved.
    To try to correct the core problem, follow one of these workarounds:
    Install Office 2010 Service Pack 1
    Office 2010 Service Pack 1 resolves this issue for new files. It will also prevent the problem from recurring with any files that were recovered with the Fix it solution in this article.
    To download Office 2010 Service Pack 1, follow the steps provided in this Microsoft knowledge base article:
    2460049 - Description of Office 2010 SP1
    Grouping Objects
    The steps provided work best under Word 2010:
    After you open the recovered document, turn on the Selection pane. This can be found in the
    Home tab of the ribbon. The editing group of the
    Home tab has a dropdown button named Select.
    Click the Select button, and then click
    Selection Pane...
    Press the Ctrl button on your keyboard and then click each text box in the selection pane.
    Click the Group button under the Format tab. This will group all the objects together.
    As soon as you have all objects grouped on each page, save the document under a new name.
    Save the document in the .RTF file format
    The steps provided work for both Word 2007 and Word 2010:
    After you open the recovered document, click File and select
    Save (for Word 2007 click the Office button and select
    Save As)
    In the Save As dialog box, click "Save as type:" dropdown and select
    Rich Text format (*.rtf).
    Click Save.
    Click to view this
    blog for more information about this issue.
    Bonus tip: Win7 Win8 Math Equation Input Panel / Math input Panel
    http://www.lytebyte.com/2009/07/24/guide-to-math-input-panel-in-windows-7/
    http://www.7tutorials.com/windows-7s-tablet-input-panel-text-entry-and-handwriting-recognition
    http://www.7tutorials.com/training-tablet-input-panel-work-even-better
    http://www.7tutorials.com/do-math-easy-way-math-input-panel
    Not an answer to the problem, just a bonus that may make it easier to input formula’s in Win7.
    Here’s another one of those didn’t-know-it-existed-until-I-clicked-it-by-accident tools in Win7. It’s called the
    Math Input Panel.
    To access it, simply click Start, and in the Search Box that appears above, type in
    Math Input Panel.
    The Window should look like this:
    Let the fine folks at Microsoft explain what it’s used for:
    “Math Input Panel uses the math recognizer that’s built into Win7 to recognize handwritten math expressions. You can then insert the recognized math into a word-processing or computational program. “
    Tony Jolan’s Automatic Fix
    Download  http://www.wordarticles.com/temp/Rebuilder.dotm Microsoft Office Word Macro-Enabled Template (.dotm) and open it.
    Click Options button on the Security warning and select Enable this content.       
    Click the Broken Documents tab at the far right of the ribbon.      
    Click the Rebuild button in the left-hand side      
    Locate and open your corrupt document in the file open dialog.
    That's it. The process will repair your document if possible and create a new document with (Rebuilt)
    appended to the filename. Be patient as it may take a few minutes. If a repair is not possible, you can then post to a public file area and someone here can attempt a manual repair.
    Manual Fix
    XML Maker V1.1 is free. It will allow you to open the document.xml file and edit it. It also marks errors and warnings. 
    I just didn’t have much luck working with it.
    A poster used XML Maker V2.1 (US$125, 30day free trial, enough for average person to fix a file)
    Notepad ++ is a good, free editor for this type of task
    Make a copy of the file
    Rename the copy from DOCX to ZIP
    Open … .ZIP/word/document.xml in notepad
    Copy the contents of the file to clipboard
    Open Word
    Paste a copy of the copied XML into Word
    (optional) the XML is one long string too hard to read, you can replace some tags, with that tag plus a para mark to break up the text to make it more people readable.
    Open an XML validator, ie this site on the internet:
    http://www.w3schools.com/xml/xml_validator.asp
    Paste another copy of the XML into the “Syntax Check Your XML” input window
    Click on “validate” button
    Copy the missing tag, ie </mc:Fallback>  (yours will be different)
    Return to word Find: mc:Fallback>  (without the </ so you find both open and closing tags). 
    Repeat find until you hit 2 open tags in a row.  Then you just have to figure out where to put the closing tag between them. 
    Look for other tags before and after a proper closing tag so you can match the problem area to a good area.
    Discussion by many affected people, a couple in discussions are also fixing problem if Tony’s fix doesn’t work
    http://social.answers.microsoft.com/Forums/en-US/wordcreate/thread/581159d0-9ebc-4522-b30c-53e33e8268e1
    Document Recovery
    http://www.wordarticles.com/Shorts/Corruption/Formats.php
    This page has the most readable description of Word file structures, DOC and DOCX, I have seen so far
    The logical structure of a Word 97‑2003 format document is one of a series of elements arranged in a hierarchy, much like a mini file system. As an example, here is the structure of a simple Word 97‑2003 (.doc) format document:
    MyDocument.doc
    1Table
    *CompObj
    Word Document
    *SummaryInformation
    *DocumentSummaryInformation
    The physical structure of the complete file bears little relation to the logical structure; it is, again, of a proprietary design, a compound, or structured storage, file. Briefly, and loosely, the separate logical elements of the file are broken up into
    blocks; these blocks are treated as individual units, which units are then organised without regard for their logical arrangement, and catalogued, catalogue and organisation detail being held alongside the blocks themselves, to enable recombination into logical
    components when necessary.
    Just to give you a flavour, here are some views of three small parts of such a document, viewed in a hex editor:
    Views of a Word 97-2003 format Document
    The logical structure of a Word 2007 format document is one of a series of elements arranged in a hierarchy, much like a mini file system. As an example, here is the structure of a simple Word 2007 (.docx) format:
    MyDocument.docx
    _rels
    rels
    docProps
    app.xml
    core.xml
    word
    _rels
    document.xml.rels
    theme
    theme1.xml
    document.xml
    settings.xml
    fontTable.xml
    webSettings.xml
    styles.xml
    [Content_Types].xml
    As briefly as before, the [Content_Types] file and the _rels folders, along with the subordinate files therein, contain information about the logical structure, and the two files in the docProps folder contain much the same as the two Information files
    in the old format. The document.xml element within the word folder holds the bulk of the document content and the other files within that same folder hold formatting details.
    So, you might say, the internal structure of a document has changed a little, so what? There are, however, other changes that make a bigger difference. The first is that, although both logical formats are conceptually similar, they are wrapped up in
    completely different ways to make a single file. Instead of the proprietary physical structure used for Word 97‑2003 format documents, a fairly standard, and open, Zip Archive format is used for Word 2007 format documents. The second change is that instead
    of using obscure binary codes, everything in Word 2007 format documents, well almost everything, is held in XML format.
    All data held as XML? In a standard Zip Package? It should be much easier to work with, then? Judge for yourself; here are some views of parts of a Word 2007 format document taken from a hex editor:
    Views of a Word 2007 format Document
    FreeFileViewer – reads 100+ text, Office, audio, video format file types – Can open some XML tag error files
    http://www.freefileviewer.com/formats.html
    Can't open Word file due to undeclared prefix, Location: Part: /word/document.xml, Line:91, Column: 49921
    The most likely cause of your particular problem is that you are missing a
    schema prefix reference within the opening <document ...> XML tag (usually the second one). Different
    schema references are required for various types of specialty content. Here is a sample opening <document ...> tag with a large number of various schema prefix codes. If I remove one or two of these, i can reproduce
    your error message.
    <w:document xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
    xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml">
    It's impossible to blindly say exactly what you are missing but if you post an unadulterated copy of your document to a public file area (as noted above) and post the location back here, someone may be able to help. If you are going to go this route, send
    a copy that has not already been subjected to repair attempts.
    FWIW, while it may be hard to determine exactly which schema you may be missing, I do not believe that having extras causes any problems.
    BTW, you don't actually have to rename the DOCX file extension if you have an archiving program. I use WinRAR and simply
    <right-click>, Open With... to expose the archive.

  • Export Database With Password Containing Special Characters

    Hi,
    I'm doing a database export using the command:
    exp userid=user/pass@instance parfile=exp_parms.dat
    The problem is that the password contains special characters so the command fails with a syntax error. The command works perfectly fine when I run it without the password and just type it in when prompted, but I need to include the password in the command because I am running it in a batch file. I've tried using the escape character (\), but this results in a username-password mismatch.
    Is there some way of declaring a password containing special characters in the 'exp' command?
    ...Thanks in advanced!

    Hi,
    I'm using linux and this is how I did it.
    I created a user named ddg with a password ddg)ddg
    I did my expdp command like:
    expdp ddg/"ddg\)ddg" directory=dpump_dir dumpfile=ddg.dmp
    Put the password in double quotes and escape the funny character with a \
    Hope this helps.
    Dean

  • PL/SQL procedure to process XML file

    I am just starting to work on xml, and we are using PL/SQL stored procedures to process xml file. I did find the sample code in the package (family.sql). This sample print out all element names and attributes. My questions are :
    (1) I tried to modify the code to print out the element values (or Node values). Here is the code:
    -- get all elements and values
    nl := xmlparser.getElementsByTagName(doc, '*');
    len := xmlparser.getLength(nl);
    -- loop through elements
    for i in 0..len-1 loop
    n := xmlparser.item(nl, i);
    dbms_output.put_line('NodeName: ' &#0124; &#0124; xmlparser.getNodeName(n) &#0124; &#0124; ' ');
    dbms_output.put_line('NodeValue: ' &#0124; &#0124; xmlparser.getNodeValue(n) &#0124; &#0124; ' ');
    end loop;
    However, it did not print out the values, although it did print out the name. what's wrong with it? how can I get the value?
    (2) I have the following xml file:
    <?xml version="1.0"?>
    <profile>
    <id>1</id>
    <name>Company</name>
    <des>master profile</desc>
    <subprofile>
    <subid>1</subid>
    <subname>group1</subname>
    <subdesc>group profile</subdesc>
    </subprofile>
    </profile>
    now, I'd like to print out all the children of the <subprofile> node, that is <subid>, <subname> and <subdesc>. Here is my code:
    -- get subprofile nodelist
    qnlist := xmlparser.getElementsByTagName(doc, 'subprofile');
    -- get length of the nodelist
    len := xmlparser.getLength(qnlist);
    -- loop through elements
    for i in 0..len-1 loop
    qnode := xmlparser.item(qnlist, i);
    qnnode := xmlparser.getFirstChild(qnode);
    qnchild := xmlparser.getFirstChild(qnnode);
    dbms_output.put_line(xmlparser.getNodeName(qnnode));
    dbms_output.put_line(xmlparser.getNodeValue(qnchild));
    LOOP
    if (xmlparser.isNULL(xmlparser.getNextSibling(qnnode))) then exit;
    END IF;
    qnnode := xmlparser.getNextSibling(qnnode);
    qnchild := xmlparser.getFirstChild(qnnode);
    dbms_output.put_line(xmlparser.getNodeName(qnnode));
    dbms_output.put_line(xmlparser.getNodeValue(qnchild));
    END LOOP;
    end loop;
    when I execute the procedure, I get the following output:
    #text
    I am not sure what's wrong with it. Basically, I didn't know the procedure to traverse the tree since here it's a little different from OO programming. Could someone give a sample code which demonstrate the procedure which can get a specific element's name and values? (for my exapmle, the name and value of the <subname>, or <subid>, <subdesc>).
    Also, is there a way to insert a part of xml file into a DB table? in my case, insert the subprofile into a table. I know the xmlgen package has a procedure to insert a xml file into a table, but not a part of xml. And can I insert a xml file into several tables instead of one table, using xmlgen package?
    looking forward to hearing from you. any suggestion and sample code would be helpful. thank you very much.
    null

    I sloved my first question: to get the Nodevalue, I need to use getFirstChild(n);
    But, I still didn't figure out the second
    problem. Actually, It works when I modified my xml file as following:
    <?xml version="1.0"?>
    <profile>
    <id>1</id>
    <name>Company</name>
    <des>master profile</desc>
    <subprofile><subid>1</subid><subname>group1</subname><subdesc>groupprofile</subdesc></subprofile>
    </profile>
    All the <subprofile>....</subprofile> must be in one line without any return. This is unbelievable! It suppose that xml does not matter new lines. I tested my code, it seems space is fine, but new line. Something must be wrong in my code.
    please give any suggestion. Thanks,
    Yudong
    null

  • Saving a file in a with a file name containing Japanese Characters

    Hi,
    I hope some genius out there comes up with the solution to this problem
    Here it is :
    I am trying to save some files using a java program from the command console , and the file name contains japanese characters. The file names are available to me as string values from an InputStream. When I try to write to a File Object containing the japanese characters , I get something like ?????.txt . I found out that I am able to save the files using the unicode value of the java characters.
    So I realize that the trick is to convert the streaming japanese characters , character by character into their respective unicode value and then Create A File Object with that name. The problem is -> I cant find any standard method to convert these characters into their unicode values. Does anyone have a better solution ? Remember , its not writing japanese characters to a file , but creating a file with japanese characters in the file name !!!!
    Regards
    Chandu

    retrive a byte array out of the input Stream and store the values in String using the condtructor
    String(byte [] bytes, String enc)
    where encoding would be Shift_Jis for japanese I guess.
    Now to understand this concept basically all the Strings are unicode however when you are passing a byte array String has no means to know what is the encoding of the byte array, which is being used to instantiate the String value so if no encoding is specified it takes the System value which is mostly iso-8859-1. This leads to displaying ?
    However in case you know the encoding of the array specifying that in the constructor would be a real help.

  • Column name in a query contains special characters

    Hi folks,
    The column name in a query contains special characters. For example ~ or ^. The creator of the table put these column names under double quotation while creating the table. When I get the column names form the result set meta data object it returns without quotation. Is there any way to tell the jdbc driver so that it return those column names as it was created, I mean in double quotation.
    The help is urgent. I will appreciate any suggestions. Thanks �.
    [using oracle driver for 10g]
    Thanks
    Angelina

    Just because the column names were in quotations when the database was created doesn't mean that the quotes are actually part of the names. What's inside the quotes is what makes the column name in the database. If I created a column as "abcd" and put it in quotes just like that ("abcd"), it would go in the database as abcd since SQL would strip off the quotes.
    And there's your answer. Just put all column names in quotes whenever you need to talk to SQL. It will strip off the quotes and understand.
    I think SQL will also accept square brackets ([ and ]).

Maybe you are looking for

  • How tostored data  after clicking checkbox save in data base table

    REPORT  zreport                                 . TABLES:mseg,mard,mkpf. TYPE-POOLS:slis. DATA:BEGIN OF itab OCCURS 0,      mblnr LIKE mseg-mblnr,      matnr LIKE mseg-matnr,      werks LIKE mard-werks,      lgort LIKE mard-lgort,      lgpbe LIKE mar

  • How do I transfer music purchases from one apple id to another

    I purchased music from Itunes years ago with an old apple id and old email address.  I am trying to play that music in my Itunes on my Dell Win7 laptop that uses a new apple id, and though the computer is authorized, it is authorized under the new ap

  • Print with ultiboard 10.0 doesn't work with hp laserjet 4

    I'm trying to print my ultiboard 10.0 design, but I get every time a blank page. It's a HP LaserJet 4 Plus network printer, an other printer  HP LaserJet P3005 PCL 6 work perfect, but isn't installed in my office. Word, Multisim etc.print without pro

  • CIN for credit memo and debit memo

    HI I have manufacturing scenario with excise(CIN) where i need to issue credit and debit memo to customers Please let me know how this has to be handled in excise point of view,how excise document has to be created, What is the ETT for credit and deb

  • How do I get the lg monitor to work with my macbook

    I can see the home screen from my computer but nothing else shows up