Parsing International Characters

Hi folks,
I am trying to parse an XML document which has international characters like "é" (accented e, as used in French). But my parser crashes when it tries to parse a document containing these characters:
System.out.println("******************* 1");
DocumentBuilderFactory lFactory = DocumentBuilderFactory.newInstance();
System.out.println("******************* 2");
DocumentBuilder lDB = lFactory.newDocumentBuilder();
System.out.println("******************* 3");
lDoc = lDB.parse(new FileInputStream(pFileName));
System.out.println("******************* 4");
The exception occurs after the third println. Here is what I get:
[17/May/2005 08:50:14:640] info: The Exception Stack Trace is : The element type "FirstName" must be terminated by the matching end-tag "</FirstName>".: org.xml.sax.SAXParseException: The element type "FirstName" must be terminated by the matching end-tag "</FirstName>".
     at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1213)
     at org.apache.xerces.framework.XMLDocumentScanner.reportFatalXMLError(XMLDocumentScanner.java:579)
     at org.apache.xerces.framework.XMLDocumentScanner.abortMarkup(XMLDocumentScanner.java:628)
     at org.apache.xerces.framework.XMLDocumentScanner$ContentDispatcher.dispatch(XMLDocumentScanner.java:1136)
     at org.apache.xerces.framework.XMLDocumentScanner.parseSome(XMLDocumentScanner.java:381)
     at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:1098)
     at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:195)
     at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:76)
     at com.exult.andy.mbcommon.utilities.AncestorXMLUtil.fileToDoc(AncestorXMLUtil.java:328)
     at com.exult.andy.importadapter.base.AncestorDeployImportAdapter.main(AncestorDeployImportAdapter.java:62)
The element is indeed correctly terminated.
I appreciate any help. Thanks in advance.
-r

Then you don't have a well-formed XML document. If it doesn't declare its encoding in its prolog (<?xml version="1.0" ?>), the parser has to assume it is encoded in UTF-8 (or, less likely, some variant of UTF-16), while the file is probably really encoded in ISO-8859-1 or something like that. If that's the case, either fix the prolog to declare the encoding, <?xml version="1.0" encoding="ISO-8859-1" ?>, or re-encode the document in UTF-8.
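If you cannot change the file itself, another option is to tell the parser what the bytes really are. Here is a minimal Java sketch, assuming the file is actually ISO-8859-1 (an assumption; the thread never confirms the real encoding). The class name EncodingAwareParse and the sample document are made up for illustration. Decoding through a Reader means the bytes are converted to characters before the parser sees them, so its UTF-8 default never comes into play.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original poster's code.
class EncodingAwareParse {

    // Parses XML bytes whose real encoding is known out of band.
    // The Reader decodes the bytes with the given charset, so the
    // parser does not have to guess the encoding.
    static Document parse(byte[] xmlBytes, Charset actualCharset) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(xmlBytes), actualCharset)) {
            return builder.parse(new InputSource(reader));
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample: no encoding declaration, but saved as ISO-8859-1.
        // \u00e9 is the accented e; in ISO-8859-1 it is the single byte 0xE9.
        String xml = "<?xml version=\"1.0\" ?><FirstName>Ren\u00e9</FirstName>";
        byte[] latin1Bytes = xml.getBytes(StandardCharsets.ISO_8859_1);

        Document doc = parse(latin1Bytes, StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

With a real file you would wrap new FileInputStream(pFileName) in the InputStreamReader instead of the ByteArrayInputStream. Fixing the prolog or re-encoding the file is still the cleaner fix, because then every XML tool reads the document correctly, not just this one call.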

Similar Messages

  • XMLTYPE and International Characters

    Database: Oracle9i
    Characterset: US7ASCII
    I am trying to insert a very small xml document into an XMLTYPE column in a table.
    My xml document, however, contains international characters like French and Spanish letters with their accompanying accent marks.
    I have tried setting various encoding schemes in the encoding attribute on the xml document but none of them work.
    The parser stops at the first international character and complains that it is invalid.
    Is there anything I can do to insert these characters into my XMLTYPE column without complaint?
    Thanks.

    Please post this message at:
    Forums Home » Oracle Technology Network (OTN) » Products » Database » XML DB

  • Problem with international characters showing up as junk

    Hi All,
    Little question.
    I've made an XML data template which executes a query to fetch person names from the E-Business Suite tables.
    However, there are international characters in the names which show up incorrectly. When the query is executed in the database, everything shows up correctly. But when the query is executed via XML Publisher, the produced XML contains junk characters.
    This happens, for example, with o-umlaut characters.
    The database characterset is: WE8ISO8859P1
    Version of XML publisher: 5.6.3
    Patrick

    This turned out to be an extra property which was set in the data template:
    property scalable_mode with value "on"
    This caused the special characters to be mangled.
    Patrick

  • BIG PROBLEM: CSS files not loading because of international characters in file name

    I have Muse Beta 7 in Spanish.
    The program creates a css file called master_a-página-maestra.css in css folder. It is referenced in the resulting HTML code here:
    <!-- CSS -->
      <link rel="stylesheet" type="text/css" href="css/site_global.css?3951792836"/>
      <link rel="stylesheet" type="text/css" href="css/master_a-p%c3%a1gina-maestra.css?fileVersionPlaceholder"/>
      <link rel="stylesheet" type="text/css" href="css/index.css?3948175564"/>
    When you work locally on your Windows-formatted hard drive, everything looks fine, but when you upload your files to a server, everything is screwed up. The server doesn't recognize the URL and returns an error page, resulting in style errors across the entire site.
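For anyone wondering where the %c3%a1 in that href comes from: it is the two UTF-8 bytes of "á", each written as a percent escape. A small Java sketch of that encoding follows; the class and method names are made up for illustration. It also shows what a server would see if it decoded those bytes as ISO-8859-1 instead of UTF-8. Whether that is what the failing servers actually do is an assumption, not something this thread confirms.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not code generated by Muse.
class PercentEncodingDemo {

    // Percent-encodes the UTF-8 bytes of s, leaving only ASCII letters,
    // digits and "-._~" (the URL "unreserved" characters) untouched.
    static String percentEncodeUtf8(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean unreserved = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9') || v == '-' || v == '.' || v == '_' || v == '~';
            if (unreserved) {
                out.append((char) v);
            } else {
                out.append(String.format("%%%02x", v));
            }
        }
        return out.toString();
    }

    // Decodes percent escapes, interpreting the resulting bytes in the given charset.
    // URLDecoder.decode(String, Charset) requires Java 10 or later.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // \u00e1 is the accented a in the generated CSS file name.
        System.out.println(percentEncodeUtf8("master_a-p\u00e1gina-maestra.css"));
        // Same escapes, two interpretations of the bytes.
        System.out.println(decode("p%c3%a1gina", StandardCharsets.UTF_8));
        System.out.println(decode("p%c3%a1gina", StandardCharsets.ISO_8859_1));
    }
}
```

The first call reproduces the file name exactly as it appears in the href. The last two calls return different strings for the same escapes, which is why the browser, the uploaded file name and the server all have to agree on UTF-8 for the link to resolve.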
    This can also happen with HTML files if the title of the page contains international characters, or with images if the image file name on your hard drive contains international characters.
    I already pointed out a long time ago that Muse was generating file names with international characters like á, ñ, etc., but nobody cared about it. Too bad. I had to activate file name customization, but I think that Muse should automatically replace these characters in the same way that it replaces other problematic characters like commas or ampersands.
    This is not a fault of the FTP client. Accented characters are not web safe regardless of the FTP client you are using. It is a server-side issue: some servers support them, others don't. I don't mind if it works on the Adobe Catalyst server, because the final website is going to be on another server, and maybe next year, when the paid hosting ends, the client may move it to yet another server.
    It makes more sense to replace accented characters in file names with their unaccented equivalents ('a' instead of 'á', 'N' instead of 'Ñ', etc.) and avoid this whole problem.
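The replacement suggested above can be sketched in a few lines of Java. This is only an illustration of the idea, not how Muse works; the class and method names are made up. It decomposes each accented letter into a base letter plus a combining mark (Unicode normalization form NFD) and then drops the marks.

```java
import java.text.Normalizer;

// Hypothetical helper for producing ASCII-friendly file names.
class AsciiFileNames {

    // Splits accented letters into base letter + combining mark (NFD),
    // then removes every combining mark (Unicode category M).
    static String stripAccents(String name) {
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // \u00ed is i with acute, \u00e1 is a with acute, \u00d1 is N with tilde.
        System.out.println(stripAccents("biograf\u00eda.html"));
        System.out.println(stripAccents("master_a-p\u00e1gina-maestra.css"));
        System.out.println(stripAccents("\u00d1"));
    }
}
```

One limitation to keep in mind: letters that have no decomposition, such as "ø" or "ß", pass through unchanged, so a real implementation would need an extra mapping table for those.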

    Zak, It is funny you mention it, because the site I am talking about is hosted in 1and1. try this: http://www.artofwalls.com/rosannawalls/biografía.html
    As you can see, the offending "í" in the file name causes 1and1 server to throw a "page not found" error. And this has happened with many other servers I have tried since.
    Muse boasts of generating code fully compatible with all major web browsers, but by using international characters in file names it generates code suitable for only a few web servers. International characters have always been a no-no in internet URLs. The internet was designed by people who didn't care about ASCII codes beyond 127, so using international characters in HTML file names is just asking for problems.
    "to work with your hosting provider to determine how to enable Unicode encoded (UTF-8) file names and HTML files on their servers" is not a viable solution most of the time unless you are a Very Important Client of your host provider. If not, making changes means money for them and if you are the only one who complains, they are going to just tell you to not use international characters in your names.

  • SAPSCRIPT: Printing international characters on ZEBRA; How to do?

    Hi,
    I use NiceLabel software to design barcode forms. I upload the design to an SO10 SAPscript text and print it on the Zebra printer. I used device type ASCIIPRI. The SAP system is Unicode.
    Now I need to print Chinese pallet labels and I am running into unexpected problems. I found a lot of information but no solution. Is it possible to print international characters from SAPscript on Zebra?
    I got the information from Zebra's White Paper: Solution for Printing International Characters. There it says:
    "Unicode UTF-8 is embedded within Zebra printers."
    "SAP Forms can be universal. Labels and forms ... do not need to be modified or recreated to print in different languages."
    "SAP-developed UTF-8 device type and code page support for SAPscript users"
    "Label design software that can generate ZPL with support for Unicode ZPL commands"
    Do you know which device type I have to use? I think I need a UTF-8 device type. Do you know how to proceed?
    Please help. Thanks
    Frank

    Hi Frank,
    as far as I know, it might be possible when using SMARTFORMS instead of SAPscript!
    In that case, it depends on the device type and the printer type, of course.
    Have a look at SAP Note 750002, "SmartForms: Support für Zebra Etikettendrucker (ZPL2)".
    Cheers
    Klaus

  • International characters not showing up in certain apps?

    I'm using Dreamweaver CS3 and international characters aren't showing up. There are blank spaces where they used to be. The characters are still there, it's just that they appear as blank spaces (I can copy and paste them and see them elsewhere).
    This change occurred after I installed 10.6. Is there any way to fix this?

    I've seen the same thing on Photoshop and Fireworks CS3, and even just the other day on iMovie '09, but I was still on Leopard – so are you sure it's Snow Leopard-related?
    In my case, CS3 apps would show squares, and iMovie wouldn't even move the cursor (it wouldn't even let me enter curly quotes). With iMovie it was easily fixed by typing in TextEdit, then copying and pasting to iMovie. No success in CS3 apps, which categorically refused to enter any glyphs that weren't more or less standard Latin.

  • How to load the international characters by using the SQL*Loader(UNIX)?

    Hi Everyone,
    I am not able to load international characters through SQL*Loader, which is called from Unix. Whenever I load these characters, they appear in the DB as square boxes. Please help me resolve the issue.
    The version in use is:
    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
    PL/SQL Release 10.2.0.4.0 - Production
    CORE     10.2.0.4.0     Production
    TNS for IBM/AIX RISC System/6000: Version 10.2.0.4.0 - Productio
    NLSRTL Version 10.2.0.4.0 - Production
    Thanks in advance.
    Regards,
    Vissu.....

    This may help
    {code}
    SQL> CREATE TABLE test_sqlldr_unicode (id INTEGER, name VARCHAR2(100 BYTE));
    Table created.
    {code}
    Now my data file.
    {code}
    1,"ABóCD"
    2,"öXYZó"
    3,"EFGÚHIJK"
    4,"øøøøøøøøøøøøøøø"
    {code}
    My control file.
    {code}
    LOAD DATA
    CHARACTERSET WE8ISO8859P1
    INFILE 'C:\test_sqlldr_unicode.txt'
    REPLACE
    INTO TABLE test_sqlldr_unicode
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    (id INTEGER EXTERNAL , name )
    {code}
    Running the sqlldr
    {code}
    C:\>sqlldr USERID=hr/hr CONTROL=test_sqlldr_unicode.ctl LOG=test_sqlldr_unicode.log
    SQL*Loader: Release 10.2.0.1.0 - Production on Thu Dec 30 19:38:22 2010
    Copyright (c) 1982, 2005, Oracle.  All rights reserved.
    Commit point reached - logical record count 5
    C:\>
    {code}
    The table
    {code}
    SQL> SELECT * FROM test_sqlldr_unicode;
            ID NAME
             1 ABóCD
             2 öXYZó
             3 EFGÚHIJK
             4 øøøøøøøøøøøøøøø
    SQL>
    {code}
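The CHARACTERSET line in the control file matters because the same text is stored as different bytes depending on the encoding of the data file. The Java sketch below only illustrates that point; it says nothing about SQL*Loader internals, and the class and method names are made up. It takes the first sample value from the data file above and shows its size in ISO-8859-1 and in UTF-8, and what you get when UTF-8 bytes are read as if they were ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the sample value is taken from the data file above.
class ByteLengthDemo {

    // Number of bytes the text occupies in the given encoding.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Encodes as UTF-8 but decodes as ISO-8859-1: the classic mismatch.
    static String misread(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String value = "AB\u00f3CD"; // \u00f3 is o with acute
        System.out.println(byteLength(value, StandardCharsets.ISO_8859_1)); // one byte per character
        System.out.println(byteLength(value, StandardCharsets.UTF_8));      // the accented o takes two bytes
        System.out.println(misread(value));                                 // accented o becomes two junk characters
    }
}
```

So if the file on the Unix box is really UTF-8, declaring WE8ISO8859P1 would load junk, and vice versa; the declared character set has to match how the file was actually saved. The byte counts are also worth remembering with a column declared as VARCHAR2(100 BYTE), since accented characters can take more than one byte each in a multi-byte database character set.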
                                     

  • Firefox Chrome Showing ???? instead of International Characters

    Hi,
    I have a flash movie at this site
    http://preview.tinyurl.com/5289rz
    When you click on the rendered thumbnail (with a link
    containing international characters), it takes you to a URL with
    those same characters. IE 7 displays the international characters
    and takes you to the correct page, but in Firefox, Chrome and Opera
    the international characters are displayed correctly in the Flash
    movie while the URL turns those characters into question marks (????)
    and the page shows a 404 Not Found error.
    Firefox
    http://www.site.com/video/??????_??????_???????_???????_???????_???_????????

    Does that mean we should ignore those millions of users
    around the world? That would be plain foolish. I am here to get this
    problem solved, not to be told to just ignore users who use other
    browsers.

  • Problem with the international characters à, è, ì, ò, ù

    Dear Experts,
    My requirement is to send data files from SAP to Hyperion (a data warehouse tool) via the application server. A few fields (for example material description, name, or address) contain international characters like à, è, ì, ò, ù. I need to send their unaccented equivalents (i.e. a, e, i, o, u) to Hyperion; that is, when I create a file on the application server, the file should contain the characters a, e, i, o, u instead.
    I tried ENCODING NON-UNICODE, UTF-8, and DEFAULT, but to no avail.
    Please assist me.
    Thanks,
    Dharmendra Gali

    If you just have the couple of characters mentioned, use Jürgen's suggestion. Otherwise I'd recommend usage of SAP function module SCP_REPLACE_STRANGE_CHARS, which is much more comprehensive. Note that depending on your invocation though you might get multiple characters for some, e.g. ä to ae. To some degree you can control this, see my comments in Re: Removing diacritical (special & accented) characters in SAP.
    Cheers, harald

  • Jforum international characters problem

    Hi all,
    Has anyone used jforum (www.jforum.net)? It is an open source forum software
    I am having problems with international characters.
    Please let me know if you have had so I'll post my question.
    Thank you.

    Hi,
    So here is the problem. JFORUM is a very easy to use forum that can be set up within a short period of time. It is all nice UI configuration.
    In admin control I reconfigure the forum to be in Russian.
    When I store messages in DB and later on retrieve them they come up as question marks :(
    I tried to alter the table but it did not work. Here is an example of one table:
    create table jforum_posts_text (
         post_id int(11) not null default '0',
         post_text text CHARACTER SET utf8,
         post_subject varchar(100) default NULL,
         primary key (post_id)
    ) engine=InnoDB default charset=utf8;
    Any ideas, help, suggestions?
    Thank you.
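Literal question marks usually mean the text was pushed through an encoding that has no slot for those characters somewhere between the browser and the table, and once that happens the original letters are gone. The Java sketch below reproduces that symptom; it is an illustration only, with made-up class and method names, and it does not claim to show where JForum loses the characters. With MySQL, one place worth checking is the character encoding used by the JDBC connection (Connector/J has a characterEncoding connection property), but that is a guess about this setup, not a diagnosis.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not JForum code.
class QuestionMarkDemo {

    // Encodes the text to bytes and decodes it again with the same charset.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // for every character the charset cannot represent.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Six Cyrillic letters, written as escapes to keep this source file ASCII-only.
        String russian = "\u041f\u0440\u0438\u0432\u0435\u0442";
        System.out.println(roundTrip(russian, StandardCharsets.UTF_8));      // survives unchanged
        System.out.println(roundTrip(russian, StandardCharsets.ISO_8859_1)); // six question marks
    }
}
```

Because the damage happens at the moment of conversion, altering the table afterwards cannot bring the text back; every hop (form submission, application, connection, table) has to use a Unicode-capable encoding before new posts are stored.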

  • Displaying International Characters

    Some users have been concerned about the fact that Buzzword
    does not display some international characters - ranging from Greek
    to Russian. This is accentuated by the fact that we have Buzzword
    users in well over 100 countries.
    The problem occurs when users attempt to insert some
    international characters - say, the Greek letter omega - and
    Buzzword instead displays a dot on the screen. Here's what's going
    on, for anyone interested:
    Like virtually all modern software, Buzzword adheres to the
    Unicode standard, where characters are defined with 16 bits,
    resulting in a total of over 65,000 possible characters.
    However, unlike most desktop software, Buzzword must use
    something called "embedded fonts". This means that we can't read
    fonts off a user's computer, but instead we have to download fonts
    from our server.
    This is where our challenge begins. A font family contains
    characters - called "glyphs" when drawn on the screen - for some
    portion of the 65,000 possible characters defined by Unicode. Each
    available character is downloaded as a small program containing
    instructions on how to draw the glyph. The instructions are
    relatively small, but each takes time to download - you can see
    evidence of this in our "loading fonts" progress bar.
    For Buzzword to load relatively quickly, we need to limit the
    number of characters downloaded with each of our seven font
    families. Most people use far fewer than 65,000 characters, so for
    our first phase of deployment, we identified a couple hundred
    characters to download for each font family. Because our initial
    market focus was North America, we chose characters from Latin-1,
    the Western European character set.
    The result: when a user attempts to enter the Greek letter
    omega, Buzzword recognizes the Unicode character but does not have
    the downloaded instructions to display the glyph on the screen. The
    little dot that is displayed instead is an indication that the
    requested glyph has not been downloaded with the font set. If the
    user were to export the document to be read by a desktop program,
    the glyph would probably be displayed using the computer's fonts.
    Longer term, we'll handle this differently by downloading
    fonts dynamically, based on the document's contents and a user's
    settings. In the meantime, we apologize to everyone who uses
    characters outside the Western European set. We will work to get
    you a solution as soon as we possibly can.

    quote:
    Like virtually all modern software, Buzzword adheres to the
    Unicode standard, where characters are defined with 16 bits,
    resulting in a total of over 65,000 possible characters.
    Actually, Unicode (the standard) does not care about the
    number of bits.
    It has enough space to encode more than one million
    characters, and the current version (Unicode 5.1) already encodes
    more than 100,000 characters (
    http://www.unicode.org/versions/Unicode5.1.0/)
    quote:
    Buzzword must use something called "embedded fonts".
    Nothing prevents Flash/Flex from using fonts "html style".
    In fact, Buzzword can add a "Generic sans-serif" font as an
    option (font-family: Verdana, Arial, Helvetica, sans-serif;) with
    zero effort.
    The document will not look the same on all computers, but
    this might be better than the current bullets.
    So this is not a "must".
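The point about 16 bits is easy to check in Java, whose strings are made of 16-bit units. This is a small sketch with made-up names, just to illustrate the reply above: the Greek omega fits in one unit, while a character from beyond the first 65,536 code points (here U+1D11E, the musical G clef) needs two.

```java
// Hypothetical demo class illustrating code points versus 16-bit units.
class CodePointDemo {

    // Number of 16-bit UTF-16 units the string occupies.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String omega = "\u03a9";                             // GREEK CAPITAL LETTER OMEGA
        String clef = new String(Character.toChars(0x1D11E)); // MUSICAL SYMBOL G CLEF

        System.out.println(utf16Units(omega) + " unit(s), " + codePoints(omega) + " character(s)");
        System.out.println(utf16Units(clef) + " unit(s), " + codePoints(clef) + " character(s)");
        // Highest code point Unicode allows, far beyond what 16 bits can hold.
        System.out.println(Integer.toHexString(Character.MAX_CODE_POINT));
    }
}
```

The code space runs from 0 to 0x10FFFF, which is 1,114,112 code points, so "more than one million" is accurate; 16 bits describes one particular encoding unit, not the standard itself.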

  • Typing international characters

    I have not been able to find any international characters such as umlauts, accents etc. Does any one know if they can be created as they don't seem to be part of the symbol set, or am I missing something?
    Also, it would be great if there were a simpler way to add periods (on my blackberry i could simply press the space bar twice).
    MacPro, Powerbook G4, iPhone   Mac OS X (10.4.10)  

    Input is English only for this version of the iPhone:
    http://m10lmac.blogspot.com/2007/06/iphone-language-capabilities-seem.html
    Tell Apple you want more here:
    http://www.apple.com/feedback/iphone.html

  • International Characters with Netmail

    Hi all,
    I'm using Sun ONE Portal Server 6. I have set the platform charset to ISO-8859-7 so that every portal page displays Greek characters correctly.
    My only problem is with NetMail. When I receive mail, I can see Greek characters correctly with NetMail Lite. However, when I send email using NetMail Lite and write Greek characters, they turn into question marks when I read the email with any client (NetMail Lite, Outlook Express, etc.).
    Any ideas?
    Thanks
    -George

    Does anyone know if you can type international characters with the iPhone keyboard?
    Yes.
    http://m10lmac.blogspot.com/2007/09/iphone-input-keyboard-gets-accented.html

  • International characters not showing

    When some international character set is selected in the Language
    bar (Windows) and text is typed in FreeHand MX, garbage
    characters are shown.
    This is related to FreeHand only, because when typing in any
    other application (MS Word, Publisher, even Notepad), proper
    characters are shown.
    Are there any settings that I need to change, or any other
    intervention needed, for international characters to be used in FH?
    BTW, even when pasting text containing international
    characters into FH, the same problem happens.
    Thanks
    Nicky

    > When some international character set is selected in the Language bar (Windows),
    > and text is typed in FreeHand MX, garbage characters are shown.
    >
    > This is related to FreeHand only, because when typing in any other application
    > (MS Word, Publisher, even Notepad), proper characters are shown.
    This is FreeHand only because FreeHand does not support as
    wide a character set (Unicode) as most other apps.
    There are language versions of FreeHand, like Hebrew, that
    support some extended character sets, but the standard English
    version can handle roughly the set available in a Type 1 font.
    Jukka

  • International characters in IOS filenames

    Hello,
    My movie compiles fine but when I add this file name to the package:
    ahí1.wav
    I get an applicationverificationfailed message when I try to send it to the device from Flash CC.
    If I rename it to:
    ahi1.wav
    it works without failure.
    I've got hundreds of files that have international characters. Do I have to rename them all, or is there a special flag, switch, or trick I can use to get around this?

    Stefan,
    It's good news that you are not having this problem, as it means that perhaps I won't shortly either. If we can characterize the differences between our setups, maybe I can have the same result as you do.
    I've just run the obvious case - I've created a file using TextEdit with a German name out on the volume from the Mac, stopped TextEdit, and successfully retrieved it. So it doesn't look like a filesystem mounting issue. I wonder what is so weird about these files. There must be something odd in the header, because it is definitely at the file info level that it is going off the rails. While the name is the obvious differentiator, maybe something else is odd as well.
    One thing I could try is to zip one of the directories affected on the Windows side and then try unzipping it into place there, then boot over to the Mac side and see if things have improved. If that doesn't resolve the problem, I could try unzipping it into place on the Mac side, but first I'll boot over to the Windows side and make sure it can read the file I just created in TextEdit from that end.
    By the way, the KB article you referenced was about shares and about problems with punctuation mounting Mac shares on Windows, so I don't think it pertains. In any case, I'm mounting a FAT volume, not a share, so the drivers would be completely different.
    Anyway, thank you for your help. Now that I'm no longer chasing phantoms, I can attend to the real problem.
    Thanks,
    Ralph
