SAX and Unicode

"Reading an XML file in Java is even easier. The standard XML parsing API for Java, SAX, automatically chooses the right encoding when it opens an XML file. SAX provides the parsed data to you in String objects."
Can somebody confirm this?

There's nothing special about the SAX parser or even Java. All conforming XML parsers are supposed to be able to determine the encoding of an XML document, based on a possible byte-order mark at the start of the file and the presence or absence of an "encoding" pseudo-attribute in the XML declaration at the start of the document's prolog. The detection procedure is described in Appendix F of the XML Recommendation.
However, as Adam suggests, if you provide a Java parser with a Reader, then you have already chosen an encoding and the parser no longer has that responsibility.
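
The difference between the two kinds of input can be sketched as follows. This is only an illustration under stated assumptions: the class SaxEncodingDemo, its methods and the sample document are made up for the example; the JAXP and SAX calls themselves are standard. With a byte stream the parser reads the encoding declaration itself; with a Reader the bytes have already been decoded using whatever charset the caller chose.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of any library.
final class SaxEncodingDemo {

    /** Parses the source and returns all character data concatenated. */
    static String textOf(InputSource source) throws Exception {
        final StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // SAX hands over decoded chars
            }
        });
        return text.toString();
    }

    /** A tiny document whose bytes really are ISO-8859-1, as its declaration says. */
    static byte[] latin1Document() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><word>caf\u00e9</word>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Byte stream: the parser inspects the declaration and picks the decoder. */
    static String parseFromBytes() throws Exception {
        return textOf(new InputSource(new ByteArrayInputStream(latin1Document())));
    }

    /** Reader: the caller's charset is used, even when it is the wrong one. */
    static String parseFromReaderAs(Charset charset) throws Exception {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(latin1Document()), charset);
        return textOf(new InputSource(reader));
    }
}
```

Calling parseFromBytes() or parseFromReaderAs(StandardCharsets.ISO_8859_1) yields the original text, while parseFromReaderAs(StandardCharsets.UTF_8) does not, because the parser never gets a chance to apply the declared encoding.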

Similar Messages

  • "Right" way to mix SAX and DOM in app

    I'm writing a standalone desktop application that reads and stores data as XML files. Eventually, it might be converted to use some web services, but it's not a priority right now.
    What I need to do is use an XML file, which could be large, as a kind of database -- there are many entries, each with a unique identifier, and the application will query the file to find the ones that match and return them as the objects I'm mapping.
    At the moment, because this is a personal project and I'm using it to learn more technologies, I'm trying to (somewhat artificially) restrict myself to the pure Sun APIs. So far, my investigations have pointed to JAXP (by including Java EE 5 libraries) with StAX (including JAX-WS). Which raises two questions:
    1) To do this "right", do I really need to bundle my app with the entire Java EE + Metro stack?
    2) Is there a better solution than StAX that's fully Java 5 compliant, even if it means stepping out of the Sun box? I haven't found many references to other solutions that are more recent than 2004. Is parsing XML on an app that has nothing to do with an appserver that uncommon?

    More likely than not, I won't be abstracting to that degree. If the current structure isn't right, I'd update the app instead of storing that kind of information in more files.
    I imagine at this point an example would be more effective. The app is itself more of an inventory browser that can jump around different searches dynamically. As an illustration, imagine that it's an inventory for DVDs. One central file will be your collection (with each entry containing a movie ID, date of purchase, etc). Another file would be more static, a list of DVDs themselves. These entries would contain information about the package itself -- how many discs? What's the title of the package? Which special features does it have? It would also point to an entry in yet another file which would have information about the film, containing biographies of the people listed under the movie credits.
    Basically, I want a flatfile database that I can do joins on that are split up into different files. There are few files (here, one) that will be constantly updated by the user. The others could be modified if needed, but it's not going to be optimized for it. (For example, you could own a DVD that nobody's ever heard of, and put in the info yourself.) Periodically, one or more of the more static files is updated and will be downloaded into the app.
    One of the advantages I see for this is that, in the future, I could with few changes turn this into more of a web service. Instead of pushing changes in those few files, the app would look to a web service for the data it would now find in files on the user's hard drive. But for now, it also has to be one standalone package.
    To answer the question, the file that will most commonly be updated by the user is the one that I don't have problems loading into memory in full. It's the other data that it links to which I want to be able to search and load into objects dynamically. My current implementation is to run the file through SAX and grab the data as it sees it, but it's really ugly. That could very well be how I'm using it and not because I'm trying to shoehorn some functionality into a technique it doesn't fit, but I'd like to find that out. ;)
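
    On question 2, one point worth noting: StAX (javax.xml.stream) ships with Java SE 6 and later, so on those versions no Java EE or Metro bundle is needed for it; on Java 5 it would have to be added as a separate library. A minimal sketch of a streaming lookup by ID follows. The element and attribute names (dvd, id, title) and the class DvdLookup are invented for illustration, loosely following the DVD example above.

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper; the XML vocabulary (dvd/id/title) is an assumption.
final class DvdLookup {

    /** Returns the title of the <dvd> whose id attribute matches, or null if none does. */
    static String findTitle(Reader xml, String wantedId) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        try {
            boolean inWantedDvd = false;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if ("dvd".equals(r.getLocalName())) {
                        // Remember whether this entry is the one being searched for.
                        inWantedDvd = wantedId.equals(r.getAttributeValue(null, "id"));
                    } else if (inWantedDvd && "title".equals(r.getLocalName())) {
                        return r.getElementText(); // stop reading as soon as the match is found
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "dvd".equals(r.getLocalName())) {
                    inWantedDvd = false;
                }
            }
            return null;
        } finally {
            r.close();
        }
    }
}
```

    Because the application pulls events instead of receiving callbacks, the loop can return early and keep its state in local variables, which tends to be less awkward than a SAX handler for this kind of lookup.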

  • Combined Upgrade and Unicode conversion of Sap 4.6C to ECC6.0

    Hello all,
    my project team intends to carry out a combined upgrade and unicode conversion of an SAP ERP 4.6C system with MDMP to ECC6.0 (no enhancement package). The system is running on Oracle 10.2.
    In preparation for this upgrade, I have gone through the SAP notes 928729, 54801.
    We need to get a rough estimate of the entire downtime so as to alert our end users. From the CU&UC documentation in 928729, I read up note 857081. However the program in this note cannot be used to estimate the downtime as my system is < SAP netweaver 6.20.
    Is there any other SAP note or tool or program that I can use to estimate the downtime for the entire CU&UC? Thanks a lot!

    Hi,
    The duration of a combined upgrade depends on a number of factors, such as database size, server resources and optimization. To get an idea of how much downtime it will take, I suggest performing the combined upgrade and Unicode conversion on a sandbox system that is a replica of your production system, and then trying to optimize it. From there you can get the approximate downtime required.
    Also, please read the combined upgrade and Unicode conversion guides at http://service.sap.com/unicode@sap
    Thanks
    Sunny

  • Need suggestion on Multi currency and Unicode character set use in ABAP

    Hi All,
    I need a suggestion. In one of the requirements I saw 'multi-currency and Unicode character set experience in FICO'.
    Can you please explain how ABAPers are involved in multi-currency, as I think this is a FICO functional area?
    Also, what is Unicode character set experience? Please share some documentation if you have any.
    Thanks
    Sreedevi
    Moderator message - This isn't the place to prepare for interviews - thread locked
    Edited by: Rob Burbank on Sep 17, 2009 4:45 PM

    Use the default parser.
    By default, WebLogic Server is configured to use the default parser and transformer to parse and transform XML documents. The default parser and transformer are those included in the JDK 5.0.
    The built-in WebLogic Server DOM factory implementation class is com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl.
    The DocumentBuilderFactory.newInstance method returns the built-in parser.
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
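
    For illustration, a minimal sketch of parsing with whatever parser DocumentBuilderFactory.newInstance returns. The class DefaultParserDemo and its method are hypothetical names; only standard JAXP calls are used, and nothing here is specific to WebLogic Server.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class showing the default JAXP DOM parser in use.
final class DefaultParserDemo {

    /** Parses the XML text with the default DOM parser and returns the root element name. */
    static String rootName(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getNodeName();
    }
}
```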

  • 4.7EEx1.10 to ECC6.0 upgrade and Unicode conversion

    Hi Experts,
    We are going to initiate the upgrade next month, so I have started preparing the plan and strategy for it.
    Our current setup is 4.7EEx110 / Win 2003 R2 64-bit / Oracle 10.2.0.4.0 (non-Unicode). We recently migrated to this setup from Win2k 32-bit, and the current hardware is Unicode compatible.
    For the upgrade and Unicode conversion, I am planning the following strategy.
    Step 1) Perform Unicode conversion on the current landscape (Both Export/import on the same servers)
    Step 2) Setup Temporary landscape as part of Dual maintenance strategy and migrate data from the current systems to temporary systems using backup/restore method.
    Step 3) Perform the SAP version upgrade on the current landscape and setup transport routes from temporary to current landscape in order to keep it in sync
    Step 4) after successful upgrade, decommission the temporary landscape
    Please provide your suggestions and valuable advices if there is anything wrong with my strategy and execution plan.
    Regards,
    Dheeraj

    Hi,
    Thanks. I have already referred to these notes; what I am seeking is advice on my upgrade approach.
    I have planned to proceed as follows.
    1) Refresh Sandbox with Prod data and perform Upgrade to ECC6.0 EHP5 & subsequently Unicode conversion on the same server (Since both export & Import has to perform on the same hardware as we have recently migrated on this hardware which is Unicode compatible)
    2) Setup temporary landscape for DEv & QAs and establish transport connection to Production system in order to move urgent changes
    3) Keep a track of the changes which have transported during upgrade phase so that the same can be implemented in the upgraded systems i.e. Dev & QAS
    4) After Sandbox Migration and signoff, we will perform Dev & QAS upgrade & unicode conversion on the same hardware (Note: Since these are running on VMware can we export the data from the upgraded system and import on to a new VM?)
    5) Plan for production cutover: upgrade the Prod system to ECC6.0 EHP5 and then perform the Unicode conversion. I am planning to perform the upgrade over one weekend and the Unicode conversion the following weekend. (Is this the right way?)
    My Production setup: DB on one Physical host and CI on separate Virtual host
    6) After the stabilization phase, we are planning for OS & DB upgrade as follows:
          a) Windows upgrade from 2003 R2 to Windows 2008 R2
          b) Oracle Upgrade from 10.2 to 11.2
    If anyone thinks that there is anything wrong with the above approach and that it needs changes, then please reply.
    I have one more doubt. I am going to upgrade 4.7EEx110 (WAS 620, Basis SP64) to ECC6.0 EHP5, and I presume that I can upgrade straight from the current version to ECC6.0 EHP5 without installing the EHP separately. Kindly confirm.
    Thanks

  • Windows API ANSI version and UNICODE version

    The Windows API has two versions: ANSI and Unicode.
    I want to display Chinese text, so I used the Unicode version, but the output was wrong.
    Later I changed to the ANSI version, and the output was correct.
    Why can the ANSI version display Chinese when the Unicode version can't?
    I use these API functions:
    FindWindowA
    GetWindowTextA
    Can anyone explain?

    I understand Vista is a 32 bit.
    IE 8 is what I have but I would consider IE 9 if that would be necessary.
    Just a bit more update. I just tried to download Adobe FP 10 again with the following results, again:
    The Adobe Download Manager windows shows:
    FP 10.3 Status as 100% "Installation Pending".
    FP 10.2 Status as 100% "Installing Application"
    and that will go on forever but never actually installs anything; it just keeps cycling through the "Add to Download" items, e.g. "Fanbase", "Times Reader"
    and "Adobe Air" (none of which I want added to the download), and the download and installation shows as 100%.
    Go figure!

  • Combined Upgrade and Unicode conversion question

    Hello Everyone,
    I will be performing a combined upgrade and Unicode conversion soon. I have run Prepare and do not have any errors.
    I have already run the SPUMG consistency check and do not have any errors there either. Since this is a combined upgrade and Unicode conversion, according to the guide I do not need to do the nametab conversion right now. But if I go to
    SPUMG -> Status -> Additional Information, I see a red status saying that the Unicode nametabs are not consistent or not up-to-date.
    Please let me know if I can ignore this step and do the nametab conversion after the upgrade is complete and before the Unicode conversion.
    Thanks,
    FBK

    Please follow the instructions from the guides.
    The Unicode nametabs will be generated automatically during the upgrade.
    An additional check is integrated into the final preparation steps in the target releases.
    Regards,
    Ronald

  • Unicode in UNICHAR and UNICODE Excel Functions Is Decimal

    CASE # 12 64 08 79 83
    Dear Microsoft Engineers,
    There is a lack of information on the Microsoft Support pages*, where UNICODE and UNICHAR in Excel 2013 are presented out of real context, given that Unicode code points are conventionally written in hexadecimal.
    Please add a notice on these pages stating that these Excel functions work with decimal code point values. You might also add a hint to use the HEX2DEC and DEC2HEX conversion functions when working with UNICHAR and UNICODE.
    Please note that the low decimal code points equal ASCII. To highlight the new capability, it would be better to use some other examples.
    I need UNICHAR in my Excel development tool related to a Unicode-using application (MSKLC).
    I contacted Microsoft Support by phone today and got the above case ID.
    Best regards,
    Marcel Schneider
    P.S.:  My first post noted these functions to be ASCII.  So I am sorry not to have considered the system (whether it is hexadecimal, or decimal!).
    * Note:  I was not allowed to post this with hyperlinks to the Support Pages.  Links are the following:
       http://office.microsoft.com/en-us/excel-help/unicode-function-HA102753274.aspx
       http://office.microsoft.com/en-gb/excel-help/unichar-function-HA102753273.aspx
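
    To make the decimal/hexadecimal relationship concrete, here is a small sketch in Java (chosen only for illustration; the class and method names are hypothetical). It converts the hexadecimal notation used in Unicode charts, such as U+20AC, to the decimal value that, per the post above, the Excel functions work with, and back again.

```java
// Hypothetical helper for converting between hex and decimal code point notation.
final class CodePointDemo {

    /** "20AC" (as in U+20AC) -> 8364 */
    static int hexToDecimal(String hex) {
        return Integer.parseInt(hex, 16);
    }

    /** 8364 -> "20AC" */
    static String decimalToHex(int codePoint) {
        return Integer.toHexString(codePoint).toUpperCase();
    }

    /** Returns the character (possibly a surrogate pair) for a decimal code point. */
    static String charFor(int codePoint) {
        return new String(Character.toChars(codePoint));
    }
}
```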

    Hi Marcel,
    Thanks for your feedback. I'll collect the information and submit it through our internal channels.
    Have a good time.
    Regards,
    George Zhao
    TechNet Community Support

  • Combined upgrade and Unicode conversion for ECC5 MDMP system

    Hello,
    We are planning to do Upgrade and Unicode conversion of ECC5 MDMP system to ECC6 EHP4 Unicode. We are adopting Combined upgrade and Unicode conversion strategy to minimise the downtime.
    In the source version, ECC5, we are at support pack level 6. Do we need to update the support pack to a particular level before starting the CU&UC, or can we start from ECC5 with SP 6 as it is?
    Since we can't afford additional downtime for a support pack update, is it OK to start the upgrade and Unicode conversion from the current version?
    Please advise.
    Regards
    Vinay

    Hello Vinay,
    please note that as a prerequisite the Basis SP should be accurate for an MDMP conversion.
    There is no MUST to have the latest Basis SP, but without you could have severe issues in SPUMG.
    On the application side, there are in most cases no hard requirements on the SP level.
    Best regards,
    Nils Buerckel
    SAP AG

  • Where is the Combined Upgrade and Unicode Conversion Guide

    Hi All
    Embarrassing question time.
    I am after the Combined Upgrade and Unicode Conversion Guide for 4.7 to ERP 6.0, but can only find the one for 4.6C to ERP 6.0.
    Can anyone advise where the 4.7 guide is?
    Thanks
    Sam

    Thank God SAP don't include it in the Install guide. The Install Guides are complex already. BTW if you need more info on unicode and its conversion go here
    https://service.sap.com/unicode@sap

  • What is diff in Open SQl and Unicode

    Hi.
    What is the difference between Open SQL and Unicode?
    What are advantages of Unicode ?

    hi osk,
    You can't compare Open SQL with Unicode, as they are two different things.
    Just a small explanation:
    Open SQL consists of a set of ABAP statements that perform operations on the central database in the R/3 System. The results of the operations and any error messages are independent of the database system in use. Open SQL thus provides a uniform syntax and semantics for all of the database systems supported by SAP. ABAP programs that only use Open SQL statements will work in any R/3 System, regardless of the database system in use. Open SQL statements can only work with database tables that have been created in the ABAP Dictionary.
    Unicode:
    Data types such as CHAR ASCII and CHAR EBCDIC are mainly suited to English and central European languages. With other character sets, a code attribute is usually used for these data types. This code attribute uses a different presentation code to ASCII and EBCDIC, even for internal storage in the database system. This causes problems if you want to access these database systems using a different character set, or if you want to exchange data between database systems with different character sets.
    You can avoid these problems by using internal character coding in accordance with UNICODE. Internally, the UNICODE data is stored in UTF-16/UCS-2 format. In UCS-2, every character is two bytes long; UTF-16 uses two bytes for characters in the Basic Multilingual Plane and four bytes (a surrogate pair) for all others.
    SAP DB is able to display various presentation codes in UNICODE format.
    Please close the thread after rewarding the appropriate points.
    Message was edited by: Ashok Kumar Prithiviraj
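
    On the UTF-16/UCS-2 point in the answer above: UTF-16 code units are two bytes, but characters outside the Basic Multilingual Plane need two code units. A minimal sketch in Java (used only for illustration; the class name is hypothetical) that measures this:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; measures UTF-16 storage size for a piece of text.
final class Utf16WidthDemo {

    /** Number of bytes the text occupies in UTF-16 (big-endian, no byte-order mark). */
    static int utf16Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    /** Number of Unicode code points (characters) in the text. */
    static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }
}
```

    For "A" or a common CJK character, utf16Bytes returns 2; for a single supplementary character such as U+1F600 (written "\uD83D\uDE00" in Java source), codePoints returns 1 but utf16Bytes returns 4.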

  • Help,DataInputStream and Unicode encoding problem

    Hello,everybody
    I am writing a small program for fun, but a problem with character encoding has stopped me. I am trying to parse a file that contains integers, floats and Unicode characters (not UTF-8 but some other encoding). I looked through the JDK documentation and found that the class DataInputStream (which implements the interface DataInput) fits my requirements best, but when I tried it the characters were not read correctly (garbage, only '????????').
    Would you please help me? Thanks a lot :-)

    The class DataInputStream has the methods I need, but I can find no way to set the encoding, either on DataInputStream itself or on the argument types accepted by its constructor:
    FileInputStream fis=new FileInputStream(fileName);
    DataInputStream     dis=new DataInputStream(fis);
    String line =dis.readLine();
    System.out.println(line);
    // only "????????" output as result :-(
    I wonder how to set the encoding, or whether I should use another class.
    If I do it this way, it works, but there are no methods such as "readFloat", "readInt", etc., so it's not what I want:
    FileInputStream fis=new FileInputStream(fileName);
    InputStreamReader read=new InputStreamReader(fis,"GB2312");
    BufferedReader reader=new BufferedReader(read);
    DataInputStream     dis=new DataInputStream(fis);
    String line = reader.readLine();
    System.out.println(line);
    Thank you for your reply!
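
    One possible approach, sketched under assumptions: DataInputStream.readLine() is deprecated because it turns each byte into one char, which garbles multi-byte text. If the text in the file can be read as a known number of bytes, those bytes can be read with readFully and decoded explicitly, while readInt and readFloat remain available. The record layout below (int id, float price, then a length-prefixed GB2312 string) and the class MixedRecordDemo are invented for illustration and are not the poster's actual file format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;

// Hypothetical record format: int id, float price, int byteLength, GB2312 text bytes.
final class MixedRecordDemo {
    static final Charset TEXT = Charset.forName("GB2312");

    /** Writes one sample record so that the reader below has something to parse. */
    static byte[] writeRecord(int id, float price, String name) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(id);
        out.writeFloat(price);
        byte[] encoded = name.getBytes(TEXT);
        out.writeInt(encoded.length); // length prefix, counted in bytes
        out.write(encoded);
        out.flush();
        return buffer.toByteArray();
    }

    /** Reads the record back: binary fields via DataInputStream, text decoded explicitly. */
    static String describe(byte[] record) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
        int id = in.readInt();
        float price = in.readFloat();
        byte[] encoded = new byte[in.readInt()];
        in.readFully(encoded);                   // raw bytes, no charset guessing
        String name = new String(encoded, TEXT); // decode with the known encoding
        return id + "|" + price + "|" + name;
    }
}
```

    With a real file, the ByteArrayInputStream would be replaced by a FileInputStream (ideally wrapped in a BufferedInputStream); the key point is that a single stream is used for both the binary fields and the text bytes.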

  • Sql Plus and Unicode (or utf-8) characters.

    Hello,
    I have a problem with SQL Plus and Unicode files. I want to execute Start {filename}, where {filename} is a file in Unicode format (the file has to contain German and Polish characters), but I receive an error message. Is it possible to read from a Unicode (or UTF-8) file and execute the commands in it?
    Thanks in advance.
    Pawel Przybyla

    What is your client operating system character set?

  • Logical Database PNP. HR and Unicode

    Hi,
    We are currently checking all programs to make them Unicode compliant. When the logical database PNP is used, a lot of macros are loaded automatically. One of them is
    rp_provide_from_last (or rp_provide_from_frst), which gets the last record in a specified time interval. Whether a record exists is indicated by a variable named pnp-sw-found: it is 0 if no record was found and 1 if one exists.
    Checking the program (normal syntax check and extended syntax check) produces a warning that variable names containing a hyphen are no longer allowed in Unicode programs unless the variable is a structure (and pnp-sw-found is not). The program runs fine, and transaction UCCHECK does not report this error/warning at all. Does anyone have experience with this issue, and perhaps a solution?
    cu
      Rainer

    Thanks for the answers so far. Using PNPCE does not resolve my problem, because we have a lot of self-written reports and I want to avoid changing them all.
    Using PNPCE also does not remove the need for pnp-sw-found; it still exists and still triggers the warning that it is not Unicode compliant.
    Switching off the Unicode flag is not a good idea if we want to move to Unicode.
    Does anyone else have experience with Unicode in the HR area?

  • XML - 0112 Error on parsing using SAX and xml file in InputSource object.

    I need to parse an XML string and extract some information. I have to use a SAX parser. I'm converting the string to an InputSource and then trying to parse it.
    I'm getting an XML-0112 error. My guess is that the parser is not able to locate the DTD file, but I tried hardcoding the whole path in the DOCTYPE tag.
    I also tried calling setSystemId, but no luck.

    Can you post a simple test case to look at?
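
    A possible starting point for such a test case, sketched with the standard JAXP SAX API rather than any vendor-specific parser classes. It assumes the poster's guess is right that the DTD cannot be located, and sidesteps the lookup by answering every external entity request with an empty entity. The class StringSaxDemo and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical test case: parse XML held in a String, without fetching its DTD.
final class StringSaxDemo {

    /** Parses the XML text and returns the name of its root element. */
    static String rootElementOf(String xml) throws Exception {
        final String[] root = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                if (root[0] == null) {
                    root[0] = qName; // the first start tag is the root element
                }
            }

            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Assumption: the DTD content is not needed, so supply an empty one
                // instead of letting the parser try to open systemId.
                return new InputSource(new StringReader(""));
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return root[0];
    }
}
```

    If the DTD is actually needed (for entity definitions or default attribute values), the resolver could instead return an InputSource opened on the real DTD file.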
