XML validation: how to check ALL validation problems for XHTML

I have a lot of documents in HTML format (not very good HTML) that I would like to convert to XML (XHTML). I know it is not so easy, and I would use this strategy in a Java program:
1. Try to check well-formedness and validity with an XML parser (SAX or Xerces)
2. If not valid: try to identify ALL the problems the file has (*and not only the first one that halts processing*)
3. Try to transform the HTML into valid XHTML with some approach: regular expressions or other methods
So my questions are the following:
1. Which XML parser do you think is best for this purpose? SAX or Xerces?
2. How can I find all the validation problems in the file and not only the first one (if I remember correctly, XML parsers halt parsing at the first error...)?
3. How can I transform the HTML into valid XHTML? Do I have to use only regex, or are there other tools for this HTML-to-XHTML problem?
Thanks
r
Edited by: robertobat on Feb 21, 2009 7:09 PM

> 1. Neither of them. (Disregarding the fact that SAX and Xerces aren't in the same category and don't cover all the possibilities.)
I would say the default SAX implementation in the JRE and the SAX parser in Xerces.
> 2. I think you have "valid" and "well-formed" confused. And HTML isn't a dialect of XML, so trying to use an XML parser to handle HTML isn't a good idea.
I know very well the difference between valid and well-formed, but I used "validation" to stand for the whole conversion problem. But you are right. I'm becoming convinced that using an XML parser as the first step is not a good idea.
> 3. Well, this is the real question, isn't it? Those other two were just a waste. Don't screw about with regex: for one thing it doesn't work well for hierarchical structures, and for another you won't finish in a finite time. Just use an HTML parser which can produce a DOM, like TagSoup for example. Or run them through HTMLTidy. You could also submit them to one of the internet sites which will validate XHTML for you.
I've looked at Tidy and its ability to convert HTML to XHTML, and I think it is better than TagSoup for my case, because I have to implement this mechanism in a production environment and I want to use only open source projects that have a long history and are robust. But I'll look at TagSoup as you suggest.
I cannot use an Internet service to convert millions of private documents.
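
On question 2, once a document has been turned into well-formed XHTML (for example by Tidy or TagSoup), a validating SAX parser can report every validity problem in one pass. Validity problems arrive through `ErrorHandler.error()`, which is recoverable, so a handler that records them instead of throwing lets the parse continue. Well-formedness problems arrive through `fatalError()`, and by default the parser stops there, so for those you normally see only the first one. Below is a minimal sketch using the JAXP classes in the JDK. The names `CollectingValidator`, `CollectingHandler` and `validate` are made up for illustration, and the tiny internal DTD stands in for a real XHTML DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical helper: collects all problems a validating SAX parse reports.
final class CollectingValidator {

    /** Records each reported problem; only rethrows on fatal (well-formedness) errors. */
    static final class CollectingHandler implements ErrorHandler {
        final List<String> problems = new ArrayList<>();

        private void record(String level, SAXParseException e) {
            problems.add(level + " at line " + e.getLineNumber() + ": " + e.getMessage());
        }

        @Override public void warning(SAXParseException e) { record("warning", e); }

        // Validity errors are recoverable: record and let the parser go on.
        @Override public void error(SAXParseException e) { record("error", e); }

        // Well-formedness errors are fatal: record, then stop the parse.
        @Override public void fatalError(SAXParseException e) throws SAXException {
            record("fatal", e);
            throw e;
        }
    }

    static List<String> validate(String xml) {
        CollectingHandler handler = new CollectingHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // DTD validation against the document's DOCTYPE
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Already recorded by fatalError(); nothing more to do.
        } catch (SAXException | ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser setup or I/O failed", e);
        }
        return handler.problems;
    }

    public static void main(String[] args) {
        // Illustrative DTD only: <div> and <span> are not declared, so both are reported.
        String doc = "<!DOCTYPE html [<!ELEMENT html (body)><!ELEMENT body (p*)>"
                + "<!ELEMENT p (#PCDATA)>]>"
                + "<html><body><p>ok</p><div>a</div><span>b</span></body></html>";
        validate(doc).forEach(System.out::println);
    }
}
```

For real files you would pass an `InputSource` built from the file and let the DOCTYPE point at a local copy of the XHTML DTD (an assumption about your setup), so millions of documents do not each fetch it over the network.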

Similar Messages

  • How to display all validator messages at the same time?

    Hi Guys,
    I have a form with validators attached to a couple of my input boxes. I tried to write validators which are reusable in other parts of my app ie. social security number check etc. I also then customized the messages and it all works fine.
    But when I submit the form it displays the messages one at a time. In other words, the validators seem to run one after another, and as soon as one reports an error the form is re-rendered. I believe this is how it should behave and that's fine.
    But what if I want all validators to be performed when I submit the form? Then all messages are displayed at the top of the page and the user can make all his changes and try again. This makes more sense to me.
    So my question is whether there is a way to force the page to perform all validators when the page is submitted and then display all error messages in a h:messages tag?
    Cheers and thanks a lot
    P.S. I know about and use the hidden input field validator hack which does all validations, but if I do it that way I duplicate the code which does the social security check for all applicable forms.

    Strange, I was under the impression that all validators were run by default. In my JSF apps, all the validators run and output their errors without any special configuration to make this happen.
    I wonder why yours are just running one at a time? Could you show some of your JSP code, and the hidden field validator code?
    CowKing

  • I have an iPad 4 and am using it with an Aircel prepaid 3G SIM. How do I check my validity period and balance amount through the iPad? Please help me

    I have an iPad 4 and am using it with an Aircel prepaid 3G SIM. How do I check my validity period and balance amount through the iPad? Please help me. M. Kumar, Chennai

    There are 2 concepts attached to a bank balance: the balance as per your books of accounts, and the balance maintained with the bank. I believe I need not explain these 2 concepts. Both balances can be obtained from the Oracle system provided some prerequisites are met.
    Balance as per your books - This is nothing but the GL balance available. In order to obtain balances for each bank account, it is advised that each bank account should have a separate account code combination. This is generally achieved by having a separate natural account for each bank. The code combination is attached to the cash account for each bank. By maintaining separate account code combinations, the balance in each code combination can be obtained from GL (provided transactions are accounted and posted in GL). These balances represent the balance for each bank according to your books of accounts. You can create an FSG for this purpose and provide it to the customer, so that they can run it whenever they want.
    Balance as per bank - This balance is maintained by Oracle in 2 ways: either the bank balance is manually entered for each bank account for each date (quite cumbersome), or the bank balances are loaded along with the bank statement. There are various types of bank balances stored: value dated balance, available balance, float balance etc. Depending on the balances provided by the bank along with the bank statement, the bank balance can be recorded in the Oracle system. After the bank statement is uploaded and the balances stored, standard cash management reports are available to query the bank account balances. In order to view daily movement, the bank statement should be loaded on a daily basis.
    Hope this helps.
    Vinit

  • How to check all songs in iTunes

    Until now, when I double-clicked a song in iTunes, it played the song and continued to play all the songs that follow. Now I play one song and it stops and does not play the following songs. How do I check all songs in iTunes so the music will continue?

    52,
    Try Command-clicking a song, and it should select all of them in Songs or in a playlist.

  • Cannot open Dreamweaver due to missing menu.xml file, how do I fix this problem?

    I cannot open Dreamweaver due to a missing menu.xml file. How do I fix this problem?

    @alicia - We have seen such issues before
    Allow me to quote David Powers from http://forums.adobe.com/thread/1192197
    The menus.xml file is normally in your personal configuration folder. Before going to the trouble of reinstalling, try deleting your configuration folder. Details of how to find it are here: http://forums.adobe.com/thread/494812.
    When you launch Dreamweaver, it should build a new configuration folder.

  • How to check plan line items for plan cost at network activity level?

    Dear all,
    How to check plan line items for plan cost at network activity level? The plan cost is done in network activity in CJ20N.
    I am not able to check using CJI4 or CJI9 report.
    Kindly advise.
    Thanks and regards,
    Jessie

    Hi Jess,
    Have you checked the navigation pane displayed on the left-hand side of reports S_ALR_87013565 and S_ALR_87013533? There you have the option to check transaction currency and object currency when you double-click on it.
    If it is not displayed under navigation, you can bring it in from transaction code CJE2. For example, the report group for S_ALR_87013533 is 12KST1C. Double-click on it and it will open up for changes to the report layout and much more. There you also have the option to bring in transaction currency. Similarly, you can check other report groups as well, just by checking the report description.
    But I am not sure if this suits your requirements. Wait for other experts to comment on this.
    Regards,
    Amit

  • I get "mail server does not recognize your Apple ID and password" when I try to share a picture via email in iPhoto.  My iCloud email works ok otherwise.  I have checked all mail settings for ID and password.

    I get "mail server does not recognize your Apple ID and password" when I try to share a picture via email in iPhoto.  My iCloud email works ok otherwise.  I have checked all mail settings for ID and password.

    In the iPhoto preferences ==> Accounts, delete the account and re-enter it. Sometimes that resolves this.
    Or, IMHO, the best long-term solution is to set Apple Mail as the iPhoto e-mail client in the iPhoto preferences and use it. It has a number of advantages and fewer problems.
    LN

  • How to activate all inactive objects for current user

    Hi
    How to activate all inactive objects for current user ...
    ... I have found a (long winded) way to do this:
    - Environment / Inactive Objects
    - Add to Worklist
    - Display Worklist
    - Select All
    - Activate
    this will open a dialog titled "Inactive Objects for <username>"
    which has the exact functionality I need ... but I can't figure out how to get to this dialog directly - without so many intermediate steps
    the SAP docs repeatedly mention the ability to activate the inactive worklist - but do not mention how
    does anybody know the TCode for this dialog?
    thanks
    ps does the term "mass activation" apply to importing change requests rather than development activation?
    Edited by: FireBean500 on Jun 4, 2010 11:07 PM

    No other way. But usually it's far more simple as all objects are already in our own worklist.
    I wonder why your objects are not already in your worklist, as every time you create or maintain an object, it is added to your worklist.

  • How to check a pdf uploaded for press in a website automatically ??

    How can a PDF uploaded to a website for printing be checked automatically?
    I am making a new website for a printer. His clients upload PDFs directly on his website. If an uploaded PDF is not
    what the printer needs for printing, we want the site to check the PDF's profile automatically, open a window and say what is wrong with the PDF,
    and if possible automatically fix what can be fixed, as the PitStop software does offline.
    Please help.
    Thank you in advance

    Acrobat isn't available with a server license. You might like to look into PitStop Server.

  • How to check the Statistics generated for a table through DBMS_STATS.

    Hi,
    How to check the statistics generated for a Table through DBMS_STATS.GATHER_TABLE_STATS procedure ?
    Please let me know.
    Thanks !
    Regards,
    Rajasekhar

    > Rajasekhar wrote:
    > Hi,
    > How to check the statistics generated for a Table through DBMS_STATS.GATHER_TABLE_STATS procedure ?
    > Please let me know.
    > Thanks !
    > Regards,
    > Rajasekhar
    Query ALL_TABLES.

  • How to check the tran code for specific activity.

    Hello friends ,
    Could you please let me know how to find the transaction code for a specific activity? In the table I can check what each transaction does, but now I need to find the transaction for a specific activity.
    E.g., for the Administrator Workbench there is a transaction like RSA1.
    thanks in advance
    Regards

    Hi,
    try the TSTC table with SE16.
    Hope it helps,
    MG

  • How to check the validity period of saprouter ?

    Dear all,
    As per sap note 1178684, we can check the validity period of the saprouter by executing "sapgenpse get_my_name -n validity"
    I am wondering how to run that command on AS/400?
    Please advise. I am not familiar with the AS/400 OS.
    Thanks
    Regards,
    Kent

    Hi Kent,
    please proceed as described on the http://www.easymarketplace.de/snc-iseries-setup.php
    logon with sidadm (or sidofr):
    (depending on the user, that is running the SAPRouter)
    CD DIR('/usr/sap/saprouter')
    ADDLIBLE LIB(SAPROUTER)    ??? or wherever the stuff is ...
    RMVENVVAR ENVVAR('SECUDIR')
    ADDENVVAR ENVVAR('SECUDIR') VALUE('/usr/sap/saprouter')
    RMVENVVAR ENVVAR('SNC_LIB')
    ADDENVVAR ENVVAR('SNC_LIB') VALUE('/usr/sap/saprouter/sapcrypto')
    CALL PGM(SAPGENPSE) PARM('get_my_name')
    Regards
    Volker Gueldenpfennig, consolut international ag
    http://www.consolut.net - http://www.4soi.de - http://www.easymarketplace.de

  • How to check CRL validity time from client?

    Hello,
    I have one Windows Server 2003 R2 working as Standalone CA. It provides certificate for one of our internal IIS website.
    I have decreased CRL publish interval from 1 week -> 1 day and Published new CRL.
    However, our webserver is probably not aware of the new CRL publishing interval set on the CA, because I suppose the webserver has cached the CRL locally.
    My question is, how do I check the validity time of the cached CRL from our webserver? It's running Windows Server 2003 R2.
    I tried running the following command on the webserver, with no results: certutil -urlcache crl

    You basically have to wait it out when you are running an 11 year old operating system
    The certutil -urlcache CRL command was introduced in Windows Server 2008/Vista.
    The deletion/inspection of cached CRL data was not really an option in Server 2003
    Brian

  • How to check the validity of a RegEx

    Hi,
    My application needs users to enter a RegEx. My question is how I can check the validity of the RegEx that users enter. The class RegExp does not seem to provide any method to check whether a RegEx is valid.
    Regards,
    Haibin

    You would need to hand roll your own validator, unless someone knows of one that is out there. Flex does not validate RegEx.
    If this post answers your question or helps, please mark it as such.
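
    This thread is about Flex, but for comparison, in languages whose regex engine throws on a bad pattern the usual check is simply to try compiling it and catch the error. Here is a minimal sketch of that technique in Java. The names `RegexCheck` and `syntaxError` are made up for illustration, and Java's regex syntax is not identical to ActionScript's, so this does not validate Flex patterns as such.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper: validates a user-supplied pattern by attempting to compile it.
final class RegexCheck {

    /** Returns null when the pattern compiles, otherwise a short description of the syntax error. */
    static String syntaxError(String userPattern) {
        try {
            Pattern.compile(userPattern);
            return null;
        } catch (PatternSyntaxException e) {
            // getDescription() and getIndex() say what is wrong and roughly where.
            return e.getDescription() + " near index " + e.getIndex();
        }
    }

    public static void main(String[] args) {
        System.out.println(syntaxError("[a-z]+\\d{2}")); // compiles, so this prints null
        System.out.println(syntaxError("(unclosed"));    // prints a description of the error
    }
}
```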

  • How to check CSI validity

    how to check CSI validity

    hi,
    use these tables to get bom details.
    STPO          BOM Item Details
    STPU          BOM Sub Items (designators)
    STKO         BOM Header Details
    MAST         BOM Group to Material
    STZU          BOM History Records
    STAS          BOM Item Selection
    STPF           BOM Explosion Structure .
