HTML entity expansion

I was given an HTML document that has an XML document embedded in the HEAD. The XML document is escaped with HTML entities, so instead of angle brackets you have the entity equivalents: "ampersand less than semicolon" (if I typed it out literally it would look like an angle bracket here) or "ampersand greater than semicolon".
Is there a known method for expanding html entities? Or do I need to code up my own entity replacement routine?
Thanks for any help.

I really can't imagine why somebody would do that. It just sounds like a loony idea to me. (I mean the part about putting escaped XML documents in the head of an HTML document, not what you propose to do about it.)
If that's really your requirement, then hopefully the escaped XML is the text inside an element. (In my example, it's the text inside the "head" element except that I put extra whitespace before and after it.) And hopefully the HTML you're getting is actually XHTML, so you could feed it into an XML parser.
If both of those conditions are true then what you should do is this:
1. Feed the entire document into an XML parser. Extract the text from the "head" element that's the child of the root "html" element. You could certainly use DOM for that if you wanted. This will be the XML document you're after, and the parser will have unescaped it for you.
2. Feed that text into another XML parser and proceed with whatever you have to extract from it.
But based on the looniness of this data format you probably won't have either of those conditions true.
If the data isn't XHTML then run it through something that cleans it up, JTidy or TagSoup or something like that.
If the XML document isn't the only text in an element then you'll have to do some string hacking to get rid of the other text.
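
The two parsing steps above can be sketched roughly as follows. This is only an illustration under the two conditions already mentioned (well-formed XHTML, and the escaped XML being the only text in the "head" element); the class name, method names, and sample document are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class EmbeddedXmlExtractor {

    // Parses a string as XML; checked parser exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Step 1: parse the XHTML and read the text of <head>.
    // The parser has already turned &lt; and &gt; back into angle brackets.
    static String embeddedXml(String xhtml) {
        Document outer = parse(xhtml);
        Node head = outer.getElementsByTagName("head").item(0);
        return head.getTextContent().trim();
    }

    public static void main(String[] args) {
        String xhtml = "<html><head> &lt;order id=\"7\"&gt;&lt;item&gt;tea&lt;/item&gt;&lt;/order&gt; </head>"
                + "<body/></html>";
        String inner = embeddedXml(xhtml);
        // Step 2: feed the unescaped text into a second parse.
        Document order = parse(inner);
        System.out.println(inner);
        System.out.println(order.getDocumentElement().getAttribute("id"));
    }
}
```

If the input is not XHTML, the same code would only work after a cleanup pass with something like JTidy or TagSoup, which is not shown here.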

Similar Messages

  • Convert invalid xml characters to HTML-Entity

    Hi,
    How can I convert invalid XML characters like ä, ü, ö, ... to the HTML entities &auml; &uuml; &ouml;?
    Is there any method or class that can handle an input string and transform the invalid characters?
    Or is there another way to mask these characters so that an XML parser does not throw an error when parsing the document?
    Best regards,
    Michael

    Ok sorry, I'll give you more details what i want to do and where i have the problems.
    I have the following xml string:
    <font family="Times New Roman" size="14" color="#333333">This is a sample Text</font>
    The xml-string can contain any characters because the content is from a text pane where the user can type in any characters.
    I use the DOM parser to parse this input string to get the attributes and the text content.
    And that's my problem: how can I make sure that this string won't throw any exceptions when I parse it with DOM?
    I parse the string with the following code:
    public XMLElement parse(String sourceString)
    {
        //create a new xml element
        XMLElement xmlElement = new XMLElement();
        //create a new document
        DocumentBuilder builder = build();
        //now parse the string into the document
        InputStream is = new ByteArrayInputStream(sourceString.getBytes());
        Document document = null;
        try
        {
            document = builder.parse(is);
        }
        catch (SAXException e)
        {
            System.out.println("SAXError while parsing the document");
            e.getMessage();
            //no valid document
            return null;
        }
        catch (IOException e)
        {
            System.out.println("IO Error while parsing the document");
            e.getMessage();
            //no valid document
            return null;
        }
        //get the element
        org.w3c.dom.Element element = document.getDocumentElement();
        if (element != null)
        {
            xmlElement.setNodeName(element.getNodeName());
            xmlElement.setNodeValue(element.getTextContent());
            //attributes defined?
            int length = element.getAttributes().getLength();
            //get the attributes, if defined
            for (int i = 0; i < length; i++)
            {
                xmlElement.addAttribute(
                        element.getAttributes().item(i).getNodeName(),
                        element.getAttributes().item(i).getTextContent());
            }
        }
        return xmlElement;
    }
    XMLElement is my own class.
    The builder:
    private DocumentBuilder build()
    {
        DocumentBuilder docBuilder = null;
        try
        {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            docBuilder = factory.newDocumentBuilder();
        }
        catch(ParserConfigurationException pce)
        {
            System.out.println("Error while creating an DocumentBuilder");
            pce.getMessage();
        }
        //return the document builder
        return docBuilder;
    }
    Message was edited by:
    heissm - spelling mistakes :(
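
One thing that may be worth ruling out here (an assumption, since the thread does not say which exception is thrown): `sourceString.getBytes()` encodes with the platform default charset, while the parser assumes UTF-8 when there is no XML declaration, so characters such as ä can reach it as malformed bytes. The sketch below hands the parser characters instead of bytes and escapes the user-typed text before it goes into the markup. `SafeFragmentParser` and `escapeText` are hypothetical names, not part of the poster's code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SafeFragmentParser {

    // Escapes the characters that are significant inside XML text content.
    // The ampersand must be replaced first so the other entities are not double-escaped.
    static String escapeText(String raw) {
        return raw.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Parses from a Reader, so no byte encoding or decoding step is involved.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse fragment", e);
        }
    }

    public static void main(String[] args) {
        // Text as a user might type it: umlauts plus characters that would break the markup.
        String userText = "Gr\u00fc\u00dfe & <b>";
        String xml = "<font family=\"Times New Roman\" size=\"14\">" + escapeText(userText) + "</font>";
        Document doc = parse(xml);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

With this approach the umlauts need no entity at all; only `&`, `<` and `>` in the typed text have to be escaped.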

  • Entering HTML Entity Codes

    I haven't been able to find a way to insert HTML entity codes (specifically, to use em dashes, but other needs arise) through the interface. HTML snippets are intended for something entirely different. Apart from opening published output in an editor to insert the entity codes, is there a simple way that I've missed finding?

    If you want to change the characterset then presumably you can only do that post publishing.
    http://www.markboulton.co.uk/journal/comments/fivesimple_steps_to_typesetting_on_the_webdashes/
    This article may be of use, particularly
    +"In Unicode, the em dash is U+2014 (decimal 8212). In HTML, the numeric forms are &#8212; and &#x2014;. The HTML entity is &mdash;."+

  • Entity Expansion Limit Reached!

    Hi experts!
    I use JAXB to parse the DBLP data, however, I received an error message as follows
    "org.xml.sax.SAXParseException: The parser has encountered more than "64,000" entity expansions in this document; this is the limit imposed by the application."
    Can someone give me a hint on how to increase the number of entity expansions allowed by SAXParse?
    Thanks!

    Set the system property on the JVM command line: -DentityExpansionLimit=128000.
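
Where changing the JVM command line is awkward, the limit can probably also be set in code on the parser factory. This is a sketch under assumptions: it relies on the JDK's built-in JAXP implementation, tries the newer attribute name `jdk.xml.entityExpansionLimit` first, and falls back to the older URI-style name. The class and method names are invented for the example, and a JAXB setup would still have to be handed a parser configured this way.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ExpansionLimitDemo {
    // JDK-specific attribute names (assumptions about the runtime, see the text above).
    static final String NEW_NAME = "jdk.xml.entityExpansionLimit";
    static final String LEGACY_NAME = "http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit";

    // Builds a small document that references one internal entity `count` times.
    static String manyReferences(int count) {
        StringBuilder sb = new StringBuilder("<!DOCTYPE r [<!ENTITY e \"x\">]><r>");
        for (int i = 0; i < count; i++) {
            sb.append("&e;");
        }
        return sb.append("</r>").toString();
    }

    // Applies the limit, falling back to the legacy name if the newer one is rejected.
    static void setLimit(DocumentBuilderFactory factory, int limit) {
        try {
            factory.setAttribute(NEW_NAME, String.valueOf(limit));
        } catch (IllegalArgumentException e) {
            factory.setAttribute(LEGACY_NAME, String.valueOf(limit));
        }
    }

    // Returns true if the document parses under the given expansion limit.
    static boolean parsesWithLimit(String xml, int limit) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            setLimit(factory, limit);
            factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // limit exceeded, or the document is not well-formed
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = manyReferences(70_000);
        System.out.println("limit 64000:  " + parsesWithLimit(xml, 64_000));
        System.out.println("limit 128000: " + parsesWithLimit(xml, 128_000));
    }
}
```

Setting the limit per factory keeps the change local to one parser instead of the whole JVM, which may matter if other code in the same process parses untrusted XML.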

  • HTML Entity Escape Character Conversion

    The requirement is to convert UTF-8 encoded special language characters to HTML entity escape characters. For example, in the source I have a Description field with the value "Caractéristiques", which is 'Characteristics' in French. This needs to be converted to "Caract&eacute;ristiques" when sent to the receiver, i.e. special language symbols like é = &eacute; (in HTML entity format).
    Below is the Link for a List of HTML Entity Char's
    http://www.theukwebdesigncompany.com/articles/article.php?article=11
    could anybody please suggest how this can be achieved in mapping...any UDF or Encoding techniques...?
    many Thanks.

    Hi Veera
    this is ajay
    code for ur problem
    String ToHTMLEntity(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        boolean lastWasBlankChar = false;
        int len = s.length();
        char c;
        for (int i = 0; i < len; i++) {
            c = s.charAt(i);
            if (c == ' ') {
                // Every second blank in a run becomes &nbsp; so the run is not collapsed
                if (lastWasBlankChar) {
                    lastWasBlankChar = false;
                    sb.append("&nbsp;");
                } else {
                    lastWasBlankChar = true;
                    sb.append(' ');
                }
            } else {
                lastWasBlankChar = false;
                // HTML Special Chars
                if (c == '"')
                    sb.append("&quot;");
                else if (c == '&')
                    sb.append("&amp;");
                else if (c == '<')
                    sb.append("&lt;");
                else if (c == '>')
                    sb.append("&gt;");
                else if (c == '\n')
                    // Handle Newline
                    sb.append("&lt;br/&gt;");
                else {
                    int ci = 0xffff & c;
                    if (ci < 160)
                        // Code points below 160 (ASCII and the C1 control range) pass through unchanged
                        sb.append(c);
                    else {
                        // Everything else becomes a decimal numeric character reference
                        sb.append("&#");
                        sb.append(ci);
                        sb.append(';');
                    }
                }
            }
        }
        return sb.toString();
    }
    rewrd points if it help u

  • HTML entity missing in Design View

    Hi Folks,
    When I use the HTML entity &rarr; (right arrow), it renders properly in browsers as an arrow, but it shows as an empty box in Dreamweaver Design View. This is common with some other special characters also.
    Any idea as to why this happens?
    Thanks.
    Paul
    Dreamweaver 8
    Windows XP Pro SP3

    I just tried it in DW8 and it works for me.
    Pasted in code view.  &rarr;
    Are you using a valid document type?
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    Nancy O.
    Alt-Web Design & Publishing
    Web | Graphics | Print | Media  Specialists
    www.alt-web.com/
    www.twitter.com/altweb
    www.alt-web.blogspot.com

  • Dreamweaver cc html entity conversion problem in mac -NO utf-8 related answer please

    I probably am fighting against a bug existing in DW for a while, and i'm really on the edge of bursting out! 
    Here are the specifications:
    Dreamweaver CC from creative cloud (also tested w/ CS5.5 too) installed on mac, OS and DW user interfaces are english, and on mac turkish keyboard layout is also installed.
    I have been using DW for maybe 15 years, since it was macromedia.. But was always on windows. This is the first time I use it on mac. Here is my problem step by step:
    1- Dreamweaver > Preferences > New Document > Default Encoding: Western (ISO Latin 1) (NOT UTF-8 PLEASE, IT KEEPS THE CHARS UNCHANGED, ISO LATIN1 IS IMPORTANT)
    2- Go to Design View,
    3- There are 6 special characters in Turkish (times 2 for the caps versions of course), type:
    ĞÜŞİÖÇğüşıöç
    4- Go back to code view, what i should have seen was:
    &#286;&Uuml;&#350;&#304;&Ouml;&Ccedil;&#287;&uuml;&#351;&#305;&ouml;&ccedil;
    But I see:
    Ğ&Uuml;Şİ&Ouml;&Ccedil;ğ&uuml;şı&ouml;&ccedil;
    There are 3 chars (and capital versions) NOT converted to html entity at all. Which were: ĞŞİğşı
    But I should have seen them as: &#286;&#350;&#304;&#287;&#351;&#305;
    Any help would be appreciated, I do not want to leave my old friend DW just because of a weird conversion problem...

    Ok, when you look at the code view, what do you see exactly?
    do you see unconverted
    ĞÜŞİÖÇğüşıöç
    or converted
    &#286;&Uuml;&#350;&#304;&Ouml;&Ccedil;&#287;&uuml;&#351;&#305;&ouml;&ccedil;
    Here is one of my reasons:
    I sometimes create newsletters in turkish for my customers, and the html files i prepare are sent to customers attached as inline through various versions of outlook or thunderbird, or through i completely different email sender company (none is sent by me, i only create the html file). And most of the time the headers and some coding are cut off from the code when used to send as newsletter, and i have no control at all on it. so i have to create absolute correct viewed/rendered html files since i have no control at all on which sending method will be used or which os or browser or mail system will be used to open it...

  • HTML entity codes in XML do not display in IE browsers

    I have an XML file generated via PHP/MySQL that contains HTML entities like " &sup2; ". I'm using DW CS3 to create a Spry (1.6.1) dataset from the XML file and filling a Spry table.
    If I open the table page in any IE browser (6, 7 or 8) the table is filled and displays OK but these characters are just missing, however, I can open this same page with the same Spry dataset in Firefox, Safari, Opera, Chrome, etc. and all characters are displayed correctly, it's just IE that refuses to play.
    I've tried changing DTDs and UTF8 page encoding, nothing seems to make a difference.
    Any suggestions greatly appreciated  [ I can feel wrist slitting time approaching ... ] 

    Hi Phil, it works!
    Many thanks for your suggestions, they certainly helped solve the problem. I had previously tried wrapping in CDATA but I just got the entity codes displayed as text and sort of gave up looking there.  
    The solution is to use CDATA to wrap the relevant content AND to format the column type to "html" with set "ColumnType".
    I also found that Entity declarations can be removed from the XML, so I shouldn't have to worry about any entity codes I might encounter in source data.
    The site I'm working on will eventually be multilingual with supplied database content, so I can't control the source data and all site features must work equally well in all languages.
    I've updated the sample file with some Russian source data and set both character columns to "html", it works just fine! It shouldn't matter which method gets used to insert any special characters.
    http://www.tech-nique.co.uk/development/spry_data/spry-data-table.php
    many thanks for all responses, much appreciated!
    [ every day's a school day ~;o) ]

  • LIGHTSWITCH HTML ENTITY SOFT DELETION

    Hello all,
    I am working on LightSwitch HTML client application. I have an issue regarding entity soft deletion.
    For your information, I have also referred to Beth Massi's post:
    http://blogs.msdn.com/b/bethmassi/archive/2011/11/18/using-the-save-and-query-pipeline-to-archive-deleted-records.aspx
    My Scenario:  
    I have an Accounts table which is related to other tables.
    In Accounts edit screen, I have a delete button. On deleting account from edit screen Account should be marked as deleted in database(i.e. soft delete) if it has no reference in any other table, otherwise show client side an error message
    myapp.EditAccount.DeleteAccount_Tap_execute = function (screen) {
        // Write code here.
        screen.Account.deleteEntity();
        myapp.applyChanges().then(function success() {
            // Delete successful message.
        }, function error(e) {
            // Delete failure message.
        });
    };
    According to Beth Massi's example, on deletion the entity discards the changes and is marked as soft deleted.
    So if the data is discarded, I do not get the server-side exception "Could not delete. Entity is in use." and the code inside function error() is never executed.
    Any different way to accomplish this is also appreciated.  
    Thanks,
    Ravi Patel

    HI Ravi,
    As Beth’s blog says, it marks records for deletion without actually deleting them from the database. It discards the changes, which reverts the deletion.
    As far as I know, calling myapp.applyChanges() saves all the changes of the current changeset. If only one changeset exists, it saves them to the database. If the current changeset is a nested scope, it commits the changes to the parent changeset. I don’t think you can use the myapp.applyChanges() method in this scenario.
    Best regards,
    Angie 

  • Problem: FlashPlayer 10.1 XML and HTML-Entity Rendering problem

    Hi,
    I have some problems using
    childNode[0].nodeValue
    and HTML Entities since updating my FlashPlayer from version 10.0 to 10.1
    First some information about my system:
    FlashPlayer: WIN 10,1,53,64
    OS: WinXP (32bit)
    Browser: Firefox 3.6.6; IE 7.0.5730.13
    I am handling XML data which contains some HTML entities, for example "&lt;" or "&gt;". An XML parser reads the nodeValue and puts the text into an HTML-enabled text field. Now Flash Player version 10.1 does not display the text after "&lt;".
    For example the following text in XML:
    <![CDATA[<ul><li>pressure &lt; 250bar</li>
    is rendered as "pressure". Debugging the application shows that getting the text with
    childNode[0].nodeValue
    returns "pressure < 250bar", so the HTML text field interprets the "<" as the start of an HTML tag.
    Possible Workaround: Using
    <![CDATA[<ul><li>pressure %30C; 250bar</li>
    and replacing it after reading the nodeValue with "&lt;" solves the problem.
    Is there any other solution that does not require changing my XML contents? Can I tell Flash or my XML parser that HTML tags must not be replaced?
    Thanks for any ideas and help.

    I investigated the problem, but it did not get any easier.
    When an external function is called using "<invoke name="function" returntype="xml"><arguments><string>.....</string></arguments></invoke>", Flash Player removes the "CDATA" tag from the string value.
    This happens in the 10.0 player as well as in 10.1.
    But after installing version 10.1, the string is also decoded: all escape sequences are converted to the actual characters.
    Example:
    "and WELL-FORMED&lt;/b&gt;&lt;/font&gt; HTML"
    =>
    "and WELL-FORMED</b></font> HTML"
    So, because the CDATA is removed, this decoding corrupts the XML data in the AS code.
    Can anyone help to overcome this unwanted effect?

  • HTML Entity Question

    &#9660; nicely displays a downward facing solid triangle in Firefox
    IE6 will not display it.
    Will IE7 or later versions show it?
    Many thanks for your attention, it is appreciated.

    This is not my issue. Your forums site doesn't work very well.
    My Issue:
    I just upgraded my PC from XP Professional 32-bit with CS4 to Win 7 64-bit  with CS5. I created and used 4 master templates (not nested) in CS4 which ran perfectly. Since I upgraded to CS5, I am unable to update the child pages at all. I mean it is DOA!! I downloaded an update fixer from Adobe but that seemed to be for the extension manager, not updating pages from a template. I tried all the usual template commands such as Update Current Page, etc....When I tried the command Open Attached template, the message I got  was that I was denied access, but I manually opened it just fine. (@#$%@#$%&%^&$&*)
    What can I do to get my DW to update?

  • HTML online editor and accented letter

    Hi all gurus,
    please forgive the nooby question, I'm a newbie about Portal and I've searched sdn forum without success for the following task.
    Strictly; I have to publish an HTML file in an iview of the Portal.
    This task was accomplished quite easily, the problem is the simple text which is in the HTML file.
    As requirement, that HTML should be an info page; there are then few users which don't know anything of HTML which should update that info page with relevant news and activities.
    The proposed solution is, of course, to use the integrated online HTML editor; here the problem arises, as every accented letter in the HTML is shown as strange ASCII characters.
    Googling a bit, I found that a correct charset in the HTML file could do the job:
    charset=ISO-8859-1
    as we're using Latin1.
    Tried this solution on an HTML file on my desktop, it works. The above solution, however, does not work in Portal; the only solution is to use HTML entity (e.g. &agrave), but this is not so immediate to explain to users which don't usually work with HTML. Furthermore, they should switch to textual editor as the online HTML editor doesn't support explicit entity entries.
    I'd rather expect the online HTML editor (which is kinda wysiwyg) has an automatic routine (as in Dreamweaver and most others programs)  to convert an accented letter to the corresponding HTML entity. So I guess the problem is on Portal side (configuration?)
    Could anyone help me solving this issue, or find a workaround for that?
    Thanks!

    Because it looks at the template the object in the system be it a blog or a page it looks for those styles from what it considers the main CSS for that template. Split them up in any way you run into problems.
    It is always good to keep one CSS file for performance anyway, but doing so will also avoid this and other similar issues you have not encountered yet.

  • Create an XML Document with HTML Entities?

    I'm writing a program that uses SAX to parse an XML document and generate XHTML from it, but I would like the XML document to allow HTML entities, of course, in the XML markup.
    Each time my parser gets to any entity at all it dies reporting an undefined entity.
    Is there any way I can refer to the HTML entity DTD or perhaps, in my own DTD, copy and paste the HTML entities? How would I make a reference like this in my XML document?
    Thanks

    Seems to me you would just put something like
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
            "http://www.w3.org/TR/REC-html40/loose.dtd">
    before your root element. Or whichever DTD you actually want. I found out about them here:
    http://www.utoronto.ca/webdocs/HTMLdocs/HTML_Spec/html.html
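
The other option raised in the question, copying the needed entity declarations into your own DTD, can be sketched with an internal DTD subset. This is a minimal illustration, assuming only a couple of entities are needed and that each is mapped to its numeric character reference; the sample document and class name are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class InternalEntityDemo {
    // Sample input: &eacute; and &nbsp; are declared in the internal subset,
    // so the SAX parser no longer reports them as undefined.
    static final String SAMPLE =
            "<!DOCTYPE page [\n"
            + "  <!ENTITY eacute \"&#233;\">\n"
            + "  <!ENTITY nbsp \"&#160;\">\n"
            + "]>\n"
            + "<page>caf&eacute;&nbsp;au lait</page>";

    // Collects all character data reported by the SAX parser.
    static String textOf(String xml) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String text = textOf(SAMPLE);
        // Print the code points so the result is visible regardless of console encoding.
        text.chars().forEach(cp -> System.out.print(cp + " "));
        System.out.println();
    }
}
```

Declaring the entities inline means the parser does not have to fetch anything over the network, at the cost of repeating the declarations in every document that uses them.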

  • Cannot render special HTML character with Java

    I'm pretty sure this is a general Swing issue, please don't ignore this because I reference JavaHelp. When using JavaHelp and French as the displayed language, I'm having problems displaying the &#156; character (HTML entity &# 156;). Below I've included a sample HTML file, based on what my actual files looks like, which should demonstrate the problem. If I use a browser, or even Notepad, to open this file, the character displays just fine. However, in my JavaHelp popup (which uses Swings HTML renderer under the covers if I am not mistaken), all I get is a box. I've tried using the actual character and the HTML entity, but to no avail. Comments/suggestions/pointers would be greatly welcome!
    <html>
        <head>
            <meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1">
            <style>
                li {padding-bottom: 6px; padding-top: 6px;}
                body {font-family: Helvetica, Arial, sans-serif; } td {font-size: smaller;}
            </style>
        </head>
        <body alink="#ff0000" bgcolor="#ffffff" link="#0000ff" text="#000000" vlink="#800080">
            &#156;
        </body>
    </html>
    Thanks,
    Jamie

    "all I get is a box"
    So JavaHelp is rendering it as a single character and not as the six characters &, #, 1, 5, 6, and semicolon. So far, so good. But it appears that the font JavaHelp is using is unable to render that character correctly, so you get a box.
    Can you control the fonts that JavaHelp uses? If so, try using a font that can render the &#156; character.
    PC&#178;
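
A different possible explanation, not raised in the thread and so only an assumption: code point 156 is a C1 control character in Unicode. Browsers typically map numeric references in the 128 to 159 range through Windows-1252, where byte 0x9C is the oe ligature (U+0153), whereas a stricter renderer may take 156 literally and show a box. If that is what happens here, writing &#339; or &oelig; instead might sidestep the problem. The sketch below only demonstrates the mapping; the class and method names are illustrative.

```java
import java.nio.charset.Charset;

class Cp1252Demo {
    // Decodes a single byte value as Windows-1252, which is how lenient
    // browsers interpret numeric references such as &#156;.
    static String decodeAsCp1252(int value) {
        return new String(new byte[] {(byte) value}, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // Taken literally, 156 is a control character with no glyph.
        System.out.println(Character.isISOControl((char) 156)); // true
        // Through Windows-1252 it becomes U+0153, decimal 339.
        System.out.println((int) decodeAsCp1252(156).charAt(0)); // 339
    }
}
```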

  • Evdre error using "dynamic hierarchy expansion" option

    Dear all,
    I'm trying to expand the evdre using the option "dynamic hierarchy expansion", but I get an error in function EVDRE(). This is the error:
    #ERR: Consolidation Mode ON - Only keyword ""Blank"" is applicable for dimension ENTITY when with dimension GROUPS.
    I have not put any value in the MemberSet of the ENTITY expansion, in accordance with the error message. Here is the example:
    PARAMETER      EXPANSION 1         EXPANSION 2
    ExpandIn       COL                 COL
    Dimension      GROUPS              ENTITY
    MemberSet      DEP,PARENTAFTER
    BeforeRange
    AfterRange
    Suppress       Y
    Insert
    The following link helps to understand how it works: http://help.sap.com/saphelp_bpc75/helpdata/en/5A/69200C88AA40C9B18844A25259F147/frameset.htm
    Thanks,
    Ru

    Hi Teko,
    Thanks for the information.
    In that case, I think Active X has not been enabled in IE.
    Kindly follow the below step.
    In IE, Tools -> Internet Options -> Security -> Security level -> change it to medium, or go to custom level and enable ActiveX
    Hope this helps.

Maybe you are looking for

  • DBSequence entity attribute type not available

    Hi OTN, I want to set an entity attribute type to DBSequence. But there's no such type in a drop-down list, only Java types. I tried to set the type in source manually but at runtime framework doesn't assign a negative integer to the attribute at Cre

  • About the finally block of the try catch.

    I know that finally block contains the code that will be executed in any condition of the try catch. However, I think it is unneccessary, since the stack after the try catch stack will be executed any way. Any one can help? for example try{ System.in

  • Pagination question

    What pagination template in APEX 3.0 should I use in order to get the following format (this is a format used by this message board)? Messages: 161,954 - Threads: 34,503 - Pages: 2,301 - [ Previous | 1  2  3  4  5  6  | Next  ] If there is no such a

  • Sending payroll from external system to SAP

    Hello SAP experts, please could you help me to find optimal solution for the following scenario? We use external system which generates payroll and we need to send the data to SAP. SAP then should post it to a bank and also to GL accounts. We can not

  • How to create customized DataTable???

    Hi all, I will have to create Customized DataTable. Requirements for the DataTable are mentioned below 1.*DataTable Should Accept An Array from which Column Name Values are iterated.* 2.*DataTable Should Accept List from which rows are iterated as us