Decode HTML escaped character references

Sure, I can write
string.replace("&nbsp;", " ")
but obviously I can't do that for every Unicode character reference in the world, and surely this problem must be a routine library call... but which one? I don't seem to be able to find anything by googling.
Thanks in advance.

@hugoT - thanks for the link to the list...
... but I really don't want to do this myself if there's a public library that will do it for me. Something like: I send over a string full of escaped character references and get a nice, human-readable string back.
This kind of bread-and-butter code must be out there somewhere (I hope).
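
For what the question describes, Apache Commons Text offers `StringEscapeUtils.unescapeHtml4(String)`, assuming a third-party dependency is acceptable. The sketch below is not that library; it is a minimal standard-library illustration of the same idea. It decodes decimal and hexadecimal numeric references plus a deliberately tiny table of named entities. The class name `HtmlUnescapeSketch`, the method name `unescape`, and the contents of the table are assumptions made for illustration only.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlUnescapeSketch {
    // Illustrative subset only; HTML defines far more named entities than this.
    private static final Map<String, String> NAMED = Map.of(
            "nbsp", "\u00A0", "amp", "&", "lt", "<", "gt", ">",
            "quot", "\"", "eacute", "\u00E9", "pound", "\u00A3");

    // Matches decimal (&#163;), hexadecimal (&#xA3;) and named (&pound;) references.
    private static final Pattern REF =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    public static String unescape(String input) {
        Matcher m = REF.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement = m.group(); // unknown references stay as they are
            try {
                if (body.startsWith("#x") || body.startsWith("#X")) {
                    int cp = Integer.parseInt(body.substring(2), 16);
                    replacement = new String(Character.toChars(cp));
                } else if (body.startsWith("#")) {
                    int cp = Integer.parseInt(body.substring(1));
                    replacement = new String(Character.toChars(cp));
                } else if (NAMED.containsKey(body)) {
                    replacement = NAMED.get(body);
                }
            } catch (IllegalArgumentException e) {
                // Overflowing number or invalid code point: keep the original text.
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

A call such as `HtmlUnescapeSketch.unescape("caf&eacute; &#163;5")` would return the readable text; references that are not in the table are passed through unchanged rather than guessed at.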

Similar Messages

  • HTML Entity Escape Character Conversion

    The requirement is to convert UTF-8 encoded special language characters to HTML entity escape characters. For example, in the source I have a Description field with the value "Caractéristiques", which is 'Characteristics' in French. This needs to be converted to "Caract&eacute;ristiques" when sent to the receiver, i.e. special language symbols like é become &eacute; (in HTML entity format).
    Below is the link to a list of HTML entity characters:
    http://www.theukwebdesigncompany.com/articles/article.php?article=11
    Could anybody please suggest how this can be achieved in mapping? Any UDF or encoding techniques?
    Many thanks.

    Hi Veera,
    this is Ajay.
    Code for your problem:
    String ToHTMLEntity(String s) {
        StringBuffer sb = new StringBuffer(s.length());
        // true if the previous character was a plain blank
        boolean lastWasBlankChar = false;
        int len = s.length();
        char c;
        for (int i = 0; i < len; i++) {
            c = s.charAt(i);
            if (c == ' ') {
                // Alternate ' ' and &nbsp; so that runs of blanks are not collapsed by the browser
                if (lastWasBlankChar) {
                    lastWasBlankChar = false;
                    sb.append("&nbsp;");
                } else {
                    lastWasBlankChar = true;
                    sb.append(' ');
                }
            } else {
                lastWasBlankChar = false;
                // HTML special chars
                if (c == '"')
                    sb.append("&quot;");
                else if (c == '&')
                    sb.append("&amp;");
                else if (c == '<')
                    sb.append("&lt;");
                else if (c == '>')
                    sb.append("&gt;");
                else if (c == '\n')
                    // Handle newline
                    sb.append("&lt;br/&gt;");
                else {
                    int ci = 0xffff & c;
                    if (ci < 160)
                        // Below 160 nothing special is needed
                        sb.append(c);
                    else {
                        // Everything else becomes a decimal character reference
                        sb.append("&#");
                        sb.append(ci);
                        sb.append(';');
                    }
                }
            }
        }
        return sb.toString();
    }
    Reward points if it helps you.
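
The posted method walks the string one `char` at a time, so a character outside the Basic Multilingual Plane would come out as two separate surrogate references. A variant that iterates over code points avoids that. This is only a sketch: the class and method names are hypothetical, and it differs from the method above on purpose (it escapes everything at or above 128 and leaves blanks and newlines alone).

```java
public class HtmlEntityEncoderSketch {
    // Escapes the four markup characters by name and every non-ASCII
    // code point as a decimal character reference.
    public static String toNumericEntities(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            switch (cp) {
                case '"': sb.append("&quot;"); break;
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:
                    if (cp < 128) {
                        sb.appendCodePoint(cp);
                    } else {
                        sb.append("&#").append(cp).append(';');
                    }
            }
        });
        return sb.toString();
    }
}
```

Note that this produces numeric references (for example `&#233;` for é), not named ones such as `&eacute;`; a named form would need a lookup table on top of this.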

  • How to pass character references in JDOM?

    Friends,
    I am trying to pass a binary file over XML(over HTTP POST), as an element, like
    <file>
    <data>here goes the binary file</data>
    </file>
    This is only part of the XML, and I am trying to use JDOM to build the XML tree.
    I am using Element.addContent(String binarycontent) method to set the text for the <data> element. The file has invalid XML characters(like characters less than 0x20), so I am escaping them using character references like &#10;
    etc
    The real problem is JDOM interprets the & character and this gets passed as &amp;#10;
    which is useless on the server side.
    Is there any way to signal JDOM not to interpret the text I use in element.addContent(text) and escape them? Or is there any way to insert a character reference?
    Thanks,
    Ram

    XML files can only contain text. This is a law of XML. Instead of trying to hack around it like you did, you should encode your binary data into text before you put it into the XML. Then, of course, to get the binary file back out you would need to decode it. One fairly common and well-documented way to encode binary data into text is the Base64 encoding, which is described here:
    http://www.faqs.org/rfcs/rfc2045.html
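
The Base64 suggestion can be sketched with the `java.util.Base64` class that ships with Java 8 and later. The class name `Base64PayloadSketch` and its two helper methods are hypothetical; the encoded string is ordinary text, so it could be handed to `addContent` without any character references.

```java
import java.util.Base64;

public class Base64PayloadSketch {
    // Turns arbitrary bytes into text that is safe to place inside an XML element.
    public static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Reverses encode(); the receiving side would call this on the element text.
    public static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }
}
```

The cost of this approach is size: Base64 output is roughly a third larger than the input bytes.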

  • Can Linux recognize the escape character?

    Hi,
    It's possible that this problem doesn't belong here. But please give it a try.
    I am developing a project using JSP that includes image uploading. After an image is uploaded, I use a JavaScript function popUp(url, ...) to open a new window and display the image. The strange thing is that sometimes the link works, opening a new window and displaying the image, while at other times it doesn't work, or works only once and then fails. I haven't found the reason yet. It works in IE on Windows 2000, but not in Netscape and not on Linux. Can somebody take a look at the following link and tell me how to change it so that it works on Linux and in Netscape? What is the difference between Linux and Windows in how a String (that is, the URL of a link) is specified? Linux seems to interpret \" as ", so it cannot recognize the full URL.
    The link is :
    imageLink[i] = "<a href='showForm' onclick=\"popUp('" + request.getContextPath() + "/displayForm.jsp?filename=" + sdb.getImageFileName(i)+"&fileDesc=" +sdb.getImageDesc(i) + "', 'showForm', '600', '450', 'yes'); return false;\">"+sdb.getImageDesc(i)+"</a>";
    Thanks in advance!
    jmling

    Linux will recognize the escape character. It looks like you might have other difficulties with your imageLink tag. For example, I think you need to use <%= %> tags when you use Java inside your HTML or JavaScript:
    onclick=\"popUp(" + <%= request.getContextPath() %> + "/displayform.jsp?...
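
Separately from the escaping question, the posted link concatenates the file name and description into the query string without encoding them, so a space, quote, or & in either value could break the URL in some browsers. Whether that is the actual cause here is not confirmed in the thread. A sketch of encoding the values with `java.net.URLEncoder` follows; the class and method names are hypothetical, and the `Charset` overload used assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PopupLinkSketch {
    // Builds the displayForm.jsp URL with both parameter values percent-encoded.
    public static String buildUrl(String contextPath, String fileName, String fileDesc) {
        return contextPath + "/displayForm.jsp?filename="
                + URLEncoder.encode(fileName, StandardCharsets.UTF_8)
                + "&fileDesc="
                + URLEncoder.encode(fileDesc, StandardCharsets.UTF_8);
    }
}
```

The encoded apostrophe also matters in this case because the URL sits inside a single-quoted JavaScript string in the onclick attribute.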

  • Illegal escape character

    Hi i am writing a servlet which has html in it too...so the commands for html pages i just use out.println("").
    But i wanted to add a new picture on the page and had this command
    out.println("<IMG SRC=C:\Documents and Settings\bsharma\My Documents\My Pictures\index1.gif>");
    but i get a compiling error saying 'illegal escape character'
    I know it is because of the \ ..but is there a way around it?
    -bhaarat

    Try:
    out.println("<IMG SRC=C:\\Documents and Settings\\bsharma\\My Documents\\My Pictures\\index1.gif>");
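
To see why the doubled backslash works: inside a Java string literal, `\\` is the escape sequence for a single backslash, so the string that reaches the browser contains the original path. A small sketch (the class and constant names are made up for illustration):

```java
public class BackslashSketch {
    // Ten backslashes in the source, five in the resulting string.
    public static final String WINDOWS_PATH =
            "C:\\Documents and Settings\\bsharma\\My Documents\\My Pictures\\index1.gif";

    // Quoting the attribute value keeps the spaces in the path from splitting it.
    public static final String IMG_TAG = "<IMG SRC=\"" + WINDOWS_PATH + "\">";
}
```

Keep in mind that a `C:\...` path only resolves on the machine where the browser runs; a servlet would normally point the tag at a URL it serves itself.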

  • Mapping error: Character reference "&# 00" is an invalid XML character

    Hi All,
      I am performing an RFC (R/3) -> PI (7.1) -> SOAP (third-party software) synchronous scenario.
    The messages are reaching the PI server, but a mapping error is occurring because dummy characters "& #00" are being sent to the XI system.
    Is this due to R/3 sending the invalid characters, or are they being generated in the PI system? Could you suggest any notes or patches to resolve the issue?
    "MAPPING">EXCEPTION_DURING_EXECUTE
    com.sap.aii.utilxi.misc.api.BaseRuntimeException:
    Character reference "& # 00" is an invalid XML character
    Many thanks!
    guru

    Hi,
    If you go through this link last page and last para, which says..
    "The only solution is to use a Java mapping before the actual mapping to perform the escaping."
    https://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/uuid/502991a2-45d9-2910-d99f-8aba5d79fb42
    Regards,
    Sarvesh
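
The core of such a pre-processing step is a string clean-up, which can be sketched in plain Java. This is not SAP's mapping API, and the thread does not say whether the payload carries a raw NUL character or the literal reference text, so the sketch removes both. The class and method names are hypothetical.

```java
public class XmlSanitizerSketch {
    // True for code points allowed by the XML 1.0 "Char" production.
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static String stripInvalid(String payload) {
        // Drop numeric references to NUL such as &#0;, &#00; or &#x0;
        String noNullRefs = payload.replaceAll("&#(?:0+|[xX]0+);", "");
        // Drop raw characters that XML 1.0 does not allow
        StringBuilder sb = new StringBuilder(noNullRefs.length());
        noNullRefs.codePoints()
                .filter(XmlSanitizerSketch::isValidXmlChar)
                .forEach(sb::appendCodePoint);
        return sb.toString();
    }
}
```

In a real interface this logic would sit inside whatever Java mapping class the PI version requires and run before the main mapping.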

  • Regular Expression Escaped Digit "\d" Illegal Escape Character

    Hello,
    I'm trying to write a regular expression to determine if a String matches a date format that is defined as YYYYMMDD. For example, March 11, 2009 would be "20090311"
    For the time being I don't care if an invalid month or day is entered. I've attempted both of the following
    if (date.matches("(19|20)\d{4}")) {
      // warn the user
    }
    and
    if (java.util.regex.Pattern.matches("(19|20)\d{4}", date)) {
      // warn the user
    }
    Both yield Illegal Escape Character compilation errors for the "\d" part of the regular expression.
    http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html#sum
    Says that "\d" is the predefined digit character class. So at this point, I don't know what I'm doing wrong. I realize I could just define the character class myself, and use a pattern like "(19|20)[0-9]{4}", but I would like to know why "\d" isn't being recognized by the compiler.
    Thanks,
    Paul
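
The compiler error comes from the Java string literal, not from the regex engine: `\d` is not a valid escape in a string literal, so the backslash itself has to be escaped as `\\d`. A minimal sketch, with a hypothetical class name; it also uses `{6}` rather than `{4}`, since a century prefix plus four digits would only cover six of the eight characters in YYYYMMDD.

```java
import java.util.regex.Pattern;

public class DatePatternSketch {
    // "\\d" in Java source reaches the regex engine as the two characters \d.
    private static final Pattern YYYYMMDD = Pattern.compile("(19|20)\\d{6}");

    // Checks the shape only; month and day ranges are not validated.
    public static boolean looksLikeDate(String s) {
        return YYYYMMDD.matcher(s).matches();
    }
}
```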

    paulwooten wrote:
    Can someone give me an explanation of heuristics, as they might apply to SimpleDateFormat? Does this mean that if the format was similar the parser might figure it out? Say, if instead of "yyyyMMdd", it was "yyyyddMM", or "yyMMdd"?
    No. Since all of these are valid formats, there's no way for the parser to distinguish them.
    paulwooten wrote:
    Or does this have to do with rejecting February 29, and other dates like that.
    That's the one. When setLenient(false) is called, the 29th of February is only accepted in leap years.
    It will also reject the 57th of January when lenient is set to false (try parsing that with lenient=true, you'll be surprised).
    paulwooten wrote:
    I've read some of the wikipedia article about heuristics, but I'm confused as to how it would apply to this example.
    Don't concentrate too much on the term heuristics. Just remember: lenient=true means that not-really-correct dates will be accepted, lenient=false means stricter checks.
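
The difference between the two lenient settings can be tried out with a short sketch. The class and method names are hypothetical; `SimpleDateFormat` and `setLenient` are the standard `java.text` API discussed in the reply.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class LenientParsingSketch {
    // Strict mode: impossible dates make parse() throw ParseException.
    public static boolean isStrictlyValid(String yyyymmdd) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
        fmt.setLenient(false);
        try {
            fmt.parse(yyyymmdd);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    // Lenient mode (the default): out-of-range fields roll over into the next month.
    // Returns the date that was actually understood, or null if parsing failed.
    public static String parseLeniently(String yyyymmdd) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
        try {
            return fmt.format(fmt.parse(yyyymmdd));
        } catch (ParseException e) {
            return null;
        }
    }
}
```

With lenient parsing, "20090229" is read as 1 March 2009 and "20090157" as 26 February 2009, which is the surprise the reply alludes to.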

  • Disable html escaping mode - Basic

    4.2.1
    Hi,
    Our application runs within our network and hence no security worries as such. It's also not a critical application.
    In 4.2.1, there is the html escaping mode with only two options basic and extended. Even with Basic it escapes & < and >. IS there a way to disable that? We have parameter passing where some items have & in their names. and they seem to be getting skipped.
    I pass parameters when user clicks on a link in a report(using standard link features) which creates the URL. But looks like the names which have & have those removed.
    Thanks,
    Ryan

    ryansun wrote:
    4.2.1
    Our application runs within our network and hence no security worries as such. It's also not a critical application.
    In 4.2.1, there is the html escaping mode with only two options basic and extended. Even with Basic it escapes & < and >. IS there a way to disable that?
    No. It's required by the HTML specification.
    ryansun wrote:
    We have parameter passing where some items have & in their names. and they seem to be getting skipped.
    I pass parameters when user clicks on a link in a report(using standard link features) which creates the URL. But looks like the names which have & have those removed.
    As '&' is a URL-reserved character it must be encoded in order to be passed in a URL parameter, for example using the apex_util.url_encode API.
    As has been recommended before, the simple way to avoid problems in this area is not to pass string data values as URL parameters. Pass simple numeric or alphanumeric key values, and use these to retrieve additional information using computations and processes on the target page.

  • Query of Queries (QofQ) Escaped Character Problem

    Hello All,
    I'm trying to run a Query of Queries (QofQ) and I'm doing a
    LIKE comparison that looks for bracket characters ([ ]) within a
    string, however ColdFusion is ignoring the brackets. How can I
    escape the bracket character? So far I have only been able to
    escape the percent sign based on the ColdFusion Live Docs. The
    error message I get when I run the query below is:
    Invalid Escape Sequence. Valid sequence pairs for this escape
    character are: "\%", or "\_".
    Here is the query:
    <cfquery dbtype="query" name="getLogs">
    SELECT *
    FROM GetLogs
    WHERE Description LIKE '%\[User:#UserID#\]%' ESCAPE '\'
    </cfquery>
    Thanks for your help!

    You are correct. If you leave the brackets in the LIKE
    statement, it will return results as if the brackets weren't there
    at all.
    Perhaps I need to figure out the ASCII character value of the
    bracket and include it that way i.e. #Char(?)# where the question
    mark would be the numerical value of that character.
    My temporary solution has been to leave off the starting
    bracket:
    <cfquery dbtype="query" name="getLogs">
    SELECT *
    FROM GetLogs
    WHERE Description LIKE '%user:#UserID#]%'
    </cfquery>
    This has (so far) returned the results i'm looking for
    although its not as 100% accurate without that beginning [ in the
    LIKE statement.

  • Urgent Help - in using Escape character

    hai,
    i have problem in using escape character..
    can anyone help me out in the same...
    sb.append(<jsp:getProperty name="resume_main" property="name"/>);
    //error i am getting is -- Missing term, ')' expected.
    pl help me out in using the escape character in the above statement.
    thanx in advance
    regards
    koel

    try
    sb.append("<jsp:getProperty name='resume_main' property='name'/>");
    or
    sb.append("<jsp:getProperty name=\"resume_main\" property=\"name\"/>");
    both will work

  • PDF/A Conversion Err :CIDset in subset font is incomplete & Character references .notdef glyphs

    I was trying to convert some PDF documents to PDF/A-3b. Using Acrobat XI Pro on a Windows Vista PC, the conversion proceeded fine, but I noticed the size of the converted file increased from 1.91 MB to 16.8 MB (see the space usage audits below). I expected some increase in size due to font embedding and the like, but did not expect such a big jump. On auditing the space usage, I noticed that the converted file had no embedded fonts and that the vast majority of the space was being consumed by images. It turns out the "Convert to PDF/A-3b" profile in the Preflight settings was set to convert all pages to images if the regular conversion failed. I edited the profile to fail on an error instead. This time, when I passed the document through a "Convert to PDF/A-3b" preflight, it failed with the following errors.
    CIDset in subset font is incomplete (font contains glyphs that are not listed). It appears to be referring to the Arial and TimesNewRoman fonts.
    Character references .notdef glyph
    How do I fix these errors? For the CIDset error, I noticed some folks in the InDesign forums mentioning you can locate the missing glyphs and then replace the fonts with other fonts in which these glyphs are present. I am wondering if this is the issue here and something similar can be done in Acrobat. As for the .notdef glyphs, I couldn't find anything. Any help would be much appreciated.
    Thanks,
    Ron
    Note:
    If I edit the conversion preflight profile to allow replacing the pages with images on regular conversion errors, the conversion goes through fine, but as shown below, there is a huge jump in size, which I would like to avoid. I have about 2GB worth of documents, and the conversion ends up using over 8 GB of space.
    [Space usage audit images not preserved: Regular PDF; Archival Ready PDF (PDF/A-3b)]

    What software are you using to do OCR?
    Is there a way you can adjust which font is being used by the OCR software, or use different software for OCR? This is not a standard OpenType font or one that can be mapped to Unicode.
    I'm not sure exactly how OCR software places fonts in  PDFs. Does it show up in the PDF Properties under Fonts?   If so, what is the name of the Font.
    When you examine the Tagged text in the Tags Panel, if you open up a tag and look at the content does it make sense or is it nonsense (gobbledygook)?
    While OCR text is not visible to the end user directly, it can be selected using the text tool and it is recognized by Acrobat for tagging purposes.  The Assistive Technology, e.g., screen reader. will be reading from this text. So if it is not understandable you do not have an accessible file.
    You would probably obtain better results using the Acrobat OCR feature, preferably on a Windows machine.  It's been a while since I've exchanged files between Mac and Windows, and I wouldn't trust that the encodings would be the same without testing it.

  • Bypass Adapter URI Endpoint with Escape Character for Web Service

    Dear All,
    I would like to apply by pass adapter URI Endpoint for XI webservice, the default format is
    http://<host>:<port>/sap/xi/engine?type=entry&version=3.0&Sender.Service=<BusinessService>&Interface=<namespace>^<Outbound Interface name>
    If I use the format with the caret (^) character there is no problem with the service, but the consumer doesn't support the caret (^) character. I replaced the caret (^) with the URL escape character (%5E):
    http://<host>:<port>/sap/xi/engine?type=entry&version=3.0&Sender.Service=<BusinessService>&Interface=<namespace>%5E<Outbound Interface name>
    Then error occurred
    <SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
       <SOAP:Body>
          <SOAP:Fault>
             <faultcode>SOAP:Server</faultcode>
             <faultstring>System Error</faultstring>
             <detail>
                <s:SystemError xmlns:s="http://sap.com/xi/WebService/xi2.0">
                   <context/>
                   <code>RCVR_DETERMINATION.MESSAGE_INCOMPLETE</code>
                   <text>Message is incomplete. No Sender found</text>
                </s:SystemError>
             </detail>
          </SOAP:Fault>
       </SOAP:Body>
    </SOAP:Envelope>
    How to resolve this error...
    Thank you.
    Regards,
    Weng

    Hi ,
    as per my knowledge:
    You can create a WSDL with the help of a wizard. In the Integration Directory, choose Tools -> Define Web Service to start the wizard.
    With the Propose URL button, the generated URL points to the Integration Engine by default, so it is already good performance-wise.
    If you want to point your URL to the Adapter Engine, use the URL given below; it will route your incoming SOAP message to the SOAP adapter sender channel:
    http://<host>:<j2ee-port>/XISOAPAdapter/MessageServlet?channel=:<service>:<channel>.
    Regards
    Prabhat Sharma.

  • Converting HTML Escaping to Unicode Escaping characters in Java

    Hi,
    I am getting HTML escapes for special characters such as pound, space, and dollar from the database, in the form &apos; &pound; &reg; etc., which I want to convert to their Unicode escape equivalents, such as U00A3 and U0026. Java only converts &amp; to & (U0026); the rest of the characters are not converted. If there is an API or another way to do this, please reply.
    Note: I can't change the database, as there are already thousands of records, and my front end needs Java to do all these conversions; I can't change that either.

    I have posted a method that does what you want. It was a long time ago since I wrote it and you should probably use a StringBuilder instead of a StringBuffer if you are going to use it in Java 5 or later. You can find the method in this thread:
    http://forum.java.sun.com/thread.jspa?threadID=652630
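
The linked method is not reproduced on this page. As a rough illustration of the second half of the task only, the sketch below takes text whose entities have already been decoded (by whichever library or helper is used) and rewrites every character outside printable ASCII as a Java-style Unicode escape. The class name, the method name, and the exact output format (backslash, u, four hex digits) are assumptions; the question's "U00A3" notation would need a different format string.

```java
public class UnicodeEscapeSketch {
    // Replaces each char outside printable ASCII with a backslash, the letter u
    // and four uppercase hex digits; supplementary characters become two escapes.
    public static String toUnicodeEscapes(String decoded) {
        StringBuilder sb = new StringBuilder(decoded.length());
        for (char c : decoded.toCharArray()) {
            if (c >= 0x20 && c < 0x7F) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }
}
```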

  • Displaying unicode or HTML escaped characters from HTTPService in Flex components.

    Here is a solution I developed on the Flex Cookbook for
    displaying data in Flex components when the data comes back from
    HTTPService as Unicode or HTML escaped data:
    Displaying unicode or HTML escaped characters from HTTPService in
    Flex components.

    Hi again Greg,
    I have just been adapting your idea for encountering
    occasional escaped characters within a body of "normal" text, eg
    something like
    hell&ocirc; sun&scaron;ine
    Now, the handy String.fromCharCode(charCode) call works a
    dream if instead of the above I have
    hell&#244; sun&#353;ine
    Do you know if there is an equivalent call that takes the
    named entities rather than the numeric ones? Clearly I can just do
    some text substitution to get the mapping, but this means rather
    more by-hand work than I had hoped. However, this is definitely a
    step in a useful direction for me.
    Thanks,
    Richard
    PS hoping that the web page won't simply outguess me and
    replace all the above! Basically, the first line uses named
    entities and the second the equivalent numbers...

  • How to use escape character in update statement.

    Hi All,
    I'm trying to update table using following sql update statement, but everytime it's asking me for the input due to the '&' value in below sql.
    UPDATE xyz_xyz
       SET NAME = 'ABC & PQR'
    WHERE ID = (SELECT ID
                   FROM abc_abc
                  WHERE NAME = 'C & PQR');
    Please let me know how to use escape character syntax or let me know if there is any alternative solution.
    Thanks,
    Vishwas

    Hi,
    By default, & marks a substitution variable name.
    If you're not using substitution variables in that statement (or, if this is in PL/SQL, in that entire package or procedure) then the easiest thing to do is just disable substitution variables; then & will be a normal character:
    SET  DEFINE  OFF
    UPDATE xyz_xyz
       SET NAME = 'ABC & PQR'
    WHERE ID = (SELECT ID
                   FROM abc_abc
                  WHERE NAME = 'C & PQR');
    SET  DEFINE  ON
    If you can't do that, then & is always taken literally if it comes right before a single-quote, so you could say:
    UPDATE xyz_xyz
       SET NAME = 'ABC &' || ' PQR'
    WHERE ID = (SELECT ID
                   FROM abc_abc
                  WHERE NAME = 'C &' || ' PQR');
    There is a SQL*Plus "SET ESCAPE" command, too, but if you use it, you have to worry about whether the escape character is to be taken literally or not.
    SET   ESCAPE  \
    Yet another alternative is to make some other character, such as ~, mark the substitution variables:
    SET  DEFINE  ~
    Read all about them in the SQL*Plus manual.
    http://download.oracle.com/docs/cd/B28359_01/server.111/b31189/ch2.htm#sthref103
