Decode HTML escaped character references

Sure, I can write
string.replace(" " , " ")
but obviously I can't do that for every Unicode character reference in the world, and surely this must be a routine library call ... but which one? I can't find anything by googling.
Thanks in advance.

@hugoT - thanks for the link to the list ...
... but I really don't want to do this myself if there's a public library that will do it for me: something where I send over a string full of escaped character references and get a nice, human-readable string back.
This kind of bread-and-butter code must be out there somewhere (I hope).
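
For readers who land here with the same question: libraries such as Apache Commons Text (StringEscapeUtils.unescapeHtml4) and jsoup (Parser.unescapeEntities) do offer this as a single call. To show what such a call does under the hood, here is a minimal standard-library sketch. The class name HtmlReferenceDecoder and its tiny entity table are made up for illustration; a real library knows the full list of named entities.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library: decodes numeric references
// and a handful of named entities; unknown references are left untouched.
final class HtmlReferenceDecoder {
    // Deliberately tiny table; real libraries ship the complete entity list.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"",
            "nbsp", "\u00A0", "eacute", "\u00E9", "pound", "\u00A3");

    // Group 1: decimal digits, group 2: hex digits, group 3: entity name.
    private static final Pattern REFERENCE =
            Pattern.compile("&(?:#(\\d+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));");

    static String decode(String input) {
        Matcher m = REFERENCE.matcher(input);
        StringBuilder out = new StringBuilder(input.length());
        while (m.find()) {
            String replacement;
            if (m.group(1) != null) {
                replacement = fromCodePoint(m.group(1), 10, m.group());
            } else if (m.group(2) != null) {
                replacement = fromCodePoint(m.group(2), 16, m.group());
            } else {
                replacement = NAMED.getOrDefault(m.group(3), m.group());
            }
            // quoteReplacement keeps '$' and '\' in the replacement literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String fromCodePoint(String digits, int radix, String original) {
        try {
            int cp = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : original;
        } catch (NumberFormatException e) {
            return original; // too many digits to be a code point
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("Caract&eacute;ristiques &amp; more &#8364;"));
    }
}
```

The input is scanned once, so text that was escaped twice (for example &amp;lt;) is decoded by one level only.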

Similar Messages

HTML Entity Escape Character Conversion

The requirement is to convert UTF-8 encoded special-language characters to HTML entity escape characters. For example, in the source I have a Description field with the value "Caractéristiques" ('Characteristics' in French). This needs to be converted to "Caract&eacute;ristiques" when sent to the receiver, i.e. special-language symbols like é become &eacute; (in HTML entity format).
Below is the link to a list of HTML entity characters:
http://www.theukwebdesigncompany.com/articles/article.php?article=11
Could anybody please suggest how this can be achieved in mapping, e.g. with a UDF or an encoding technique?
Many thanks.

Hi Veera,
This is Ajay.
Here is code for your problem:
String ToHTMLEntity(String s) {
          StringBuffer sb = new StringBuffer(s.length());
          boolean lastWasBlankChar = false;
          int len = s.length();
          char c;
          for (int i = 0; i < len; i++) {
               c = s.charAt(i);
               if (c == ' ') {
                    // Every second space in a run becomes &nbsp; so the browser does not collapse the run
                    if (lastWasBlankChar) {
                         lastWasBlankChar = false;
                         sb.append("&nbsp;");
                    } else {
                         lastWasBlankChar = true;
                         sb.append(' ');
                    }
               } else {
                    lastWasBlankChar = false;
                    // HTML Special Chars
                    if (c == '"')
                         sb.append("&quot;");
                    else if (c == '&')
                         sb.append("&amp;");
                    else if (c == '<')
                         sb.append("&lt;");
                    else if (c == '>')
                         sb.append("&gt;");
                    else if (c == '\n')
                         // Handle Newline
                         sb.append("<br/>");
                    else {
                         int ci = 0xffff & c;
                         if (ci < 160)
                              // Characters below 160 (0xA0) pass through unchanged
                              sb.append(c);
                         else {
                              // Everything else becomes a decimal numeric character reference
                              sb.append("&#");
                              sb.append(Integer.toString(ci));
                              sb.append(';');
                         }
                    }
               }
          }
          return sb.toString();
}
Reward points if it helps you.

How to pass character references in JDOM?

Friends,
I am trying to pass a binary file over XML(over HTTP POST), as an element, like
<file>
<data>here goes the binary file</data>
</file>
This is only part of the XML, and I am trying to use JDOM to build the XML tree.
I am using the Element.addContent(String binarycontent) method to set the text for the <data> element. The file has invalid XML characters (such as characters less than 0x20), so I am escaping them using character references like &#10; etc.
The real problem is that JDOM interprets the & character itself, so this gets passed as &amp;#10;, which is useless on the server side.
Is there any way to signal JDOM not to interpret the text I use in element.addContent(text) and escape them? Or is there any way to insert a character reference?
Thanks,
Ram

XML files can only contain text. This is a law of XML. Instead of trying to hack around it like you did, you should encode your binary data into text before you put it into the XML. Then, of course, to get the binary file back out you would need to decode it. One fairly common and well-documented way to encode binary data into text is the Base64 encoding, which is described here:
http://www.faqs.org/rfcs/rfc2045.html
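
As a rough illustration of the Base64 suggestion, the sketch below uses java.util.Base64, which was added in Java 8 and so is newer than this thread; older code would need a third-party codec. The class and method names are made up for the example. The encoded string can then be passed to Element.addContent(String) like any other text.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper: turns arbitrary bytes into XML-safe text and back.
final class BinaryInXml {
    // Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=',
    // all of which are legal XML character data.
    static String toXmlText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // The receiving side reverses the encoding to recover the original bytes.
    static byte[] fromXmlText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] original = {0x00, 0x0A, 0x1F, (byte) 0xFF}; // includes bytes below 0x20
        String encoded = toXmlText(original);
        System.out.println(encoded);
        System.out.println(Arrays.equals(original, fromXmlText(encoded)));
    }
}
```

The price is size: Base64 text is about a third larger than the binary input.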

Can Linux recognize the escape character?

Hi,
It's possible that this problem doesn't belong here. But please give it a try.
I am developing a project using JSP that includes image uploading. After an image is uploaded, I use a JavaScript function popUp(url, ...) to open a new window and display the image. The strange thing is that the link sometimes works (it opens a new window and displays the image) and sometimes does not, or it works only once and then fails. I haven't found the reason yet. It works in IE on Windows 2000, but not in Netscape and not on Linux. Can somebody take a look at the following link and tell me how to change it so that it works on Linux and in Netscape? What is the difference between Linux and Windows in how a String (that is, the URL of a link) is specified? Linux seems to interpret \" as ", so it cannot recognize the full URL.
The link is :
imageLink[i] = "<a href='showForm' onclick=\"popUp('" + request.getContextPath() + "/displayForm.jsp?filename=" + sdb.getImageFileName(i)+"&fileDesc=" +sdb.getImageDesc(i) + "', 'showForm', '600', '450', 'yes'); return false;\">"+sdb.getImageDesc(i)+"</a>";
Thanks in advance!
jmling

Linux will recognize the escape character. It looks like you might have other difficulties with your imageLink tag. For example, I think you need to use <%= %> tags when you use Java inside your HTML or JavaScript:
onclick=\"popUp(" + <%= request.getContextPath() %> + "/displayform.jsp?...

Illegal escape character

Hi, I am writing a servlet that also has HTML in it, so for the HTML I just use out.println("").
But I wanted to add a new picture to the page and used this command:
out.println("<IMG SRC=C:\Documents and Settings\bsharma\My Documents\My Pictures\index1.gif>");
but I get a compile error saying 'illegal escape character'.
I know it is because of the \, but is there a way around it?
-bhaarat

Try:
out.println("<IMG SRC=C:\\Documents and Settings\\bsharma\\My Documents\\My Pictures\\index1.gif>");

Mapping error: Character reference "&# 00" is an invalid XML character

Hi All,
I am running a synchronous RFC (R/3) -> PI (7.1) -> SOAP (third-party software) scenario.
The messages reach the PI server, but a mapping error occurs because dummy characters "&# 00" are being sent to the XI system.
Is this due to R/3 sending invalid characters, or are they generated in the PI system? Can you suggest any notes or patches to resolve the issue?
"MAPPING">EXCEPTION_DURING_EXECUTE
com.sap.aii.utilxi.misc.api.BaseRuntimeException:
Character reference "& # 00" is an invalid XML character
Many thanks!
guru

Hi,
Please see the last paragraph on the last page of the document linked below, which says:
"The only solution is to use a Java mapping before the actual mapping to perform the escaping."
https://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/uuid/502991a2-45d9-2910-d99f-8aba5d79fb42
Regards,
Sarvesh

Regular Expression Escaped Digit "\d" Illegal Escape Character

Hello,
I'm trying to write a regular expression to determine if a String matches a date format that is defined as YYYYMMDD. For example, March 11, 2009 would be "20090311"
For the time being I don't care if an invalid month or day is entered. I've attempted both of the following
if (date.matches("(19|20)\d{4}")) {
// warn the user
}
and
if (java.util.regex.Pattern.matches("(19|20)\d{4}", date)) {
// warn the user
}
Both yield "illegal escape character" compilation errors for the "\d" part of the regular expression.
http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html#sum
Says that "\d" is the predefined digit character class. So at this point, I don't know what I'm doing wrong. I realize I could just define the character class myself, and use a pattern like "(19|20)[0-9]{4}", but I would like to know why "\d" isn't being recognized by the compiler.
Thanks,
Paul

paulwooten wrote:
> Can someone give me an explanation of heuristics, as they might apply to SimpleDateFormat? Does this mean that if the format was similar the parser might figure it out? Say, if instead of "yyyyMMdd", it was "yyyyddMM", or "yyMMdd"?
No. Since all of these are valid formats, there's no way for the parser to distinguish between them.
> Or does this have to do with rejecting February 29, and other dates like that.
That's the one. When setLenient(false) is called, the 29th of February is only accepted in leap years.
It will also reject the 57th of January when lenient is set to false (try parsing that with lenient=true, you'll be surprised).
> I've read some of the wikipedia article about heuristics, but I'm confused as to how it would apply to this example.
Don't concentrate too much on the term heuristics. Just remember: lenient=true means that not-really-correct dates will be accepted, lenient=false means more strict checks.
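
To tie the two halves of this thread together: the compiler rejects "\d" because a backslash in a Java string literal starts a string escape, and d is not a valid one. Writing "\\d" puts the two characters \ and d into the string, which is what the regex engine expects. The sketch below is only an illustration under a few assumptions: the class and method names are made up, and the quantifier is {6} rather than the {4} from the question, because YYYYMMDD needs six digits after the century.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

// Hypothetical helper combining a shape check with a strict calendar check.
final class DateChecks {
    // Shape check only: 19 or 20 followed by six digits.
    // It does not know that month 13 or day 99 is invalid.
    static boolean looksLikeDate(String s) {
        return s.matches("(19|20)\\d{6}");
    }

    // Strict check: with setLenient(false) the parser rejects dates that
    // do not exist, such as February 29 in a non-leap year.
    static boolean isRealDate(String s) {
        if (!looksLikeDate(s)) {
            return false;
        }
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd");
        format.setLenient(false);
        try {
            format.parse(s);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDate("20090311"));
        System.out.println(isRealDate("20090229"));
        System.out.println(isRealDate("20080229"));
    }
}
```

The regex check runs first because SimpleDateFormat.parse(String) stops at the end of the date and ignores trailing text.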

Disable html escaping mode - Basic

4.2.1
Hi,
Our application runs within our network and hence no security worries as such. It's also not a critical application.
In 4.2.1, there is the HTML escaping mode with only two options, Basic and Extended. Even with Basic it escapes & < and >. Is there a way to disable that? We have parameter passing where some items have & in their names, and they seem to be getting skipped.
I pass parameters when the user clicks on a link in a report (using standard link features), which creates the URL. But it looks like the names which have & have those removed.
Thanks,
Ryan

ryansun wrote:
> 4.2.1
> Our application runs within our network and hence no security worries as such. It's also not a critical application.
> In 4.2.1, there is the html escaping mode with only two options basic and extended. Even with Basic it escapes & < and >. IS there a way to disable that?
No. It's required by the HTML specification.
> We have parameter passing where some items have & in their names. and they seem to be getting skipped.
> I pass parameters when user clicks on a link in a report(using standard link features) which creates the URL. But looks like the names which have & have those removed.
As '&' is a URL-reserved character it must be encoded in order to be passed in a URL parameter, for example using the apex_util.url_encode API.
As has been recommended before, the simple way to avoid problems in this area is not to pass string data values as URL parameters. Pass simple numeric or alphanumeric key values, and use these to retrieve additional information using computations and processes on the target page.
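
The same rule holds outside APEX. Purely as an illustration of why a raw & gets lost, and not as a substitute for the apex_util.url_encode call named above, here is how the encoding step looks in Java with the standard URLEncoder class. The class name UrlParam and the sample values are made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: encodes one parameter value before it is put into a URL.
final class UrlParam {
    static String encode(String value) {
        try {
            // '&' becomes %26 and a space becomes '+', so the value can no
            // longer be mistaken for a parameter separator.
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("page?name=" + encode("Tom & Jerry")); // page?name=Tom+%26+Jerry
    }
}
```

Without the encoding step, the receiver splits the query string at the & and the value is cut off after "Tom ".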

Query of Queries (QofQ) Escaped Character Problem

Hello All,
I'm trying to run a query of queries (QofQ) and I'm doing a
LIKE comparison that looks for bracket characters ([ ]) within a
string, however ColdFusion is ignoring the brackets. How can I
escape the bracket character? So far I have only been able to
escape the percent sign based on the ColdFusion Live Docs. The
error message I get when I run the query below is:
Invalid Escape Sequence. Valid sequence pairs for this escape
character are: "\%", or "\_".
Here is the query:
<cfquery dbtype="query" name="getLogs">
SELECT *
FROM GetLogs
WHERE Description LIKE '%\[User:#UserID#\]%' ESCAPE '\'
</cfquery>
Thanks for your help!

You are correct. If you leave the brackets in the LIKE
statement, it will return results as if the brackets weren't there
at all.
Perhaps I need to figure out the ASCII character value of the
bracket and include it that way i.e. #Char(?)# where the question
mark would be the numerical value of that character.
My temporary solution has been to leave off the starting
bracket:
<cfquery dbtype="query" name="getLogs">
SELECT *
FROM GetLogs
WHERE Description LIKE '%user:#UserID#]%'
</cfquery>
This has (so far) returned the results I'm looking for,
although it's not 100% accurate without that beginning [ in the
LIKE statement.

Urgent Help - in using Escape character

Hi,
I have a problem using the escape character.
Can anyone help me out with it?
sb.append(<jsp:getProperty name="resume_main" property="name"/>);
// The error I am getting is: Missing term, ')' expected.
Please help me use the escape character in the above statement.
Thanks in advance.
regards
koel

try
sb.append("<jsp:getProperty name='resume_main' property='name'/>");
or
sb.append("<jsp:getProperty name=\"resume_main\" property=\"name\"/>");
both will work

PDF/A Conversion Err :CIDset in subset font is incomplete & Character references .notdef glyphs

I was trying to convert some PDF documents to PDF/A-3b. Using Acrobat XI Pro on a Windows Vista PC, the conversion proceeded fine, but I noticed the size of the converted file increased from 1.91 MB to 16.8 MB (see space usage audits below). I expected some increase in size due to the font embedding and such, but did not expect such a big jump. On auditing the space usage, I noticed that the converted file had no embedded fonts, but the vast majority of space was being consumed by images. It turns out the "Convert to PDF/A-3b" profile in the Preflight settings was set to convert all pages to images if the regular conversion failed. I edited the profile to not do so, but to fail on an error. This time, when I passed the document through a "Convert to PDF/A-3b" preflight, it failed with the following errors.
CIDset in subset font is incomplete (font contains glyphs that are not listed). It appears to be referring to the Arial and TimesNewRoman fonts.
Character references .notdef glyph
How do I fix these errors? For the CIDset error, I noticed some folks in the InDesign forums mentioning you can locate the missing glyphs and then replace the fonts with other fonts in which these glyphs are present. I am wondering if this is the issue here and something similar can be done in Acrobat. As for the .notdef glyphs, I couldn't find anything. Any help would be much appreciated.
Thanks,
Ron
Note:
If I edit the conversion preflight profile to allow replacing the pages with images on regular conversion errors, the conversion goes through fine, but as shown below, there is a huge jump in size, which I would like to avoid. I have about 2GB worth of documents, and the conversion ends up using over 8 GB of space.
Regular PDF
Archival Ready PDF (PDF/A-3b)

What software are you using to do OCR?
Is there a way you can adjust what font is being used by the OCR software or use a different software for OCR? This is not a standard OpenType font or one that can be mapped to Unicode.
I'm not sure exactly how OCR software places fonts in PDFs. Does it show up in the PDF Properties under Fonts? If so, what is the name of the font?
When you examine the tagged text in the Tags panel, if you open up a tag and look at the content, does it make sense or is it nonsense (gobbledygook)?
While OCR text is not visible to the end user directly, it can be selected using the text tool and it is recognized by Acrobat for tagging purposes. Assistive technology, e.g. a screen reader, will be reading from this text. So if it is not understandable, you do not have an accessible file.
You would probably obtain better results using the Acrobat OCR feature, preferably on a Windows machine. It's been a while since I've exchanged files between Mac and Windows, and I wouldn't trust that the encodings would be the same without testing it.

Bypass Adapter URI Endpoint with Escape Character for Web Service

Dear All,
I would like to apply by pass adapter URI Endpoint for XI webservice, the default format is
http://<host>:<port>/sap/xi/engine?type=entry&version=3.0&Sender.Service=<BusinessService>&Interface=<namespace>^<Outbound Interface name>
If I use the format with the caret (^) character, the service works without problems, but the consumer doesn't support the caret (^) character. I replaced the caret (^) with its URL escape sequence (%5E):
http://<host>:<port>/sap/xi/engine?type=entry&version=3.0&Sender.Service=<BusinessService>&Interface=<namespace>%5E<Outbound Interface name>
Then this error occurred:
<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
   <SOAP:Body>
      <SOAP:Fault>
         <faultcode>SOAP:Server</faultcode>
         <faultstring>System Error</faultstring>
         <detail>
            <s:SystemError xmlns:s="http://sap.com/xi/WebService/xi2.0">
               <context/>
               <code>RCVR_DETERMINATION.MESSAGE_INCOMPLETE</code>
               <text>Message is incomplete. No Sender found</text>
            </s:SystemError>
         </detail>
      </SOAP:Fault>
   </SOAP:Body>
</SOAP:Envelope>
How to resolve this error...
Thank you.
Regards,
Weng

Hi ,
As far as I know:
You can create a WSDL with the help of a wizard. In the Integration Directory, choose Tools -> Define Web Service to start the wizard.
The URL generated by the Propose URL button points to the Integration Engine by default, so it is already good performance-wise.
If you want to point your URL to the Adapter Engine, use the URL below; it will route your incoming SOAP message to the SOAP adapter sender channel:
http://<host>:<j2ee-port>/XISOAPAdapter/MessageServlet?channel=:<service>:<channel>.
Regards
Prabhat Sharma.

Converting HTML Escaping to Unicode Escaping characters in Java

Hi,
I am getting special characters like pound, space, dollar etc. from the database in HTML escaping format (the entities for ' £ ® etc., e.g. &pound; and &reg;), and I want to convert them to their Unicode escape equivalents such as U00A3, U0026. Java only converts &amp; to & (U0026); the rest of the characters are not getting converted. If there is any API or way to do this, please reply.
Note: I can't change the database as there are already thousands of records, and my front end needs Java to do all these conversions; I can't change that either.

I have posted a method that does what you want. It was a long time ago since I wrote it and you should probably use a StringBuilder instead of a StringBuffer if you are going to use it in Java 5 or later. You can find the method in this thread:
http://forum.java.sun.com/thread.jspa?threadID=652630
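
Since the linked thread may no longer be reachable, here is a small sketch of the second half of that conversion. It is not the method from that thread. It assumes the entities have already been decoded to real characters, and that the wanted output is the Java-style escape (a backslash, the letter u and four hex digits). The class and method names are made up.

```java
// Hypothetical helper: rewrites every non-ASCII char as a Java-style
// Unicode escape and leaves plain ASCII text as it is.
final class UnicodeEscaper {
    static String toUnicodeEscapes(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c); // plain ASCII stays readable
            } else {
                // Four upper-case hex digits per UTF-16 code unit, so a
                // character outside the BMP comes out as two escapes.
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The pound sign is written with an escape to keep this file ASCII-only.
        System.out.println(toUnicodeEscapes("Price: \u00A3 5"));
    }
}
```

As the reply above notes, StringBuilder is the better choice over StringBuffer on Java 5 or later.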

Displaying unicode or HTML escaped characters from HTTPService in Flex components.

Here is a solution I developed for the Flex Cookbook for
displaying data in Flex components when the data comes back from
HTTPService as unicode or HTML escaped data:
Displaying unicode or HTML escaped characters from HTTPService in Flex components.

Hi again Greg,
I have just been adapting your idea for encountering
occasional escaped characters within a body of "normal" text, eg
something like
hell&ocirc; sun&scaron;ine
Now, the handy String.fromCharCode(charCode) call works a
dream if instead of the above I have
hell&#244; sun&#353;ine
Do you know if there is an equivalent call that takes the
named entities rather than the numeric ones? Clearly I can just do
some text substitution to get the mapping, but this means rather
more by-hand work than I had hoped. However, this is definitely a
step in a useful direction for me.
Thanks,
Richard
PS hoping that the web page won't simply outguess me and
replace all the above! Basically, the first line uses named
entities and the second the equivalent numbers...

How to use escape character in update statement.

Hi All,
I'm trying to update table using following sql update statement, but everytime it's asking me for the input due to the '&' value in below sql.
UPDATE xyz_xyz
   SET NAME = 'ABC & PQR'
WHERE ID = (SELECT ID
               FROM abc_abc
              WHERE NAME = 'C & PQR');
Please let me know how to use escape character syntax or let me know if there is any alternative solution.
Thanks,
Vishwas

Hi,
By default, & marks a substitution variable name.
If you're not using substitution variables in that statement (or, if this is in PL/SQL, in that entire package or procedure) then the easiest thing to do is just disable substitution variables; then & will be a normal character:
SET DEFINE OFF
UPDATE xyz_xyz
   SET NAME = 'ABC & PQR'
WHERE ID = (SELECT ID
               FROM abc_abc
              WHERE NAME = 'C & PQR');
SET DEFINE ON
If you can't do that, then & is always taken literally if it comes right before a single-quote, so you could say:
UPDATE xyz_xyz
   SET NAME = 'ABC &' || ' PQR'
WHERE ID = (SELECT ID
               FROM abc_abc
              WHERE NAME = 'C &' || ' PQR');
There is a SQL*Plus "SET ESCAPE" command, too, but if you use it, you have to worry about whether the escape character is to be taken literally or not.
SET   ESCAPE \
Yet another alternative is to make some other character, such as ~, mark the substitution variables:
SET DEFINE ~
Read all about them in the SQL*Plus manual.
http://download.oracle.com/docs/cd/B28359_01/server.111/b31189/ch2.htm#sthref103
