Converting HTML Escaping to Unicode Escaping characters in Java

Hi,
I am getting special characters such as pound, space, dollar etc. from the database in HTML-escaped form (the entities for ' £ ® etc.), which I want to convert to their Unicode escape equivalents, such as U00A3, U0026. Java only converts &amp; to & (U0026); the rest of the characters are not converted. If there is any API or other way to do this, please reply.
Note: I can't change the database, as there are already thousands of records, and my front end needs Java to do all these conversions, so I can't change that either.

I have posted a method that does what you want. I wrote it a long time ago, and you should probably use a StringBuilder instead of a StringBuffer if you are going to use it in Java 5 or later. You can find the method in this thread:
http://forum.java.sun.com/thread.jspa?threadID=652630
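The linked method is not reproduced here. As a rough sketch of one way to do the conversion, assuming the stored text uses numeric references (decimal or hex) plus a handful of named entities: the class name EntityToUnicodeEscape, the method toUnicodeEscapes and the small NAMED table are made up for illustration and are not the method from that thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityToUnicodeEscape {
    // Hypothetical, deliberately tiny table; extend it with the entities your data uses.
    private static final Map<String, Integer> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", 0x0026);
        NAMED.put("apos", 0x0027);
        NAMED.put("nbsp", 0x00A0);
        NAMED.put("pound", 0x00A3);
        NAMED.put("reg", 0x00AE);
    }

    // Matches decimal references, hex references and named entities.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    // Replaces every recognised entity with its backslash-u escape(s).
    static String toUnicodeEscapes(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer codePoint = resolve(m.group(1));
            // Unknown entities are left exactly as they were.
            String replacement = (codePoint == null) ? m.group() : escape(codePoint);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the code point for an entity body, or null if it is not recognised.
    private static Integer resolve(String body) {
        try {
            Integer cp;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                cp = Integer.parseInt(body.substring(2), 16);
            } else if (body.startsWith("#")) {
                cp = Integer.parseInt(body.substring(1));
            } else {
                cp = NAMED.get(body);
            }
            return (cp != null && Character.isValidCodePoint(cp)) ? cp : null;
        } catch (NumberFormatException e) {
            return null; // number too large to be a code point
        }
    }

    // Formats one code point as one or two four-digit escapes (two for surrogate pairs).
    private static String escape(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }
}
```

Calling EntityToUnicodeEscape.toUnicodeEscapes on the text read from the database leaves ordinary characters alone and rewrites only the entities. If the front end expects the U00A3 form rather than \u00A3, only the format string in escape needs to change.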

Similar Messages

[iPhone] Any built in way to convert HTML entities to Unicode?

I have a string with contents something like:
"&copy; 2008"
Is there some method that I can't seem to find that will convert this to:
"© 2008"
Basically is there something built in to convert all of the '&xxx;' HTML entities to their Unicode counterpart? I can write my own code to do it but I want to check here first.
Thanks.

I had the same problem and only found a semi-built-in solution using NSXMLParser:
@interface MREntitiesConverter : NSObject {
    NSMutableString* resultString;
}
@property (nonatomic, retain) NSMutableString* resultString;
- (NSString*)convertEntiesInString:(NSString*)s;
@end
@implementation MREntitiesConverter
@synthesize resultString;
- (id)init {
    if ((self = [super init])) {
        resultString = [[NSMutableString alloc] init];
    }
    return self;
}
// NSXMLParser delegate callback: collects the decoded character data
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)s {
    [self.resultString appendString:s];
}
- (NSString*)convertEntiesInString:(NSString*)s {
    if (s == nil) {
        NSLog(@"ERROR : Parameter string is nil");
    }
    // Wrap the input in a dummy element so that it parses as XML
    NSString* xmlStr = [NSString stringWithFormat:@"<d>%@</d>", s];
    NSData *data = [xmlStr dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES];
    NSXMLParser* xmlParse = [[NSXMLParser alloc] initWithData:data];
    [xmlParse setDelegate:self];
    [xmlParse parse];
    [xmlParse release];
    NSString* returnStr = [[NSString alloc] initWithFormat:@"%@",resultString];
    return [returnStr autorelease];
}
- (void)dealloc {
    [resultString release];
    [super dealloc];
}
@end
In Cocoa (Core Foundation) there is
NSString* sI = (NSString*)CFXMLCreateStringByUnescapingEntities(NULL, (CFStringRef)s, NULL);
but that does not (yet?) exist on the iPhone (2.01).

How to get the unicode escapes for characters outside a characterset

Hi!
I'm trying to edit an RTF file and have been fairly successful so far. But living outside the U.S., I need some characters outside ASCII. Those characters are supposed to be escaped as Unicode escapes, e.g. \u45. But I can't find a way to get the escape sequence for the Unicode characters that live outside ASCII.
I'm guessing that this is a very simple thing to do, but I have not had any luck with Google so far.
So, how do I get the Unicode escapes for characters outside a character set?
Thanks in advance
Roland Carlsson

I'm trying to edit an RTF file and have been
fairly successful so far. But living outside the U.S.,
I need some characters outside ASCII. Those
characters are supposed to be escaped as
Unicode escapes, e.g. \u45. But I can't find a way to
get the escape sequence for the Unicode characters
that live outside ASCII.
You are asking about RTF and not Java, correct?
As a guess:
Unicode is 16 bit (presumably you are not using the newest one). Thus it requires a 16 bit representation. Thus \u45 actually is the same as \u0045. Thus something like \u1e45 would probably work.
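For RTF specifically, the guess above needs one adjustment as far as I know: the RTF specification writes the escape as \uN, where N is a signed 16-bit decimal number (not hex), followed by a fallback character for readers that do not understand Unicode. A minimal Java sketch under that assumption; RtfEscaper and escape are illustrative names, and '?' is an arbitrary choice of fallback.

```java
class RtfEscaper {
    // Escapes text for insertion into an RTF document body.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                // RTF control characters must be backslash-escaped.
                sb.append('\\').append(c);
            } else if (c < 128) {
                sb.append(c);
            } else {
                // The cast to short makes values above 32767 negative,
                // which is how the signed decimal form represents them.
                sb.append("\\u").append((int) (short) c).append('?');
            }
        }
        return sb.toString();
    }
}
```

With this sketch the letter å (U+00E5, decimal 229) comes out as \u229? in the RTF source. Characters outside the 16-bit range are emitted as two escapes, one per UTF-16 surrogate.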

Chinese characters to Unicode Escape

I'd like to implement a function that converts a Chinese string into Unicode escape codes, just like native2ascii does.
I can convert single bytes with charToHex but have no clue how to deal with double-byte characters. Any hints?

I think Unicode escapes can be obtained from a file with the native2ascii tool. However, if you would like code, the following might serve as an example.
public class UnicodeTool {
    // Returns the two-digit hex representation of byte b
    static String byteToHex(byte b) {
        char[] hexDigit = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'};
        char[] array = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
        return new String(array);
    }
    // Returns the four-digit hex representation of char c
    static String charToHex(char c) {
        byte hi = (byte) (c >>> 8);
        byte lo = (byte) (c & 0xff);
        return byteToHex(hi) + byteToHex(lo);
    }
    // Returns the Unicode escape for char c
    static String toUnicodeFormat(char c) {
        // charToHex always returns four digits, so no zero padding is needed
        return "\\u" + charToHex(c);
    }
    public static void main(String[] args) {
        String str = "09Az"; // example of a string
        for (char ch : str.toCharArray()) {
            System.out.println(toUnicodeFormat(ch));
        }
    }
}

Problem with converting html to pdf using LiveCycle ES Java API

I am using this code to convert HTML to PDF.
/*
 * JAR files used:
 * 1. adobe-generatepdf-client.jar
 * 2. adobe-livecycle-client.jar
 * 3. adobe-usermanager-client.jar
 * 4. adobe-utilities.jar
 * 5. wlclient.jar
 */
import java.io.File;
import java.util.Properties;
import com.adobe.idp.Document;
import com.adobe.idp.dsc.clientsdk.ServiceClientFactory;
import com.adobe.idp.dsc.clientsdk.ServiceClientFactoryProperties;
import com.adobe.livecycle.generatepdf.client.GeneratePdfServiceClient;
import com.adobe.livecycle.generatepdf.client.HtmlToPdfResult;
public class ConvertHTML {
    public static void main(String[] args) {
        try {
            //Set connection properties required to invoke LiveCycle ES
            Properties connectionProps = new Properties();
            connectionProps.setProperty(ServiceClientFactoryProperties.DSC_DEFAULT_EJB_ENDPOINT, "t3://localhost:7001");
            connectionProps.setProperty(ServiceClientFactoryProperties.DSC_TRANSPORT_PROTOCOL, ServiceClientFactoryProperties.DSC_EJB_PROTOCOL);
            connectionProps.setProperty(ServiceClientFactoryProperties.DSC_SERVER_TYPE, "WebLogic");
            connectionProps.setProperty(ServiceClientFactoryProperties.DSC_CREDENTIAL_USERNAME, "administrator");
            connectionProps.setProperty(ServiceClientFactoryProperties.DSC_CREDENTIAL_PASSWORD, "password");
            //Create a ServiceClientFactory instance
            ServiceClientFactory factory = ServiceClientFactory.createInstance(connectionProps);
            //Create a GeneratePdfServiceClient object
            GeneratePdfServiceClient pdfGenClient = new GeneratePdfServiceClient(factory);
            //Get an HTML document to convert to a PDF document
            String inputFileName = "http://www.adobe.com";
            //String inputFileName = "C:\\Documents and Settings\\venkat\\Desktop\\Adobe.htm";
            String securitySettings = "No Security";
            String fileTypeSettings = "Standard";
            System.out.println("one");
            //Convert HTML content to a PDF document
            HtmlToPdfResult result = pdfGenClient.htmlToPDF2(inputFileName, fileTypeSettings, securitySettings, null, null);
            System.out.println("two");
            //Get the newly created document
            Document createdDocument = result.getCreatedDocument();
            //Save the PDF document as a PDF file
            createdDocument.copyToFile(new File("C:\\test.pdf"));
        } catch (Exception e) {
            System.out.println("Error OCCURRED: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
I am able to compile this class, but when running it I get the error below.
Error OCCURRED: Internal error.
ALC-DSC-000-000: com.adobe.idp.dsc.DSCRuntimeException: Internal error.
        at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.doSend(EjbMessageDispatcher.java:160)
        at com.adobe.idp.dsc.provider.impl.base.AbstractMessageDispatcher.send(AbstractMessageDispatcher.java:57)
        at com.adobe.idp.dsc.clientsdk.ServiceClient.invoke(ServiceClient.java:208)
        at com.adobe.livecycle.generatepdf.client.GeneratePdfServiceClient.htmlToPDF2(GeneratePdfServiceClient.java:666)
        at ConvertHTML.main(ConvertHTML.java:84)
Caused by: java.rmi.RemoteException: Remote EJBObject lookup failed for 'ejb/Invocation'; nested exception is:
        org.omg.CORBA.COMM_FAILURE:   vmcid: SUN minor code: 203 completed: No
        at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.initialise(EjbMessageDispatcher.java:101)
        at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.doSend(EjbMessageDispatcher.java:130)
        ... 4 more
Caused by: org.omg.CORBA.COMM_FAILURE:   vmcid: SUN minor code: 203 completed: No
        at com.sun.corba.se.impl.logging.ORBUtilSystemException.writeErrorSend(Unknown Source)
        at com.sun.corba.se.impl.logging.ORBUtilSystemException.writeErrorSend(Unknown Source)
        at com.sun.corba.se.impl.transport.SocketOrChannelConnectionImpl.writeLock(Unknown Source)
        at com.sun.corba.se.impl.encoding.BufferManagerWriteStream.sendFragment(Unknown Source)
        at com.sun.corba.se.impl.encoding.BufferManagerWriteStream.sendMessage(Unknown Source)
        at com.sun.corba.se.impl.encoding.CDROutputObject.finishSendingMessage(Unknown Source)
        at com.sun.corba.se.impl.protocol.CorbaMessageMediatorImpl.finishSendingRequest(Unknown Source)
        at com.sun.corba.se.impl.protocol.CorbaClientRequestDispatcherImpl.marshalingComplete1(Unknown Source)
        at com.sun.corba.se.impl.protocol.CorbaClientRequestDispatcherImpl.marshalingComplete(Unknown Source)
        at com.sun.corba.se.impl.protocol.CorbaClientDelegateImpl.invoke(Unknown Source)
        at com.sun.corba.se.impl.protocol.CorbaClientDelegateImpl.is_a(Unknown Source)
        at org.omg.CORBA.portable.ObjectImpl._is_a(Unknown Source)
        at weblogic.corba.j2ee.naming.Utils.narrowContext(Utils.java:126)
        at weblogic.corba.j2ee.naming.InitialContextFactoryImpl.getInitialContext(InitialContextFactoryImpl.java:94)
        at weblogic.corba.j2ee.naming.InitialContextFactoryImpl.getInitialContext(InitialContextFactoryImpl.java:31)
        at weblogic.jndi.WLInitialContextFactory.getInitialContext(WLInitialContextFactory.java:41)
        at javax.naming.spi.NamingManager.getInitialContext(Unknown Source)
        at javax.naming.InitialContext.getDefaultInitCtx(Unknown Source)
        at javax.naming.InitialContext.init(Unknown Source)
        at javax.naming.InitialContext.<init>(Unknown Source)
        at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.initJndiContext(EjbMessageDispatcher.java:213)
        at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.getJndiContext(EjbMessageDispatcher.java:226)
        at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.initialise(EjbMessageDispatcher.java:87)
        ... 5 more
Can you please suggest a way to do the conversion?

Yes, thanks for your suggestion.
But I didn't find an exact solution. I found some, but they did not work the way I require. I just need to convert HTML to PDF using the iText API for Java. I have already used some of its classes, such as HTMLParser.
Is there anything else? Can anyone help me with this?

Convert Hexadecimal NCRs to unicode characters

I have hexadecimal NCRs (numeric character references) in comments that I import from the 3B2 application. How can these be converted to the appropriate characters using Acrobat JavaScript?
For example: the comment contains (&#x000D), which should be converted to the appropriate Unicode character.

I may be wrong, but I think you would use String.fromCharCode to convert a UCS-2 code into a string. So you then have to parse your string to process any escapes in it, and call that method.

Convert HTML special characters to String

Hi,
I'm looking for an easy way to convert HTML special characters (like "&#246") to a String.
Any ideas how to implement this? Thanks

Hi,
I'm looking for an easy way to convert HTML special
characters (like "&#246") to String.
Any Ideas how implement this? Thanks
Well, I'm assuming you mean that when you are working with Java you get a ? instead of the character you want, and that you are reading and writing this HTML file. If not, maybe this can still help:
BufferedReader inFile = new BufferedReader(new InputStreamReader(new FileInputStream(fileName), "ISO8859-1"));
PrintWriter outFile = new PrintWriter(new OutputStreamWriter(new FileOutputStream(fileName2), "ISO8859-1"));
Basically, you need to make sure that you define ISO8859-1 as the encoding.

Parsing unicode escape codes

Hi,
I'm looking for a way to convert a string, which is read from a file, containing Unicode escape codes.
In short, this means the file contains a string, e.g. "Some text\nOn a new line", which I want to get into a String object as if it were the result of
String s = new String("Some text\nOn a new line");
I've been looking in the Java docs but didn't find a function to do that conversion (though the compiler has to do it all the time...).
Any ideas?

That's not what I'm looking for.
What I've got is a file that looks like this:
1="Somestring"
2="another message\nAnd some more text"
3="text with\tTabs\n\tTo get some layout"
etc...
It's used as a string table for a program that has to be available in multiple languages. There are versions of the file in different languages.
What I want is to be able to get e.g. string number 2 out of it in such a way that System.out.println(string2); will give the following result:
another message
And some more text
instead of:
"another message\nAnd some more text"

How can I get unicode escape for £ in java

Is there any API which can translate the symbol £ to its corresponding unicode escape?

SurfManNL wrote:
Found one as well, but in the currency section: 20A4
http://www.unicode.org/charts/PDF/U20A0.pdf
That's for Italian Lira, which has been replaced by the Euro some years ago ;)
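The pound sign itself is U+00A3. I am not aware of a dedicated method for this in the core library, but formatting the char value as four hex digits gives the escape. A short sketch, similar in spirit to native2ascii in that ASCII is left untouched; UnicodeEscaper and toEscapes are illustrative names.

```java
class UnicodeEscaper {
    // Replaces every non-ASCII char with its four-digit Unicode escape.
    static String toEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c > 127) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For the input £5 this returns \u00a35. Characters outside the 16-bit range come out as two escapes, one per surrogate.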

Unicode escapes

When I get the input of unicode escapes,
I can work with them as a String, but as soon as I add \u0022 (that is a double quote) to the input,
then it won't take it anymore, and it says ";" expected.
Now I am wondering, what is the way around this problem?
Any advice please?
public class Unicode
{
     String uniSt = "\u0074\u0065\u0073\u0074"; // can't add \u0022 to this string
     // As soon as I add this double quote \u0022, it won't work and would say ";" expected
     public Unicode()
     {
          System.out.println(uniSt);
     }
     public static void main (String[] args)
     {
          new Unicode();
     }
}

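The Unicode escape \u0022 is a double quote and acts as one, because the compiler translates Unicode escapes before it parses the source. If the string also has an ending double quote character, that's an error.
Use this:
String s = "\u0074\u0065\u0073\u0074\u0022;
Note that the escape only closes the literal here, so s is just "test". To put a double quote character inside the string, write \" instead.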

How to convert html to pdf using acrobat sdk 8.0?

hi
I am a beginner with the Acrobat SDK.
I want to know how to use Acrobat SDK 8.0 to convert HTML to PDF.
Here are some questions:
1: How can I support navigation inside a PDF file generated using Acrobat SDK 8.0? For example: there's a catalog at the top of the HTML file, and the customer wants to navigate inside the PDF file just like navigating inside the HTML file.
2: How can I support operating controls in a PDF file generated using Acrobat SDK 8.0? For example: there are some drop-down lists and text boxes in the HTML file, and the customer wants to enter text in the text box and click the drop-down list to see the available options, just like in the HTML file.
Thanks in advance for any help and suggestions.

Hello,
I want a system to re-brand my 37-page PDF for affiliates.
I want a dynamic PHP link in the online PDF in order to personalize the PDF automatically for each affiliate. I need to change 2 links each time: the affiliate ID and the PayPal email (payment button) on page 36.
Can you help?
Please let me know
Thank you
Alex
PS My system is online and i can give you the url if it helps.

A tool can convert HTML to Excel

Hi All, are you using report 6i and want to output reports in Excel format? If so, free software that can convert HTML to Excel is available.
The software is designed to print very large reports. Now a new function has been added, through which you can convert HTML to Excel easily. The function is still basic, but it will improve in the future.
For more information, Please visit
http://repbrowser.freewebpage.org/
Thank you ,
Regards

Hi,
the only other ways (as far as I know), if you really want to convert, are:
a) write a parser to convert HTML into CSV (XLS)
b) use an html2csv script at the OS level
like:
http://sebsauvage.net/python/html2csv.py (or just google html2csv)
c) use Excel (data source: web; local file: "file:///C:/test.htm")
Kind Regards,
Dirk

How to convert a HTML files into a text file using Java

Hi guys...!
I was wondering if there is a way to convert an HTML file into a text file using the Java programming language. Likewise, I would also like to know if there is a way to convert any type of file (Excel, PowerPoint, and Word) into text using Java.
By the way, I really appreciated the help that you guys gave me on my previous topic on how to extract text from a PDF file.
Thank you....

HTML files are already text files. What do you mean you want to convert them?
I think if you search the web, you can find things for converting those MS Office files to text (or extracting text from them, as I assume you mean).

Convert html to word document

I want to convert HTML to a Word document.
I tried poi-3.0.2-FINAL (Apache POI - HWPF - Java API to Handle Microsoft Word Files),
but it is not working...

My actual goal is to convert an HTML file into a Word document.
I posted in the forum, and some people suggested looking at HWPF.
I tried the programs one by one without getting any result. For example, one program:
try {
     HWPFDocument doc = new HWPFDocument(new FileInputStream("c:\\temp.doc"));
     Range r = doc.getRange();
     System.out.println("Example you supplied:");
     System.out.println("---------------------");
     for (int x = 0; x < r.numSections(); x++) {
          Section s = r.getSection(x);
          for (int y = 0; y < s.numParagraphs(); y++) {
               Paragraph p = s.getParagraph(y);
               for (int z = 0; z < p.numCharacterRuns(); z++) {
                    //character run
                    CharacterRun run = p.getCharacterRun(z);
                    //character run text
                    String text = run.text();
                    // show us the text
                    System.out.print(text);
               }
               // use a new line at the paragraph break
               System.out.println();
          }
     }
} catch (NullPointerException exception) {
     exception.printStackTrace();
} catch (FileNotFoundException e) {
     // TODO Auto-generated catch block
     e.printStackTrace();
} catch (IOException e) {
     // TODO Auto-generated catch block
     e.printStackTrace();
}
java.io.IOException: Invalid header signature; read 5789751444030890300, expected -2226271756974174256

SOAP error "Unicode supplemental characters encountered in parameter"..!!

Hi All,
Scenario: Webservice (SOAP)--> XI --> SAP
We are facing a problem with Chinese characters while sending the SOAP call.
The web service receives this error in the SOAP fault:
" FAILED TO INVOKE WEB SERVICE OPERATION OS_Sales Could not call Web service operation OS_Sales. Unicode supplemental characters encountered in parameter 1 (11839)".
Is there any way I can get rid of this?
Thanks
Deepthi

Hi,
try this: find the character encoding of your Chinese characters. To do so, take the data and view it in Internet Explorer, then use the menu View > Encoding to see which encoding displays it properly. Alternatively, ask your SOAP application team for the character encoding of your Chinese characters.
Then in the sender SOAP comm channel, in the Module tab, add the following in the module configuration section:
Module Key Parameter.name Parameter.value
soap XMBWS.XMLEncoding <your_encoding>
where <your_encoding> is the encoding scheme of your Chinese characters, something like iso-<some_number>
Then rerun your scenario.
Regards,
Rajeev Gupta
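Before changing the channel configuration, it may help to check whether the payload really contains such characters. A small sketch, assuming (the error message does not say so explicitly) that "supplemental characters" means code points above U+FFFF, and assuming the sender side can run Java; SupplementaryCheck and both method names are made up for illustration.

```java
class SupplementaryCheck {
    // True if s contains at least one code point above U+FFFF.
    static boolean hasSupplementary(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    // Index (in chars) of the first supplementary code point, or -1 if none.
    static int firstSupplementaryIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isSupplementaryCodePoint(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

Most common Chinese characters sit in the Basic Multilingual Plane and would not trigger this check; rarer ones from the extension blocks would.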
