Converting HTML Escaping to Unicode Escaping characters in Java

Hi,
I am getting special characters such as pound, space, and dollar from the database in HTML-escaped form, e.g. &apos; &pound; &nbsp; &reg;, and I want to convert them to their Unicode escape equivalents, e.g. U00A3, U0026. Java only converts &amp; to & (U0026); the rest of the characters are not converted. If there is an API or another way to do this, please reply.
Note: I can't change the database, as there are already thousands of records, and my front end needs Java to do all these conversions; I can't change that either.

I have posted a method that does what you want. I wrote it a long time ago, and you should probably use a StringBuilder instead of a StringBuffer if you are going to use it in Java 5 or later. You can find the method in this thread:
http://forum.java.sun.com/thread.jspa?threadID=652630
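
The linked method is not reproduced here, so the following is only a minimal sketch of one way to do the conversion, not the code from that thread. The class name EntityToUnicodeEscape, the method convert, and the small table of named entities are all assumptions made for illustration; a real table would need many more entries. Numeric references such as &#163; and &#xA3; are handled as well, and unknown entities are left untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntityToUnicodeEscape {
    // Illustrative subset of named entities (an assumption, not a complete HTML table).
    private static final Map<String, Integer> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", 0x0026);
        NAMED.put("apos", 0x0027);
        NAMED.put("quot", 0x0022);
        NAMED.put("lt", 0x003C);
        NAMED.put("gt", 0x003E);
        NAMED.put("nbsp", 0x00A0);
        NAMED.put("pound", 0x00A3);
        NAMED.put("copy", 0x00A9);
        NAMED.put("reg", 0x00AE);
    }

    // Matches named, decimal, and hexadecimal references that end in a semicolon.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    static String convert(String input) {
        Matcher m = ENTITY.matcher(input);
        // StringBuffer is used because appendReplacement only accepts it before Java 9.
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer codePoint = codePointOf(m.group(1));
            String replacement = (codePoint == null) ? m.group() : toEscape(codePoint);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the code point for an entity body, or null if it is unknown or invalid.
    private static Integer codePointOf(String body) {
        if (!body.startsWith("#")) {
            return NAMED.get(body);
        }
        try {
            boolean hex = body.startsWith("#x") || body.startsWith("#X");
            int value = hex ? Integer.parseInt(body.substring(2), 16)
                            : Integer.parseInt(body.substring(1));
            if (Character.isValidCodePoint(value)) {
                return value;
            }
            return null;
        } catch (NumberFormatException e) {
            return null; // number too large to be a code point
        }
    }

    // Formats a code point as one escape per UTF-16 code unit (four uppercase hex digits).
    private static String toEscape(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }
}
```

With this sketch, convert("&pound;5 &amp; &reg;") returns \u00A35 \u0026 \u00AE. If the output should not carry the backslash, only the format string in toEscape needs to change.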

Similar Messages

  • [iPhone] Any built in way to convert HTML entities to Unicode?

    I have a string with contents something like:
    "&copy; 2008"
    Is there some method that I can't seem to find that will convert this to:
    "© 2008"
    Basically is there something built in to convert all of the '&xxx;' HTML entities to their Unicode counterpart? I can write my own code to do it but I want to check here first.
    Thanks.

    I had the same problem and only found a semi-built-in solution using NSXMLParser:
    @interface MREntitiesConverter : NSObject {
        NSMutableString* resultString;
    }
    @property (nonatomic, retain) NSMutableString* resultString;
    - (NSString*)convertEntiesInString:(NSString*)s;
    @end

    @implementation MREntitiesConverter
    @synthesize resultString;
    - (id)init {
        if ((self = [super init])) {
            resultString = [[NSMutableString alloc] init];
        }
        return self;
    }
    // Delegate callback: collects the decoded text as the parser reports it
    - (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)s {
        [self.resultString appendString:s];
    }
    - (NSString*)convertEntiesInString:(NSString*)s {
        if (s == nil) {
            NSLog(@"ERROR : Parameter string is nil");
            return nil;
        }
        // Wrap the text in a dummy element so it parses as an XML document
        NSString* xmlStr = [NSString stringWithFormat:@"<d>%@</d>", s];
        NSData *data = [xmlStr dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES];
        NSXMLParser* xmlParse = [[NSXMLParser alloc] initWithData:data];
        [xmlParse setDelegate:self];
        [xmlParse parse];
        [xmlParse release];
        NSString* returnStr = [[NSString alloc] initWithFormat:@"%@", resultString];
        return [returnStr autorelease];
    }
    - (void)dealloc {
        [resultString release];
        [super dealloc];
    }
    @end
    In Cocoa (Core Foundation) there is
    NSString* sI = (NSString*)CFXMLCreateStringByUnescapingEntities(NULL, (CFStringRef)s, NULL);
    but that does not (yet?) exist on the iPhone (2.01).

  • How to get the unicode escapes for characters outside a characterset

    Hi!
    I'm trying to edit an RTF file and have been fairly successful so far. But living outside the U.S., I need some characters outside ASCII. Those characters are supposed to be written as Unicode escapes, e.g. \u45, but I can't find a way to get the escape sequence for Unicode characters that lie outside ASCII.
    I'm guessing that this is a very simple thing to do, but I have not had any luck with Google so far.
    So, how do I get the Unicode escapes for characters outside a character set?
    Thanks in advance
    Roland Carlsson

    You are asking about RTF and not Java, correct?
    As a guess:
    Unicode is 16 bit (presumably you are not using the newest version), so it requires a 16-bit representation. Thus \u45 is actually the same as \u0045, and something like \u1e45 would probably work.
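
One caveat on the guess above: as far as I know, the RTF specification writes a Unicode character as \uN where N is a signed 16-bit decimal number rather than hex, followed by a fallback character for readers that do not understand the keyword. Below is a minimal Java sketch under that assumption; the class name RtfEscaper and the choice of ? as the fallback are illustrative, and control characters such as line breaks are not handled.

```java
final class RtfEscaper {
    // Escapes text for an RTF body: ASCII passes through (RTF's own special
    // characters get a leading backslash), everything else becomes a
    // backslash-u keyword with a decimal value and a '?' fallback.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\').append(c);
            } else if (c < 0x80) {
                sb.append(c);
            } else {
                // The cast to short gives the signed 16-bit value, so code
                // units above 32767 come out negative.
                sb.append("\\u").append((int) (short) c).append('?');
            }
        }
        return sb.toString();
    }
}
```

For example, the character å (U+00E5, decimal 229) is written as \u229? by this sketch.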

  • Chinese characters to Unicode Escape

    I'd like to implement a function that converts a Chinese string into Unicode escape codes, just like native2ascii does.
    I can convert single bytes with charToHex but have no clue how to deal with double-byte characters. Any hints?

    I think Unicode escapes can be obtained from a file with the native2ascii tool. However, if you would like code, the following might serve as an example.
    public class UnicodeTool {
        // Returns the two-digit hex representation of byte b
        static String byteToHex(byte b) {
            char[] hexDigit = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'};
            char[] array = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
            return new String(array);
        }

        // Returns the four-digit hex representation of char c
        static String charToHex(char c) {
            byte hi = (byte) (c >>> 8);
            byte lo = (byte) (c & 0xff);
            return byteToHex(hi) + byteToHex(lo);
        }

        // charToHex always yields four digits, so no zero padding is needed
        static String toUnicodeFormat(char c) {
            return "\\u" + charToHex(c);
        }

        public static void main(String[] args) {
            String str = "09Az"; // example of a string
            for (char ch : str.toCharArray()) {
                System.out.println(toUnicodeFormat(ch));
            }
        }
    }
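
On the "double byte" worry: a Java char is already a 16-bit UTF-16 code unit, so a Chinese character in the Basic Multilingual Plane fits in a single char and needs no special byte handling. A shorter sketch of the same idea follows, assuming the goal is native2ascii-like output where ASCII is left alone and everything else is escaped; the class name NativeToAscii and the method escapeNonAscii are made up for illustration.

```java
final class NativeToAscii {
    // Leaves ASCII characters as they are and writes every other UTF-16
    // code unit as a backslash-u escape with four lowercase hex digits.
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }
}
```

For the two characters 中文 this yields \u4e2d\u6587. Characters outside the BMP are stored as two chars (a surrogate pair), so they simply produce two escapes.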

  • Problem with converting html to pdf using LiveCycle ES Java API

    I am using this code to convert HTML to PDF.
    /*
     * Required JAR files:
     * 1. adobe-generatepdf-client.jar
     * 2. adobe-livecycle-client.jar
     * 3. adobe-usermanager-client.jar
     * 4. adobe-utilities.jar
     * 5. wlclient.jar
     */
    import java.io.File;
    import java.util.Properties;
    import com.adobe.idp.Document;
    import com.adobe.idp.dsc.clientsdk.ServiceClientFactory;
    import com.adobe.idp.dsc.clientsdk.ServiceClientFactoryProperties;
    import com.adobe.livecycle.generatepdf.client.GeneratePdfServiceClient;
    import com.adobe.livecycle.generatepdf.client.HtmlToPdfResult;

    public class ConvertHTML {
        public static void main(String[] args) {
            try {
                // Set connection properties required to invoke LiveCycle ES
                Properties connectionProps = new Properties();
                connectionProps.setProperty(ServiceClientFactoryProperties.DSC_DEFAULT_EJB_ENDPOINT, "t3://localhost:7001");
                connectionProps.setProperty(ServiceClientFactoryProperties.DSC_TRANSPORT_PROTOCOL, ServiceClientFactoryProperties.DSC_EJB_PROTOCOL);
                connectionProps.setProperty(ServiceClientFactoryProperties.DSC_SERVER_TYPE, "WebLogic");
                connectionProps.setProperty(ServiceClientFactoryProperties.DSC_CREDENTIAL_USERNAME, "administrator");
                connectionProps.setProperty(ServiceClientFactoryProperties.DSC_CREDENTIAL_PASSWORD, "password");

                // Create a ServiceClientFactory instance
                ServiceClientFactory factory = ServiceClientFactory.createInstance(connectionProps);

                // Create a GeneratePdfServiceClient object
                GeneratePdfServiceClient pdfGenClient = new GeneratePdfServiceClient(factory);

                // Get an HTML document to convert to a PDF document
                String inputFileName = "http://www.adobe.com";
                //String inputFileName = "C:\\Documents and Settings\\venkat\\Desktop\\Adobe.htm";
                String securitySettings = "No Security";
                String fileTypeSettings = "Standard";
                System.out.println("one");

                // Convert HTML content to a PDF document
                HtmlToPdfResult result = pdfGenClient.htmlToPDF2(inputFileName, fileTypeSettings, securitySettings, null, null);
                System.out.println("two");

                // Get the newly created document
                Document createdDocument = result.getCreatedDocument();

                // Save the PDF document as a PDF file
                createdDocument.copyToFile(new File("C:\\test.pdf"));
            } catch (Exception e) {
                System.out.println("Error OCCURRED: " + e.getMessage());
                e.printStackTrace();
            }
        }
    }
    I am able to compile this class, but when running it I get the error below.
    Error OCCURRED: Internal error.
    ALC-DSC-000-000: com.adobe.idp.dsc.DSCRuntimeException: Internal error.
            at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.doSend(EjbMessageDispatcher.java:160)
            at com.adobe.idp.dsc.provider.impl.base.AbstractMessageDispatcher.send(AbstractMessageDispatcher.java:57)
            at com.adobe.idp.dsc.clientsdk.ServiceClient.invoke(ServiceClient.java:208)
            at com.adobe.livecycle.generatepdf.client.GeneratePdfServiceClient.htmlToPDF2(GeneratePdfServiceClient.java:666)
            at ConvertHTML.main(ConvertHTML.java:84)
    Caused by: java.rmi.RemoteException: Remote EJBObject lookup failed for 'ejb/Invocation'; nested exception is:
            org.omg.CORBA.COMM_FAILURE:   vmcid: SUN  minor code: 203  completed: No
            at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.initialise(EjbMessageDispatcher.java:101)
            at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.doSend(EjbMessageDispatcher.java:130)
            ... 4 more
    Caused by: org.omg.CORBA.COMM_FAILURE:   vmcid: SUN  minor code: 203  completed: No
            at com.sun.corba.se.impl.logging.ORBUtilSystemException.writeErrorSend(Unknown Source)
            at com.sun.corba.se.impl.logging.ORBUtilSystemException.writeErrorSend(Unknown Source)
            at com.sun.corba.se.impl.transport.SocketOrChannelConnectionImpl.writeLock(Unknown Source)
            at com.sun.corba.se.impl.encoding.BufferManagerWriteStream.sendFragment(Unknown Source)
            at com.sun.corba.se.impl.encoding.BufferManagerWriteStream.sendMessage(Unknown Source)
            at com.sun.corba.se.impl.encoding.CDROutputObject.finishSendingMessage(Unknown Source)
            at com.sun.corba.se.impl.protocol.CorbaMessageMediatorImpl.finishSendingRequest(Unknown Source)
            at com.sun.corba.se.impl.protocol.CorbaClientRequestDispatcherImpl.marshalingComplete1(Unknown Source)
            at com.sun.corba.se.impl.protocol.CorbaClientRequestDispatcherImpl.marshalingComplete(Unknown Source)
            at com.sun.corba.se.impl.protocol.CorbaClientDelegateImpl.invoke(Unknown Source)
            at com.sun.corba.se.impl.protocol.CorbaClientDelegateImpl.is_a(Unknown Source)
            at org.omg.CORBA.portable.ObjectImpl._is_a(Unknown Source)
            at weblogic.corba.j2ee.naming.Utils.narrowContext(Utils.java:126)
            at weblogic.corba.j2ee.naming.InitialContextFactoryImpl.getInitialContext(InitialContextFactoryImpl.java:94)
            at weblogic.corba.j2ee.naming.InitialContextFactoryImpl.getInitialContext(InitialContextFactoryImpl.java:31)
            at weblogic.jndi.WLInitialContextFactory.getInitialContext(WLInitialContextFactory.java:41)
            at javax.naming.spi.NamingManager.getInitialContext(Unknown Source)
            at javax.naming.InitialContext.getDefaultInitCtx(Unknown Source)
            at javax.naming.InitialContext.init(Unknown Source)
            at javax.naming.InitialContext.<init>(Unknown Source)
            at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.initJndiContext(EjbMessageDispatcher.java:213)
            at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.getJndiContext(EjbMessageDispatcher.java:226)
            at com.adobe.idp.dsc.provider.impl.ejb.EjbMessageDispatcher.initialise(EjbMessageDispatcher.java:87)
            ... 5 more
    Can you please suggest a way to do the conversion?

    Yes, sir, thanks for your suggestion.
    But I didn't find an exact solution. I found some, but they were not quite what I required. I just need to convert HTML to PDF using the iText API for Java. I have already used some of its classes, such as HTMLParser.
    So, anything else? Can anyone help me with this?

  • Convert Hexadecimal NCRs to unicode characters

    I have hexadecimal NCRs in comments when importing comments from the 3b2 application. How can these be converted to the appropriate characters using Acrobat JavaScript?
    For example, the comment contains (&#x000D), which should be converted to the corresponding Unicode character.

    I may be wrong, but I think you would use String.fromCharCode to convert a UCS-2 code into a string. So you then have to parse your string to process any escapes in it, and call that method.

  • Convert HTML special characters to String

    Hi,
    I'm looking for an easy way to convert HTML special characters (like "&#246") to String.
    Any ideas how to implement this? Thanks

    Well, I'm assuming you mean that when you are working with Java you get a ? instead of the character you want, and that you are reading and writing this HTML file. If not, maybe this can still help:
    BufferedReader inFile = new BufferedReader(new InputStreamReader(new FileInputStream(fileName), "ISO-8859-1"));
    PrintWriter outFile = new PrintWriter(new OutputStreamWriter(new FileOutputStream(fileName2), "ISO-8859-1"));
    Basically, you need to make sure that you define ISO-8859-1 as the encoding.
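
If the question is read literally (turning a reference such as &#246 into the character it stands for), the encoding of the reader is not the whole story; the reference itself has to be decoded. Here is a minimal sketch that handles only numeric references, decimal and hexadecimal, with the trailing semicolon optional because the example in the question omits it. The class name NumericEntityDecoder and the method decode are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumericEntityDecoder {
    // Group 1 captures hex digits, group 2 captures decimal digits.
    private static final Pattern NCR =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));?");

    static String decode(String input) {
        Matcher m = NCR.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(); // fall back to the original text
            try {
                int codePoint = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2));
                if (Character.isValidCodePoint(codePoint)) {
                    replacement = new String(Character.toChars(codePoint));
                }
            } catch (NumberFormatException e) {
                // value too large; keep the reference as written
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Named entities such as &ouml; would additionally need a lookup table, which this sketch leaves out.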

  • Parsing unicode escape codes

    Hi,
    I'm looking for a way to convert a string, which is read from a file, containing Unicode escape codes.
    In short, the file contains a string, e.g. "Some text\nOn a new line", which I want to get into a String object as if it were the result of
    String s = new String("Some text\nOn a new line");
    I've been looking in the Java docs but didn't find a function to do that conversion (though the compiler has to do it all the time...).
    Any ideas?

    That's not what I'm looking for.
    What I've got is a file that looks like this:
    1="Somestring"
    2="another message\nAnd some more text"
    3="text with\tTabs\n\tTo get some layout"
    etc...
    It's used as a string table for a program that has to be available in multiple languages. There are versions of the file in different languages.
    What I want is to be able to get e.g. string number 2 out of it in such a way that System.out.println(string2); gives the following result:
    another message
    And some more text
    instead of:
    "another message\nAnd some more text"
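
One way to get that behaviour is to walk the string and translate each backslash sequence by hand. The sketch below is an assumption about what is wanted: it covers a handful of common sequences plus four-digit hex escapes and leaves anything it does not recognise unchanged. The class name EscapeParser and the method unescape are invented for the example.

```java
final class EscapeParser {
    // Translates backslash sequences in raw text into the characters they name.
    static String unescape(String raw) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c != '\\' || i + 1 >= raw.length()) {
                sb.append(c);
                continue;
            }
            char next = raw.charAt(++i);
            switch (next) {
                case 'n': sb.append('\n'); break;
                case 't': sb.append('\t'); break;
                case 'r': sb.append('\r'); break;
                case '"': sb.append('"'); break;
                case '\\': sb.append('\\'); break;
                case 'u': {
                    int value = hex4(raw, i + 1);
                    if (value >= 0) {
                        sb.append((char) value);
                        i += 4; // skip the four hex digits just consumed
                    } else {
                        sb.append("\\u"); // malformed escape: keep it as written
                    }
                    break;
                }
                default:
                    sb.append('\\').append(next); // unknown sequence: keep it
            }
        }
        return sb.toString();
    }

    // Parses four hex digits starting at index start; returns -1 if not possible.
    private static int hex4(String s, int start) {
        if (start + 4 > s.length()) {
            return -1;
        }
        int value = 0;
        for (int k = start; k < start + 4; k++) {
            int digit = Character.digit(s.charAt(k), 16);
            if (digit < 0) {
                return -1;
            }
            value = value * 16 + digit;
        }
        return value;
    }
}
```

Since the file is a list of key=value lines, java.util.Properties.load may also be worth a look: it interprets such escapes while loading, although the surrounding quotes would stay part of each value.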

  • How can I get unicode escape for £ in java

    Is there any API which can translate the symbol &pound; to its corresponding unicode escape?

    SurfManNL wrote:
    Found one as well, but in the currency section: 20A4
    http://www.unicode.org/charts/PDF/U20A0.pdf
    That's for the Italian lira, which was replaced by the euro some years ago ;)

  • Unicode escapes

    When I get the input of unicode escapes,
    I can work with them as a String, but as soon as I add \u0022 (that is a double quote) to the input,
    then it won't take it anymore, and it says ";" expected.
    Now I am wondering, what is the way around this problem?
    Any advice please?
    public class Unicode {
        String uniSt = "\u0074\u0065\u0073\u0074"; // can't add \u0022 to this string
        // As soon as I add this double quote \u0022, it won't work and says "';' expected"
        public Unicode() {
            System.out.println(uniSt);
        }
        public static void main(String[] args) {
            new Unicode();
        }
    }

    The Unicode escape is a double quote and acts as one. If the string also has an ending double quote character, that's an error.
    Use this:
    String s = "\u0074\u0065\u0073\u0074\u0022;
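
The reason is that the compiler translates Unicode escapes before it splits the source into tokens, so the escape for a double quote behaves exactly like a typed quote and closes the literal. If the goal is a string that actually contains a double quote, the ordinary escape sequence does the job. A small sketch, with the class and field names chosen only for illustration:

```java
final class QuoteInString {
    // The sequence backslash-quote is handled inside the literal,
    // so the string ends with a real double quote character.
    static final String WITH_QUOTE = "\u0074\u0065\u0073\u0074\"";

    // Appending the character by its numeric value is another option.
    static final String ALSO_WITH_QUOTE = "test" + (char) 0x22;

    public static void main(String[] args) {
        System.out.println(WITH_QUOTE);
        System.out.println(ALSO_WITH_QUOTE);
    }
}
```

Both constants hold the five characters t, e, s, t and a double quote.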

  • How to convert html to pdf using acrobat sdk 8.0?

    hi
    I am a beginner with the Acrobat SDK.
    I want to know how to use Acrobat SDK 8.0 to convert HTML to PDF.
    Here are some questions:
    1: How can I support navigation inside a PDF file generated using Acrobat SDK 8.0? For example, there's a catalog at the top of the HTML file, and the customer hopes to navigate inside the PDF file just like inside the HTML file.
    2: How can I support operating controls in a PDF file generated using Acrobat SDK 8.0? For example, there are some drop-down lists and text boxes in the HTML file, and the customer hopes to enter text in a text box and click a drop-down list to see its available options, just like in the HTML file.
    Thanks in advance for any help and suggestion.

    Hello,
    I want a system to re-brand my 37 pages PDF for affiliates.
    I want a php dynamic link in the PDF online in order to personalize automatically the PDF for each affiliate. I need to change 2 links each time. The affiliate ID and the Paypal email (payment button) in page 36.
    Can you help?
    Please let me know
    Thank you
    Alex
    PS My system is online and i can give you the url if it helps.

  • A tool can convert HTML to Excel

    Hi all, are you using Report 6i and want to output a report in Excel format? If so, a free software tool that can convert HTML to Excel is available.
    The software is designed to print very large reports. A new function has now been added that lets you convert HTML to Excel easily. The function is still basic, but it will improve in the future.
    For more information, please visit
    http://repbrowser.freewebpage.org/
    Thank you,
    Regards

    Hi,
    the only other ways (as far as I know), if you really want to convert, are:
    a) write a parser to convert HTML into CSV (XLS)
    b) use an html2csv script at the OS level, like:
    http://sebsauvage.net/python/html2csv.py (or just google html2csv)
    c) use Excel (data source: web; local file: "file:///C:/test.htm")
    Kind Regards,
    Dirk

  • How to convert a HTML files into a text file using Java

    Hi guys...!
    I was wondering if there is a way to convert an HTML file into a text file using the Java programming language. Likewise, I would also like to know if there is a way to convert any type of file (Excel, PowerPoint, and Word) into text using Java.
    By the way, I really appreciated the help that you guys gave me on my previous topic on how to extract text from a PDF file.
    Thank you....

    HTML files are already text files. What do you mean you want to convert them?
    I think if you search the web, you can find things for converting those MS Office files to text (or extracting text from them, as I assume you mean).

  • Convert  html to word document

    I want to convert HTML to a Word document.
    I tried poi-3.0.2-FINAL (Apache POI - HWPF - Java API to Handle Microsoft Word Files), but it is not working.

    My actual goal is to convert an HTML file into a Word document.
    I posted in the forum, and some people suggested having a look at HWPF.
    I tried programs one by one without getting any result. For example, this program:
    try {
        HWPFDocument doc = new HWPFDocument(new FileInputStream("c:\\temp.doc"));
        Range r = doc.getRange();
        System.out.println("Example you supplied:");
        System.out.println("---------------------");
        for (int x = 0; x < r.numSections(); x++) {
            Section s = r.getSection(x);
            for (int y = 0; y < s.numParagraphs(); y++) {
                Paragraph p = s.getParagraph(y);
                for (int z = 0; z < p.numCharacterRuns(); z++) {
                    // character run
                    CharacterRun run = p.getCharacterRun(z);
                    // character run text
                    String text = run.text();
                    // show us the text
                    System.out.print(text);
                }
                // use a new line at the paragraph break
                System.out.println();
            }
        }
    } catch (NullPointerException exception) {
        exception.printStackTrace();
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    fails with:
    java.io.IOException: Invalid header signature; read 5789751444030890300, expected -2226271756974174256
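
An "Invalid header signature" error from POI usually suggests that the file being opened is not a real binary Word (OLE2) document, for example some other format saved with a .doc extension. That is a guess about this particular file, not something the post confirms. A quick way to check, sketched in plain Java without POI, is to compare the first eight bytes with the OLE2 signature; the class name Ole2Check and its methods are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class Ole2Check {
    // The eight signature bytes at the start of an OLE2 compound document.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Returns true if the given bytes start with the OLE2 signature.
    static boolean looksLikeOle2(byte[] header) {
        if (header.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if (header[i] != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the first eight bytes of the file and checks them.
    static boolean looksLikeOle2(String path) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        try (InputStream in = new FileInputStream(path)) {
            int read = 0;
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    return false; // file is shorter than the signature
                }
                read += n;
            }
        }
        return looksLikeOle2(header);
    }
}
```

If the check returns false for c:\temp.doc, HWPF cannot read the file regardless of how the loop is written. Note also that HWPF reads and writes Word files; it does not by itself turn HTML into a Word document.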

  • SOAP error "Unicode supplemental characters encountered in parameter"..!!

    Hi All,
    Scenario:  Webservice (SOAP)--> XI --> SAP
    We are facing a problem with Chinese characters while sending the SOAP call.
    The web service receives this error in the SOAP fault:
    "FAILED TO INVOKE WEB SERVICE OPERATION OS_Sales Could not call Web service operation OS_Sales.  Unicode supplemental characters encountered in parameter 1 (11839)".
    Is there any way I can get rid of this?
    Thanks
    Deepthi

    Hi,
    Try this: find the character encoding of your Chinese characters. To do so, take the data and view it in Internet Explorer, then check under View > Encoding which encoding displays it properly. Alternatively, ask your SOAP application team which character encoding they use for the Chinese characters.
    Then, in the sender SOAP communication channel, on the Module tab, add the following in the module configuration section:
    Module Key          Parameter.name          Parameter.value
    soap                  XMBWS.XMLEncoding     <your_encoding>
    where <your_encoding> is the encoding scheme of your Chinese characters, something like iso-<some_number>.
    Then rerun your scenario.
    Regards,
    Rajeev Gupta
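
Independently of the channel encoding, it may help to find out whether the payload really contains supplementary characters, that is, code points above U+FFFF, which some rarer Chinese characters are. The exact check the web service performs is not shown in the thread, so the following Java sketch is only a diagnostic aid; the class name SupplementaryScan and its method are made up.

```java
import java.util.ArrayList;
import java.util.List;

final class SupplementaryScan {
    // Collects every code point above U+FFFF found in the text.
    static List<Integer> supplementaryCodePoints(String text) {
        List<Integer> found = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (Character.isSupplementaryCodePoint(codePoint)) {
                found.add(codePoint);
            }
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(codePoint);
        }
        return found;
    }
}
```

An empty result means every character fits in a single 16-bit char, in which case the cause of the fault probably lies elsewhere.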
