Decoding HTML Characters

A while ago I wrote a Java application that sits on my college server and, every 15 minutes, parses an RSS feed and creates a duplicate of that feed with the full body content. I did this so that I can read the feed offline.
Everything works well except for one issue, which I'm sure lies in decoding the stream: most characters come out fine, but certain special characters come out as junk. For example,
So again, I'll listen to some Utada Hikaru to begin with and instantly I know something's
gets decoded to
So again, Iâ??ll listen to some Utada Hikaru to begin with and instantly I know somethingâ??s
So obviously the ' character is throwing off the decoding (actually, in the source the character is not the straight ' shown above but a curly version of it; this forum appears to be cleaning it up).
I would like to know how to decode the stream correctly. I tried using InputStreamReader and setting the CharsetDecoder, but to no avail, though of course I may not have done this correctly.
Thanks, Ger.
Edited by: Ger@newToProgramming on Nov 3, 2009 4:04 AM

I tried again to set the Charset, with no luck.
To explain what is happening, in case the problem lies somewhere else:
A screen scrape is performed, the scraped page is run through JTidy, and the main article is located and stored in an SQL database using Apache. It is later retrieved and written to an RSS feed enclosed in CDATA blocks.
[Example Output|http://www.redbrick.dcu.ie/~gleesog4/rss2/feeds/Breaking/Bloomberg/index.rss]
Code for reading the page:
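     import java.io.*;
     import java.net.URL;

     // Wrapper class so the two methods compile; the name is illustrative only.
     public class PageReader
     {
          // Reads the whole page (local file or URL) as UTF-8 text.
          public static String GetPage(String url) throws Exception
          {
               BufferedReader buffer = new BufferedReader( new InputStreamReader(GetURLStream(url),"UTF-8") );
               StringBuilder builder = new StringBuilder();
               int charRead;
               // read() returns one decoded char (not a byte), or -1 at end of stream
               while ((charRead = buffer.read()) != -1)
                    builder.append((char) charRead);
               buffer.close();
               return builder.toString();
          }

          // Tries the argument as a local file first, then as a URL.
          public static InputStream GetURLStream(String url) throws Exception
          {
               InputStream strIn = null;
               boolean isFile = false;
               boolean isURL = false;
               try
               {
                    strIn = new FileInputStream(url);
                    isFile = true;
               }catch(Exception e){} // not a readable file; fall through to the URL attempt
               if(!isFile)
               {
                    try
                    {
                         strIn = new URL(url).openStream();
                         isURL = true;
                    }catch(Exception e){} // reported by the check below
               }
               if(!isFile && !isURL)
                    throw new FileNotFoundException("Can not locate as either file or url\n" + url);
               return strIn;
          }
     }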
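
The junk described above is consistent with UTF-8 bytes being decoded with a single-byte charset somewhere in the pipeline. The following is a minimal sketch of that effect, not a diagnosis of the actual application: the class name CharsetDemo and the helper readAll are made up for illustration, and it is only an assumption that the source page is UTF-8.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original application.
final class CharsetDemo {
    // Reads the whole stream as text using the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "I'll" written with a curly apostrophe (U+2019), encoded as UTF-8:
        // the apostrophe becomes the three bytes E2 80 99.
        byte[] utf8 = "I\u2019ll".getBytes(StandardCharsets.UTF_8);

        String right = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        String wrong = readAll(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);

        System.out.println(right.length()); // 4: one char for the apostrophe
        System.out.println(wrong.length()); // 6: the apostrophe became three chars
    }
}
```

If the read side already uses UTF-8, as GetPage does, the same mismatch can still arise later, for example when the text is stored in the database or when the feed is written out, so each of those steps would need checking as well.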

Similar Messages

  • Special HTML Characters

    Hi,
    I encountered a problem with the display of special HTML characters (chr 155). Crystal was not able to display the characters correctly; a blank space was displayed instead. In addition, when the report is exported to PDF, they are displayed as boxes.
    Is there a way to handle display of special HTML chars in crystal?
    Thanks

    Crystal's HTML interpreter is very limited and has been the same for years, so it seems unlikely it will change any time soon.
    As it is a specific character that is failing, use a Replace formula to remove the long dash HTML and replace it with a short dash HTML, which I guess Crystal will recognise.
    Replace(yourfield, 'longdashhtml', 'shortdashhtml')
    Ian

  • Parsing HTML characters (e.g. &nbsp)

    Hi
    Apologies if I'm missing something obvious, I haven't been able to find an answer searching the API or Forums...
    I'm parsing HTML documents (currently as Strings) to extract certain information. Is there an easy way to replace all special HTML characters such as &nbsp; and &lt; with a space or < respectively, without having to do a string replace on every possible HTML character?
    I know there's an HTML parser in swing but that seems to be geared towards creating an HTML editor.
    Any help would be appreciated!

    There are also a number of open source or shareware programs, such as TidyHTML, that clean-up and parse existing HTML. Check out Sourceforge or www.downloads.com.
    - Saish

  • Escaping of html characters

    ahoj!
    in an sql report i have to show text messages that include sometimes special html characters like <. is there an oracle function to convert this characters in the format & #60; (without the blank)? i don't want to replace all the special characters by myself.
    thanks!
    ciao,
    christian
    Edited by: Christian Ropposch on Apr 8, 2009 1:39 PM
    Edited by: Christian Ropposch on Apr 8, 2009 1:40 PM

    Hi Christian R.!
    If I understood right then you are using APEX. HTP and HTF are included with Oracle and APEX. You don't need the Application Server.
    regards

  • HTML characters display incorrectly in Firefox

    Special HTML characters like ∠ and ← and ∝ (etc.) do not display as they should in Firefox on my Mac.
    I messed with Font Book earlier today, and though I don't think I made any fatal changes, I think that might have caused this problem (though certain symbols like ♥ had never shown up correctly).
    However, all of these characters display perfectly in Safari. See:
    http://www.plisher.org/safari.jpg
    http://www.plisher.org/firefox.jpg
    How might I fix this?

    Well, I just launched Firefox and opened this page in it. The angle, arrow and whatevertheotherthing is all displayed correctly, the heart however showed up in Firefox as a perpendicular line.
    I'm using Firefox 2.0.0.4, with the default font set to Geneva.
    Francine Schwieder

  • [svn] 3590: Replace invalid html characters

    Revision: 3590
    Author: [email protected]
    Date: 2008-10-13 07:29:43 -0700 (Mon, 13 Oct 2008)
    Log Message:
    Replace invalid html characters
    Checkin Test Passed: Yes
    QA: No
    Bug:
    Doc: No
    Modified Paths:
    flex/sdk/trunk/frameworks/projects/flex4/src/mx/layout/ILayoutItem.as


  • String auto-converts HTML characters?

    Hi all, I have an XML document that contains a field like
    this:
    <someHTML>&lt;b&gt;ThisIsBolded&lt;/b&gt;</someHTML>
    Now, I want to put this correctly-rendered HTML into the
    htmlText portion of a Text tag. Which I try to accomplish using the
    following:
    var nodeHTML:String = node.someHTML.toString();
    trace(nodeHTML); // Output: <b>ThisIsBolded</b>
    (what!? I didn't replace anything!)
    nodeHTML.replace("<","&lt;").replace(">","&gt;");
    trace(nodeHTML); // Output: <b>ThisIsBolded</b>
    (what!? replacement didn't do anything!)
    myText.htmlText += nodeBody + "\n"; // Output: Plaintext
    output, no bolding! But also no tags...?
    Does anyone know what is going on? It seems like I can't
    display HTML in the htmlText property. Any help would be
    appreciated.
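
One detail in the snippet above is that the result of replace is never assigned. The sketch below shows the same effect in Java rather than ActionScript, on the assumption that strings are immutable in both, so replace returns a new string and leaves the original untouched. ReplaceDemo and escapeAngles are made-up names.

```java
// Hypothetical demo class; a Java analogue of the ActionScript snippet above.
final class ReplaceDemo {
    // Returns a new string with angle brackets escaped; the input is not modified.
    static String escapeAngles(String s) {
        return s.replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String nodeHtml = "<b>ThisIsBolded</b>";

        // Result discarded: nodeHtml still holds the original text.
        nodeHtml.replace("<", "&lt;");
        System.out.println(nodeHtml);

        // Result assigned: nodeHtml now holds the escaped text.
        nodeHtml = escapeAngles(nodeHtml);
        System.out.println(nodeHtml);
    }
}
```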

    Post Author: V361
    CA Forum: General
    The maximum length of a String constant, a String value held by a String variable, a String value returned by a function or a String element of a String array is 65,534 characters.
    The maximum size of an array is 1000 elements.
    The maximum number of arguments to a function is 1000. (This applies to functions that can have an indefinite number of arguments such as Choose).
    Not sure about the HTML ?

  • Decode HTML character in hyperlink parameter

    Hi all,
    I have some problem in passing parameters from a report to another. In particular, the LOVs of prompts have particular formatting:
    1._ _ _ first value
    2._ _ _ second value
    Does anybody know how to decode "dots" and "dashes" in html syntax?
    Thanks
    Riccardo

    Hi,
    Are the "dots" and "dashes" in the value really causing problems ?
    You can try enclosing the values within double quotes. Use Char(34) formula to include double quotes in the URL.
    Note: If there is a space within the value, please replace the space with a '+' using Replace function and you will not have to use double quotes.
    For example, replace
    1._ _ _ first value
    by
    1._+_+_+first+value
    I think it is the spaces within the values which can cause problem and not the dashes and dots.
    Regards
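
The replacement of spaces by '+' described above is what form-style URL encoding does. As a rough illustration in Java (not in the reporting tool's own formula language), java.net.URLEncoder produces the same result for the example value. UrlParamDemo and encodeParam are made-up names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class showing form-style encoding of a parameter value.
final class UrlParamDemo {
    // Encodes a value for use in a query string: spaces become '+',
    // while letters, digits, '.', '-', '*' and '_' are left as they are.
    static String encodeParam(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeParam("1._ _ _ first value")); // 1._+_+_+first+value
    }
}
```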

  • How to display, translate or remove html characters

    Hi,
    I need the ability to translate or remove the characters below to display properly in oracle apex text area. Anyone know how to do this?
    Apex 4
    &lt;p&gt;1985 World Champ. U.19 - Gold, &lt;span style=&quot;color: rgb(255, 0, 255);&quot;&gt;&lt;strong&gt;1986 World Champ. - Gold,&lt;/strong&gt; &lt;/span&gt;1987 Pan American Games - Gold, 1987 North American Champ. - Gold, 1987 World Cup - Gold, &lt;span style=&quot;color: rgb(255, 0, 255);&quot;&gt;&lt;strong&gt;1989 World Champ. - Gold,&lt;/strong&gt;&lt;/span&gt; 1990 Goodwill Games - Gold, 1990 World Cup - Gold, 1991 Playa Giron - Gold, 1991 Pan American Games - Gold, &lt;span style=&quot;color: rgb(255, 0, 255);&quot;&gt;&lt;strong&gt;1991 World Champ. -
    Thanks
    Dean

    Assuming I've understood the requirement:
    SQL> with t as (
      2    select '<p>1985 World Champ. U.19 - Gold, <span style="color: rgb(255, 0, 255);"><strong>1986 World Champ. - Gold,</strong> <
    /span>1987 Pan American Games - Gold, 1987 North American Champ. - Gold, 1987 World Cup - Gold, <span style="color: rgb(255, 0, 255)
    ;"><strong>1989 World Champ. - Gold,</strong></span> 1990 Goodwill Games - Gold, 1990 World Cup - Gold, 1991 Playa Giron - Gold, 199
    1 Pan American Games - Gold, <span style="color: rgb(255, 0, 255);"><strong>1991 World Champ. - ' s from dual)
      3  select
      4            regexp_replace(s, '<[^>]*>') removed
      5          , htf.escape_sc(s) escaped
      6  from
      7*           t
    SQL> /
    REMOVED                                  ESCAPED
    1985 World Champ. U.19 - Gold, 1986 Worl &lt;p&gt;1985 World Champ. U.19 - Gold,
    d Champ. - Gold, 1987 Pan American Games &lt;span style=&quot;color: rgb(255, 0,
    - Gold, 1987 North American Champ. - Go 255);&quot;&gt;&lt;strong&gt;1986 World
    ld, 1987 World Cup - Gold, 1989 World Ch Champ. - Gold,&lt;/strong&gt; &lt;/span&
    amp. - Gold, 1990 Goodwill Games - Gold, gt;1987 Pan American Games - Gold, 1987
    1990 World Cup - Gold, 1991 Playa Giron North American Champ. - Gold, 1987 World
    - Gold, 1991 Pan American Games - Gold,  Cup - Gold, &lt;span style=&quot;color:
    1991 World Champ. -                      rgb(255, 0, 255);&quot;&gt;&lt;strong&g
                                             t;1989 World Champ. - Gold,&lt;/strong&g
                                             t;&lt;/span&gt; 1990 Goodwill Games - Go
                                             ld, 1990 World Cup - Gold, 1991 Playa Gi
                                             ron - Gold, 1991 Pan American Games - Go
                                             ld, &lt;span style=&quot;color: rgb(255,
                                              0, 255);&quot;&gt;&lt;strong&gt;1991 Wo
                                             rld Champ. -
    (Inconsistent behaviour by the Jive forum software means it isn't displaying the difference in the ESCAPED result properly...)
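
For comparison, the tag-stripping pattern used with regexp_replace above can be applied outside the database as well. This is a minimal Java sketch using the same pattern; TagStripper and stripTags are made-up names, and a regular expression like this is only a rough tool (it does not decode entities and can misfire on a '>' inside an attribute value).

```java
// Hypothetical demo class applying the same pattern as the SQL above.
final class TagStripper {
    // Removes anything that looks like a tag: '<', any run of non-'>' characters, '>'.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String s = "<p>1985 World Champ. U.19 - Gold, <strong>1986 World Champ. - Gold,</strong></p>";
        System.out.println(stripTags(s));
    }
}
```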

  • Replacing HTML characters

    I'll keep this short and simple. Here's what I'm trying to
    do:
    <cfset
    qInfo.instructions=ReplaceNoCase(qInfo.instructions,">","&gt;","all">
    qInfo.instructions is used to populate a HTML cfformitem, so
    I want to replace all the <,>,and &. The above line is
    the first of three replacenocase statements and is throwing an
    'Invalid CFML construct' error. What do I need to do
    differently?

    Try the HtmlEditFormat function built into ColdFusion.
    http://livedocs.adobe.com/coldfusion/8/htmldocs/help.html?content=functions_h-im_04.html#4 744272

  • Escaping html characters

    Hello all,
    I am creating a flex chat app. This communicates with my own
    chat server.
    Now I want to know what is the best way to escape characters?
    I send all data in an XML document to the server, so if I type
    a message: <i>test</i> the text will appear in italics, as
    it is not escaped.
    If I use escape, it will escape to a code that the Text
    control is not able to read.
    I just want to know which function will replace < with
    &lt; > with &gt; and & with &amp;.
    Thanks!

    I have tried escape and unescape.
    But these are my results:
    var Str: String = '<foo>rock & roll</foo>';
    Alert.show(escape(Str));
    This will come to:
    %3Cfoo%3Erock%20%26%20roll%3C/foo%3E
    That's fine for sending data, but putting it back with unescape
    will just give me the original back (as expected).
    But I want my < to be &lt; or the equivalent numeric
    code &#60;
    In PHP you use htmlentities or htmlencode, but in Flex it is a
    mystery.
    Sorry, didn't read your post properly, I think I will have to
    come up with my own replace function then :(
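
A hand-written replace function of the kind mentioned above mostly needs to get the order right: the ampersand has to be escaped first, otherwise the ampersands introduced by &lt; and &gt; would be escaped again. The sketch below is in Java rather than ActionScript, and HtmlEscaper and escapeHtml are made-up names.

```java
// Hypothetical helper; a Java sketch of a minimal escaping function.
final class HtmlEscaper {
    // Escapes the three characters that are significant in HTML/XML text content.
    // '&' is handled first so that the entities added afterwards are not re-escaped.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(escapeHtml("<foo>rock & roll</foo>"));
        // &lt;foo&gt;rock &amp; roll&lt;/foo&gt;
    }
}
```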

  • Decode HTML escaped character references

    sure, I can write
    string.replace("&nbsp;" , " ")
    but obviously can't do that for all Unicode character references in the world, and surely this problem must be a routine library call .... but eh ... which? I don't seem to be able to find anything by googling.
    thanks in advance

    @hugoT - thanks for the link to the list ...
    ... but eh .. I really don't want to do this myself, if there's a public library that will do it for me ... something like ... I send a string over, full of escaped character references, and get a nice and human readable string back.
    this kind of bread and butter code must be out there somewhere (i hope)
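
Libraries do exist for this; Apache Commons Text, for instance, offers StringEscapeUtils.unescapeHtml4. To show what such a call does internally, here is a small self-contained Java sketch. It resolves decimal and hexadecimal numeric references generically and only a handful of named ones; EntityDecoder, its decode method, and the choice of names in the table are illustrative assumptions, not a complete implementation of the HTML entity list.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; covers numeric references plus a few named entities.
final class EntityDecoder {
    // Matches &#xHEX; or &#DEC; or &name; and captures the part between '&' and ';'.
    private static final Pattern REF =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("apos", "'");
        NAMED.put("nbsp", "\u00A0");
    }

    // Decodes every recognised reference in a single pass; unknown ones are left as-is.
    static String decode(String s) {
        Matcher m = REF.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String resolved = resolve(m.group(1));
            String replacement = (resolved != null) ? resolved : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the decoded text for one reference body, or null if it is not recognised.
    private static String resolve(String body) {
        if (body.charAt(0) != '#') {
            return NAMED.get(body);
        }
        boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
        try {
            int codePoint = hex
                    ? Integer.parseInt(body.substring(2), 16)
                    : Integer.parseInt(body.substring(1));
            return Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : null;
        } catch (NumberFormatException e) {
            return null; // number too large to be a code point
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("fish &amp; chips &lt;today&gt; &#8364;5"));
    }
}
```

Because the text is scanned once from left to right, a double-escaped input such as &amp;lt; decodes to &lt; rather than to <.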

  • Weblogic xss vulnerability : html character entities getting decoded in jsp by ${} expression

    This is from my question at Stack Overflow: "Weblogic xss vulnerability : html character entities getting decoded in jsp".
    I am using a filter to prevent XSS by encoding the HTML characters of my JSP form parameters.
    I am resolving them in the JSP using the ${param} expression.
    This works fine in Tomcat, as the values are resolved as is, but on WebLogic the values are getting decoded, causing the XSS to succeed.
    I am using this simple code in jsp to test it
    <c:set var="testing" value="eb011&quot;&gt;&lt;img src=a onerror=confirm(1)&gt;47379"/> <input type="hidden" name="encoding" value="${testing }"/>
    Result in tomcat
    <input type="hidden" onerror="confirm(1)&gt;47379&quot;/" src="a" &gt;&lt;img="" value="eb011" name="encoding">
    Result in weblogic
    <input type="hidden" value="eb011" name="encoding"><img onerror="confirm(1)" src="a">47379"/&gt;
    Why is WebLogic decoding the HTML codes, and what could be done to prevent it?

    It is really handy to learn how to read schema validation errors. It really does say exactly what's wrong there. If you can get access to the XSD that your XML document is prescribing, you should be able to tell what mistake you made. If you learn how to do this, you'll never have to ask questions like this again. :)
    The error refers to the "http://www.bea.com/ns/weblogic/weblogic-web-app" namespace, which I believe is in your "weblogic.xml" file. It's saying that in the "jsp-descriptor" element, it found a "noTryBlocks" element at a point where it was not legal. At that point, it expected to find either a "'precompile-continue" or several other elements, but not that one. Read the XSD to determine the correct order for elements. If you're editing this file in Eclipse, you may not even have to obtain the XSD. If you hover the mouse over the root element of the document, it will give you a popup showing the syntax details of the element, which will tell you what the expected order of elements is.

  • Decoding characters

    Hello
    I have a problem when decoding special characters using new String().
    My problem:
    A C++ program is communicating with a Java program through a socket.
    The two programs are running on two separate computers, each
    with a Linux RedHat 7.3 OS.
    When I write a special character (127<char<=255) from C++ to the
    socket, the String returned on the Java side is badly decoded.
    My Java code:
    in = new DataInputStream(new ByteArrayInputStream(body));
    byte[] myStringBytes = new byte[stringLength];
    in.read(myStringBytes);
    String myString = new String(myStringBytes);
    System.out.println("String attribute: "+myString);
    I tried new String() with some standard charsets but the problem is still
    the same (US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16)
    Thanks
    Ludovic
    France

    Messages are serialized using the following scheme:
    Header: MessageType(1B, int) MessageID(4B, int) BodySize(2B, int)
    Body: Sequence of the following pieces:
    - for numeric (unsigned integer) attributes:
    AttributeType(1B, int) AttributeValue(4B, int)
    - for text attributes:
    AttributeType(1B, int) TextLength(2B, int) AttributeValue(1+B, String) Zero(1B)
    The code:
    void readFrom(InputStream stream) throws java.io.IOException
    {
        DataInputStream in = new DataInputStream(stream);
        int type = in.readUnsignedByte();
        int id = in.readInt();
        int bodySize = in.readUnsignedShort();
        byte[] body = new byte[bodySize];
        in.readFully(body);
        in = new DataInputStream(new ByteArrayInputStream(body));
        while (in.available()!=0)
        {
            // Named attrType because "type" is already declared for the header
            int attrType = in.readUnsignedByte();
            if (Types.isStrAttribute(attrType))
            {
                try
                {
                    // Get length
                    int usLength = in.readUnsignedShort();
                    System.out.println("Message.readFrom: length of the String attribute: "+usLength);
                    // Then, read length bytes from the input stream
                    // (read() may return fewer bytes than requested; readFully() would not)
                    byte[] myStringBytes = new byte[usLength];
                    in.read(myStringBytes);
                    // Default (decodes with the platform's default charset)
                    String myDefaultString = new String(myStringBytes);
                    System.out.println("Message.readFrom: String attribute: default =="+myDefaultString);
                    // Skip the terminating zero byte
                    in.readUnsignedByte();
                }
                catch (IOException ioException)
                {
                    System.out.println("Message.readFrom: ioException");
                }
            }
            else
            {
                addNumAttribute(attrType, in.readInt());
            }
        }
    }
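
As a point of comparison, the text attribute can be read with an explicit charset instead of the platform default. The sketch below assumes, purely for illustration, that the C++ side writes one byte per character in ISO-8859-1 (Latin-1); the post does not say which encoding is actually used, and since ISO-8859-1 was already tried without success, the cause may lie elsewhere (for example in how the console prints the string). TextAttributeReader and readText are made-up names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; reads TextLength(2B) followed by that many bytes of text.
final class TextAttributeReader {
    // Decodes each byte as exactly one character (ISO-8859-1 maps bytes 0-255
    // to code points 0-255), independent of the platform default charset.
    static String readText(DataInputStream in) {
        try {
            int length = in.readUnsignedShort();
            byte[] raw = new byte[length];
            in.readFully(raw); // blocks until all 'length' bytes have arrived
            return new String(raw, StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Length 3, then 'c', 0xE9 (e with acute accent in Latin-1), 'e'.
        byte[] wire = {0x00, 0x03, 'c', (byte) 0xE9, 'e'};
        String text = readText(new DataInputStream(new ByteArrayInputStream(wire)));
        System.out.println(text.length()); // 3
    }
}
```

Printing the numeric value of each char, rather than the string itself, is one way to tell a decoding problem apart from a display problem.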

  • Remove HTML from Interactive report download

    I have interactive reports where the column link on a specific column has to be dynamic, that means, it cannot be hard coded in the column link attributes. The following is an example of one such report query:
    case when d.object_type_description ='Business Service' then
    '<a href="f?p='||:app_id||':183:'||:app_session||'::::P183_OBJECT_ID:'||d.id||'">'||d.object_name||'</a>'
    when d.object_type_description = 'Real Time Event' then
    '<a href="f?p='||:app_id||':162:'||:app_session||'::::P162_OBJECT_ID:'||d.id||'">'||d.object_name||'</a>'
    else
       null
    end as "OBJECT NAME"
    As you see in the above example, the link on the "Object Name" column could redirect either to page 183 or to page 162, based on the "Object Type Description" column.
    The column attribute of the "Object Name" column has "Display Type" set to "Standard Report Column". That works perfectly fine in the UI of the report. However, if I download the IR data (in any format) from the Actions -> Download menu, the object name column values are downloaded with the HTML characters as:
    <a href="f?p=15548:183:6072319179284::::P183_OBJECT_ID:255245470513999672860510787772603748464">JP010000</a>
    where JP010000 is the object name.
    Is there a way I can strip the HTML from the column values in the downloaded files?
    I am using Apex 4.1.

    Rohit,
    You can define the link in the query, make it hidden, use the value as column link URL. So, your query will look like the following:
    d.object_name,
    case when d.object_type_description ='Business Service' then
        'f?p='||:app_id||':183:'||:app_session||'::::P183_OBJECT_ID:'||d.id
    when d.object_type_description = 'Real Time Event' then
        'f?p='||:app_id||':162:'||:app_session||'::::P162_OBJECT_ID:'||d.id
    else
       null
    end link
    Change the display type of the LINK column to Hidden. In the column link, enter #OBJECT_NAME# as Link Text. Select URL as Target. Enter #LINK# as the URL value. In APEX 4.2, you can do this more easily by defining an HTML Column Expression.
    Regards,
    Christina
    Edited by: cbcho on Sep 27, 2012 11:35 AM
