Converting UNICODE to regular text

I don't know if the subject line really explains my problem. I'm reading some content from a URL and would like to write that content back to a database. This all works fine except when the Html of that URL contains character codes.
As an example: "Effects on blood lipids of a blood pressure & #8211; lowering....."
As you can see, it put the character code instead of the dash (–). What I would like to write to my database is:
"Effects on blood lipids of a blood pressure–lowering diet..."
instead of
"Effects on blood lipids of a blood pressure & #8211; lowering....."
Does anyone have any ideas? Is there a way to transform all character codes to their character equivalents?
Thanks for your help
(By the way, the HTML isn't quite right. I had to put it in brackets and insert a space for it to display correctly in the forum.)
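If the page really contains the reference text (&#8211;) rather than mis-decoded bytes, the references can be decoded before the text is written to the database. The sketch below handles only numeric references, decimal (&#8211;) and hexadecimal (&#x2013;); the class and method names are hypothetical. Named entities such as &amp; or &ndash; would need a lookup table, which Apache Commons Text's StringEscapeUtils.unescapeHtml4 provides if adding a library is an option.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes numeric character references such as &#8211; or &#x2013;.
final class NumericCharRefs {
    // Group 1: hexadecimal digits after "&#x"; group 2: decimal digits after "&#".
    private static final Pattern REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    static String decode(String html) {
        Matcher m = REF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint;
            try {
                codePoint = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
            } catch (NumberFormatException e) {
                codePoint = -1; // too many digits to fit in an int
            }
            // Anything that is not a valid Unicode code point is left exactly as it was.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, decode("blood pressure&#8211;lowering diet") yields the text with a real en dash (U+2013) in place of the reference.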

Hi again,
If you get the wrong characters from URL.openStream(), it is because the encoding declared by your HTML page (probably in the header) is not the same as the default encoding used by your JVM on the client. If you know the page's encoding, you can use the same encoding for your stream with InputStreamReader(inputStream, encoding). To check the encoding of the page programmatically, read the charset parameter of the Content-Type header with URLConnection.getContentType() (for example "text/html; charset=UTF-8"). Note that URLConnection.getContentEncoding() returns the Content-Encoding header (such as gzip), not the character set.
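A minimal sketch of this approach, assuming the server sends a charset parameter in its Content-Type header. The class and method names are hypothetical, and the fallback charset is the caller's choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper: decode a response using the charset the server declares.
final class CharsetAwareReader {
    // Extracts the charset from a value such as: text/html; charset=UTF-8
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Illegal or unsupported charset name: use the fallback below.
                    }
                }
            }
        }
        return fallback;
    }

    // Reads the whole stream as text with the given charset instead of the JVM default.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

With a live connection, conn being the URLConnection, this would be called roughly as charsetFrom(conn.getContentType(), fallback) followed by readAll(conn.getInputStream(), charset). Pages can also declare their charset in a <meta> tag, which this sketch does not inspect.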
Hope it helps you,
Regards,
Martin

Similar Messages

  • Converting Unicode to plain text

    Hi,
    Is there any way I can get the string "Internationalization" from "Iñtërnâtiônàlizætiøn"?

    You could decompose it using one of the Unicode normalizations:
    http://java.sun.com/javase/6/docs/api/java/text/Normalizer.html
    and then go through the result and remove all of the combining characters.
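A sketch of that suggestion in Java, using java.text.Normalizer with Form.NFD and a regular expression that removes every character in the Unicode mark category (\p{M}). The helper name is hypothetical. Letters such as æ and ø have no canonical decomposition, so they pass through unchanged: the sample string comes out as "Internationalizætiøn", and those letters would need an explicit mapping of your own.

```java
import java.text.Normalizer;

// Hypothetical helper: strip accents by decomposing and dropping combining marks.
final class StripAccents {
    static String stripCombiningMarks(String s) {
        // NFD splits precomposed letters, e.g. "ñ" becomes "n" followed by U+0303 COMBINING TILDE.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // \p{M} matches combining marks (general categories Mn, Mc and Me).
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```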

  • How do I convert identity-h encoding to regular text?

    I am currently working on this project where I am running this program called Swish-e.
    Here is the link for Swish-e: Swish-e :: Home Page
    This is used to index files. I have noticed that it is only able to index certain PDF files but not PDF files that are Chinese for example.
    I am using Nitro PDF3 reader to read my PDF files if that makes any difference.
    What I would like to know is: what would be the best Linux command or piece of software to use to convert PDF files that are in Identity-H encoding to regular text files? Could this software be incorporated into Swish-e, and if so, how? Is there even a way to do this? Any help would be greatly appreciated.
    What would I have to put in my swish-e configuration file to index it?

    jdam wrote:
    Here are 8 words that comprise the following message   KNID/T192023
    19278, 18756, 12116, 12601, 12848, 12851, 8224, 8224
    Your 8 words should give 16 characters.  The last four must be spaces because there are only 12 characters in KNID/T192023.  Also you stated that the data is in hex, but your 8 words are obviously in decimal.  Anyway, here is a vi to decipher the words.
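The arithmetic behind that reply, as a Java sketch (the VI itself is not reproduced here): each 16-bit word packs two ASCII codes, high byte first, so 19278 = 0x4B4E splits into 0x4B ('K') and 0x4E ('N'), and 8224 = 0x2020 is two spaces. The class name is hypothetical, and the high-byte-first order is inferred from the numbers.

```java
// Hypothetical helper: unpack 16-bit words into pairs of 8-bit ASCII characters.
final class WordsToAscii {
    static String decode(int[] words) {
        StringBuilder sb = new StringBuilder(words.length * 2);
        for (int w : words) {
            sb.append((char) ((w >> 8) & 0xFF)); // high byte: first character
            sb.append((char) (w & 0xFF));        // low byte: second character
        }
        return sb.toString();
    }
}
```

Decoding the eight words above gives "KNID/T192023" followed by four spaces.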
    Once again Altenbach has to come up with a simpler way to do things. Maybe I should send him all my work for review.
    - tbob
    Attachments:
    DecimalStringsToAsciiChars.vi (21 KB)

  • AE error: Could not convert Unicode Characters

    Hey guys,
    I purchased Video Copilot Action Essentials 2 (720p). Whenever I try to import or drag and drop the pre-keyed clips (QuickTime .mov format) into AE, I get the After Effects error: could not convert Unicode Characters (23 ::46). I found an article online that said to make changes to the text in whatever I'm importing, but umm, it's a video, not text.
    I am using AE cs5.
    I can import the clips just fine into Premiere Pro and, oddly enough, export them in QuickTime format just fine; however, I lose the "pre-keyed" transparency that's somehow embedded in the original video, so I now have a video of smoke with a non-removable black background.
    Please help! thanks!

    Hey man, I made an account just to reply to this. I had the same error come up while I was importing video files, so I had a look around and found that it had something to do with the language/encoding not being recognised. I looked closer into the footage, tried different methods of importing the file, and later realised that After Effects didn't recognise some of the characters in the file path. The original folder was created on a MacBook; Windows recognises the characters but After Effects didn't. So I moved the file to my desktop, tried to import it again, and presto, it worked fine. You may not have the same problem, but just in case you do, you should try moving the file.
    If not, here's a thread for the error:
    http://helpx.adobe.com/after-effects/kb/error-could-convert-unicode-characters.html
    hope that could be of some help.
    Zai

  • My friend has an iPhone but doesn't use iMessage. How can I send her regular texts with the new iOS 7 update?

    That's because her phone is recognized in the Apple database as an iPhone.  It should eventually convert to SMS but that's above my level of expertise here so hopefully someone else can come along and help you further.  She could just turn on iMessage and solve the problem. Ha!
    Good luck!

  • Can I use Mac Dictate to convert voice memos to text?

    If by "Mac Dictate", you mean the built in speech-to-text feature, no, you can't use that to convert a voice memo, at least not easily. Speech-to-text is designed to work on live speech, not recordings. But, there's no harm in trying.

  • Convert Cross-Reference to Text

    Does anybody have a script that will convert cross-references to text? In other words, I want to be able to remove the linkage of the cross-reference but keep the static text in place.
    Editorial comment: I set up on my local computer several books in CS6 and used cross-references, and everything worked fine. Then I moved the files to a network location and started suffering that painful slowdown that I had experienced in previous versions of InDesign. (I foolishly believed Adobe would have fixed this by now. My mistake.)

    I'm guessing that the OP wants to convert the cross-references set in the Hyperlinks panel. Try this:
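    // All cross-reference sources in the first open document, as an array.
    var xrefs = app.documents[0].crossReferenceSources.everyItem().getElements();
    // Process the cross-references from last to first.
    for (var i = xrefs.length - 1; i > -1; i--) {
        // Insert a plain-text copy of the cross-reference text after its source text...
        xrefs[i].sourceText.insertionPoints[-1].contents = xrefs[i].sourceText.contents;
        // ...then remove the cross-reference source itself.
        xrefs[i].remove();
    }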
    Peter

  • XML RTF has value as Regular Text, but Excel treats it as a number

    Hi All,
    In the XML RTF template, the Text Form Field for an item is set to 'Regular Text', and we are generating the output in Excel.
    In Excel, however, this item's value is treated as a number.
    For example, the actual item is 0120190, but Excel displays it as 120190; the leading '0' is missing.
    We need the output to print exactly as 0120190. Please let us know how to accomplish this.
    Thanks
    Sreekanth

    Please see
    Excel Output From BI Publisher or XML Publisher is Trimming Leading Zeros [ID 417811.1]

  • How do I convert speech dictation to text on my macbook pro?

    Help here: Mac Basics: Dictation

  • TS2755 I used to get iMessages on my iPad 2 and iPhone 4; now they go to one or the other, not both. iCloud is not working for this and iMessage texting doesn't work, so I need to send it as a regular text. How do I fix this issue?

    iCloud doesn't work between my iPad 2 and iPhone 4. And when I try to iMessage from my iPhone 4, it doesn't send. I need to send it as a regular text message. Any ideas why they won't sync and iMessage won't send?

    To delete, tap "Edit".

  • I am looking for an application that would allow me to open a word doc, and take notes in the .doc using a stylus pen.  I'd then like to convert those notes to text, and then be able to copy / paste those notes into an email.  Does this app exist?

    I am looking for an application that would allow me to open a Word doc and take notes in the .doc using a stylus pen. I'd then like to convert those notes to text, and then be able to copy/paste those notes into an email. Does this app exist? It seems like we were doing these same types of things with Palm Pilots years ago; one would think this would work with iPads.

    I don't believe it will open a Word document, but Writepad allows for handwritten conversion of notes to text and then to email. Might help you some of the way...

  • Error synchronizing with Windows 7 Contacts: "CRADSDatabase ERROR (5211): There is an error converting Unicode string to or from code page string"

    CRADSDatabase ERROR (5211): There is an error converting Unicode string to or from code page string
    Device          Blackberry Z10
    Sw release  10.2.1.2977
    OS Version  10.2.1.3247
    The problem is known to BlackBerry, but they haven't made even a little effort to solve it, and they wonder why nobody buys BlackBerry. I come from the Android platform and I regret buying a BlackBerry: call problems (I sent it to service because the people I was talking with couldn't hear me) and jack problems (the headphones do not work; I will send it to service again). This synchronisation problem is the last straw. Please don't buy BlackBerry any more.
    http://btsc.webapps.blackberry.com/btsc/viewdocument.do?noCount=true&externalId=KB33098&sliceId=2&di...

    This is a Windows registry issue. If you search the Web using these keywords:
    "how to fix craddatabase error 5211"       you will find a registry editor that says it can fix this issue.

  • Converting Word to pdf -  Text in "Pictures" corrupted

    When I convert a Word (ver 10 SP3) document with a graphic "picture" which contains both graphics and text to .pdf format with Acrobat 9 Pro the graphics convert fine but the text in the picture becomes very large and moves outside of the picture. The same problem occurred when I had Acrobat 5 on my system.
    A colleague with Acrobat 9 Pro converted one of my .doc files and it converted fine on his system. He feels that something is wrong with the PDF driver, Word, or Acrobat. Where do I start tracing down this problem?
    I can post the source and resulting .pdf files, but I don't know how to do that in this forum.

    I am not sure which version of Word "ver 10" is. You may want to update Word. Word does have some problems along this line; the new DOCX format apparently even plays games with graphics, deleting the fonts in vector graphics.

  • Report Builder SQL Queries - Convert CN to clear text

    I am trying to customize a query in Report Builder to massage values as they are delivered, and I am running into value expression errors.
    My query returns data from the AD Computer Object "managedBy" field. The problem is that this field returns data in the format of:
    CN=Security Group Name,OU=blah,OU=blah,DC=stuff,DC=com etc
    I am trying to get it to return just the "Security Group" value which is much more useful. I found this great article which almost works for me: https://social.technet.microsoft.com/forums/systemcenter/en-US/6610d238-72f2-4e75-a0cc-e1383dd8e94b/ad-system-discovery-convert-cn-to-clear-text
    However, once I try to save the report in Report Builder I get:
    System.Web.Services.Protocols.SoapException: The Value expression for the text box ‘managedBy0’ refers to the field ‘managedBy0’.  Report item expressions can only refer to fields within the current dataset scope or, if inside an aggregate, the specified
    dataset scope. Letters in the names of fields must use the correct case.
    This is the original SQL query:
    select  all SMS_R_System.Name0,SMS_R_System.managedBy0,SMS_R_System.description0,SMS_G_System_OPERATING_SYSTEM.Caption00,SMS_R_System.Resource_Domain_OR_Workgr0,SMS_R_System.whenCreated0,SMS_R_System.Last_Logon_Timestamp0 from vSMS_R_System AS SMS_R_System
    INNER JOIN Operating_System_DATA AS SMS_G_System_OPERATING_SYSTEM ON SMS_G_System_OPERATING_SYSTEM.MachineID = SMS_R_System.ItemKey  INNER JOIN _RES_COLL_CAS000A9 AS SMS_CM_RES_COLL_CAS000A9 ON SMS_CM_RES_COLL_CAS000A9.MachineID = SMS_R_System.ItemKey
    My edited query:
    select all SMS_R_System.Name0,REPLACE(SUBSTRING(SMS_R_System.managedBy0,4,CHARINDEX(',OU',SMS_R_System.managedBy0,3)-4),'\,',','),SMS_R_System.description0,SMS_G_System_OPERATING_SYSTEM.Caption00,SMS_R_System.Resource_Domain_OR_Workgr0,SMS_R_System.whenCreated0,SMS_R_System.Last_Logon_Timestamp0
    from vSMS_R_System AS SMS_R_System INNER JOIN Operating_System_DATA AS SMS_G_System_OPERATING_SYSTEM ON SMS_G_System_OPERATING_SYSTEM.MachineID = SMS_R_System.ItemKey INNER JOIN _RES_COLL_CAS000A9 AS SMS_CM_RES_COLL_CAS000A9 ON SMS_CM_RES_COLL_CAS000A9.MachineID = SMS_R_System.ItemKey
    Thoughts?

    Hey,
    Since you manipulate the original value of ManagedBy0 in your SELECT, the column name changes to (No column name). I did a test on my side, and this is what I get:
    Error: "The Value expression for the text box ‘managedBy0’ refers to the field ‘managedBy0’."
    Report Builder looks for a field named ManagedBy0 in your query but doesn't find one, because no name is assigned to the computed column.
    Try running this query, which adds AS ManagedBy0 after your REPLACE:
    select all
    SMS_R_System.Name0,
    REPLACE(SUBSTRING(SMS_R_System.managedBy0,4,CHARINDEX(',OU',SMS_R_System.managedBy0,3)-4),'\,',',') AS ManagedBy0,
    SMS_R_System.description0,
    SMS_G_System_OPERATING_SYSTEM.Caption00,
    SMS_R_System.Resource_Domain_OR_Workgr0,
    SMS_R_System.whenCreated0,
    SMS_R_System.Last_Logon_Timestamp0 from vSMS_R_System AS SMS_R_System INNER JOIN Operating_System_DATA AS SMS_G_System_OPERATING_SYSTEM ON SMS_G_System_OPERATING_SYSTEM.MachineID = SMS_R_System.ItemKey INNER JOIN _RES_COLL_CAS000A9 AS SMS_CM_RES_COLL_CAS000A9 ON SMS_CM_RES_COLL_CAS000A9.MachineID = SMS_R_System.ItemKey
    Let me know.
    Nick Pilon | Blog : System Center Dudes
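To check the string arithmetic of the SUBSTRING/CHARINDEX expression outside SQL Server, here is a hypothetical Java mirror of it. SQL positions are 1-based and Java indices are 0-based, so SUBSTRING(dn, 4, CHARINDEX(',OU', dn, 3) - 4) becomes substring(3, indexOf(",OU", 2)). The sketch returns null when the DN contains no ",OU" (for example, a group in the default CN=Users container); the SQL expression has no such guard, since its length argument would then be negative.

```java
// Hypothetical mirror of:
// REPLACE(SUBSTRING(dn, 4, CHARINDEX(',OU', dn, 3) - 4), '\,', ',')
final class ManagedByParser {
    static String commonName(String dn) {
        // Assumes the DN begins with "CN=", as in the managedBy values above.
        if (dn == null || !dn.startsWith("CN=")) {
            return null;
        }
        // CHARINDEX start position 3 (1-based) corresponds to index 2 (0-based).
        int end = dn.indexOf(",OU", 2);
        if (end < 0) {
            return null; // no ",OU": the SQL length argument would be negative here
        }
        // SUBSTRING from 1-based position 4 is index 3, just after "CN=";
        // then unescape "\," back to "," like the outer REPLACE.
        return dn.substring(3, end).replace("\\,", ",");
    }
}
```

For the DN in the question, this returns "Security Group Name".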

  • Converting PDF's to Text

    I have a huge collection of documents I want to digitize. I just bought an Epson scanner, so I can scan the documents in a variety of formats, including .jpg, .tiff and pdf. Unfortunately, I can't get the OCR software that came with the scanner (ABBYY FineReader Sprint) to work.
    Then I remembered seeing PDF converters online, so I figured I could just scan everything as a PDF, then convert it to text. But I'm confused. I tried Adobe Acrobat's export function, but that didn't do anything. I read that you can open a PDF in Preview and copy the text, but that doesn't work.
    It sounds like there are two ways to create a PDF. With OCR software you can create a text-PDF, whereas I apparently have a scanned-image-PDF, if I understand correctly.
    Anyway, I'm confused. Can anyone recommend a software program or online service that will convert PDFs to text on a Mac? I'm also interested in learning how to batch process PDFs. I'm going to have hundreds of documents, maybe a few thousand.
    Thanks for any tips.

    David Blomstrom wrote:
     ...Can anyone recommend a software program or online service that will convert PDF's to text on a Mac? I'm also interested in learning how to batch process PDF's. I'm going to have hundreds of documents, maybe a few thousand. 
    Thanks for any tips.
    Since being able to scan combined with OCR will be the easiest approach, getting the OCR software to work would seem to be the best solution. Which version of FineReader Sprint do you have? There's a version in the App Store (https://itunes.apple.com/app/abbyy-finereader-express/id412310371?mt=12) which is supposed to be compatible with Lion and Mountain Lion, and if that's not the version you have, perhaps you can upgrade to it. Check out http://www.abbyy.com/checkforupdates/?PartNumber=71817&product=FineReader%20Express%20Edition%20for%20Mac which might do the trick.
    Unless the scanning process has the OCR step built in, what you'll get is an image, usually JPG, which can be turned into a PDF file but it's still just a picture. If you could turn it into a PDF that has actual text in it, then you can get into text extraction. There are a number of programs which are supposed to be able to do that. The only one I've tried that works pretty well is MS Word in the Office 2013 suite for Windows. I have it running in a Windows 8 Virtual Machine on the Mac, but that's a long and expensive way around to begin to do what you need.
