Converting CP1252 right curly quotes into UTF-8

I was not sure where to post this.
I was having some problems trying to convert the CP1252 right curly quote into UTF-8. Other CP1252 characters were converting correctly.
The right curly quote was read from the database as 3 bytes. The first 2 bytes were mapping to legitimate UTF8 tokens but the third byte was not.
According to this chart:
http://www.io.com/~jdawson/cp1252.html
The right curly quote is “e2809d”.
Byte1  Byte2  Byte3
e2     80     9d     (hexadecimal)
/342   /200   /235   (octal)
The octals are in a big private static final Java string in sun.nio.cs.MS1252 except “/235” is missing. So the process gets the first 2 bytes right but chokes on the third.
Hacking my own encoder class with the “/235” in the right place made it work.
I was wondering if this was a bug in sun.nio.cs.MS1252 ?
Thx!
Edited by: langal on Feb 2, 2010 10:37 PM

Ah, the old double-encoding ploy! It looks like the text was encoded as UTF-8, then the resulting byte array was decoded as windows-1252, then the result was encoded again as UTF-8.
// original string:
”
// encoded as UTF-8:
E2 80 9D
// decoded as windows-1252 (the third character
// is undisplayable, but valid):
â€
// encoded as UTF-8:
C3 A2 E2 82 AC C2 9D
The decoding as windows-1252 followed Microsoft's practice instead of its specification, so the "9D" byte became the control character U+009D, which is stored safely in its UTF-8 encoding, "C2 9D". Unfortunately, Java follows the spec, so getting the character back out is not so easy (and that maybe-bug is relevant--my apologies). When encoding to windows-1252, it treats '\u009D' as an invalid character and converts it to 0x3F, the encoding for the question mark.
I did some checking, and it seems U+201D (Right Double Quotation Mark: ”) is the only one of the added characters in the "80..9F" range that causes this problem. If the data is otherwise intact, you can work around this problem by looking for the relevant byte sequence in the intermediate processing stage and replacing it:
byte[] bb0 = { (byte)0xC3, (byte)0xA2, (byte)0xE2, (byte)0x82, (byte)0xAC, (byte)0xC2, (byte)0x9D };
// This is the string 'x' from your sample code.
String x = new String(bb0, "UTF-8");
System.out.println(x);  // â€ (followed by the undisplayable U+009D)
byte[] bb1 = x.getBytes("windows-1252"); // E2 80 3F
// Put back the 0x9D byte that the encoder replaced with '?' (0x3F).
for (int i = 0; i < bb1.length - 2; i++)
  if ((bb1[i+2] & 0xFF) == 0x3F &&
      (bb1[i+1] & 0xFF) == 0x80 &&
      (bb1[i]   & 0xFF) == 0xE2)
    bb1[i+2] = (byte)0x9D;
String s = new String(bb1, "UTF-8");
System.out.println(s);  // ”
The byte sequence "E2 80 3F" is unlikely to occur naturally in windows-1252, and it's invalid in UTF-8. Of course, this would only be a temporary measure, while you clean up your database as @jtahlborn said.
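If more characters than the right double quote turn out to be affected, the same repair can be done without relying on Java's windows-1252 encoder for the C1 control range. The following is only a sketch under assumptions: the class and method names (`DoubleEncodingRepair`, `repair`) are made up for illustration, and it assumes the whole input went through exactly one extra UTF-8 → windows-1252 → UTF-8 round trip. Text that was stored correctly would be damaged by it, so it should only be applied to rows known to be double-encoded.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK or of the original poster's code.
final class DoubleEncodingRepair {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Maps each char of the doubly encoded text back to the single byte it
    // came from, then decodes the recovered bytes as UTF-8.
    static String repair(String mojibake) {
        byte[] out = new byte[mojibake.length()];
        for (int i = 0; i < mojibake.length(); i++) {
            char c = mojibake.charAt(i);
            if (c <= 0xFF) {
                // ASCII, Latin-1 and C1 controls such as U+009D: the byte
                // value equals the code point ("Microsoft's practice").
                out[i] = (byte) c;
            } else {
                // Characters such as U+20AC that windows-1252 places in the
                // 80..9F range; unmappable ones become '?' (0x3F).
                out[i] = String.valueOf(c).getBytes(CP1252)[0];
            }
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] stored = { (byte) 0xC3, (byte) 0xA2, (byte) 0xE2, (byte) 0x82,
                          (byte) 0xAC, (byte) 0xC2, (byte) 0x9D };
        String mojibake = new String(stored, StandardCharsets.UTF_8);
        String fixed = repair(mojibake);
        System.out.println(Integer.toHexString(fixed.codePointAt(0))); // 201d
    }
}
```

Because chars up to U+00FF are mapped by value, the 0x9D byte never passes through the encoder and needs no special case.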

Similar Messages

  • CS5: Pasting smart quotes into code view problem

    Hi all,
    In my blogging workflow, I copy the blog text from MacJournal into the code view in Dreamweaver CS5.  In CS3, it seems that DW allowed the left & right smart (Unicode) quotes to be pasted in untouched.  However, DW CS5 seems to be converting all single ‘ & ’ to the ASCII ' and all “ & ” to ASCII ".  Is there a way to turn this off?
    I don't want to paste the text into the design view because I don't want DW to auto format the text.  I got some specific formatting that I like to do.
    Thanks!
    Steven

    Yes, the appropriate meta tag is in place.  In Preferences, UTF-8 is selected under Fonts.  Try pasting:
    “Test”
    into the code view and see what happens.  Hmm, I just tried pasting the above into the Design View and the same thing happened.  The curly quotes were changed into straight quotes.  BTW, I'm running DW under Mac OS X Lion.
    I pasted the same text in another app, TextEdit, and the curly quotes pasted just fine.
    I need the curly quotes because I change them into &#8216;, &#8217;, &#8220;, and &#8221; (‘ ’ “ ”) for compatibility with older browsers (pre-IE6).  Yes, my site has been visited by such older browsers.
    Thanks!
    Steven

  • How to quickly switch between straight and curly quotes?

    I've recently moved from a Windows XP machine with MS Office to a Mac Pro with Pages.
    For the kinds of documents I typically work on, sometimes I need to have straight quotes, and sometimes curly quotes. With MS Word, I was able to create a couple of macros that would switch these preferences for me. With these macros linked to an icon in the toolbar, switching between straight and curly quotes was as easy as clicking a button.
    Now I'm looking for a way to do this -- or something like it -- with Pages.
    I know how to switch back and forth using the preferences menu, of course, but I'm looking for something quicker and simpler, since I often have to make this change several times a day.
    Can Automator do something like this? Or is there another way?
    -- Eric

    Turn off the auto correction and you can type Curly quotes with:
    left single ‘ option ]
    right single ’ option shift ]
    left double “ option [
    right double ” option shift [
    If you want the French quotes « and » they are option and option shift |
    Peter

  • Curly quotes

    My company is moving away from windows based servers running IIS to Linux running Apache. So now I'm running into the curly quotes issue when pasting from Word in dreamweaver CS4. I have a bunch of sites already built that will be ported over. Not to mention the fact that I get virtually all the content from Word docs. Is there any way to automatically convert  curly quotes, apostrophes, em dashes etc without having to resort to manual find and replace on every page? This has become a major issue for me and I'm wondering if there's any way to set up CS4 to automatically handle these characters when cutting and pasting from Word. DW used to do this but CS4 doesn't seem to. Any info is appreciated.
    thanks

    That is bad programming, they should not occur in these situations.
    Thank you for pointing it out I never noticed, because like 95% of the known universe I use metric measures.
    I have tried all the likely keyboard commands and searched the user manual and can not find a solution.
    I thought of replacing the auto correction with the appropriate space" and "space but this is already hard encoded in and wouldn't fix the problem. What was required in the programming was to check if there was a leading quote mark before substituting a trailing quote mark.
    Send feedback to Apple at http://www.apple.com/feedback/pages.html
    A work around is to go:
    +Menu > Edit > Special Characters… > Punctuation > and use ′ or ″ (single or double Prime)+
    These are actually the correct characters for feet and inches and will not be substituted.
    To speed it up as you type you could use auto-correction to substitute for /' and /" or whatever you think will work.

  • Grep that finds curly quotes and changes to straight ones, but only after digits....

    Hello
    I am trying to find all the curly quotes that follow numbers and turn them into straight quotes.
    I am going to try and put this in a GREP style eventually, but in the meantime, I was testing it with find and change.
    Here is what I did......but it doesn't work.
    For the Find what, I did a positive lookbehind to find a digit and then pasted in the curly quote.
    For the Change to, I found the Unicode value for the straight quote.
    But it didn't work ;-(
    any thoughts out there!!
    babs
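
    For comparison, the approach described in the question (a positive lookbehind for a digit, followed by the curly quote) can be written as a Java regular expression. This is only a sketch of the pattern, not InDesign's GREP dialect; the class and method names are hypothetical, and it assumes the closing quotes after digits are U+201D and U+2019.

```java
// Hypothetical helper illustrating the lookbehind pattern from the question.
final class QuotesAfterDigits {
    // (?<=\d) requires a digit immediately before the quote without
    // consuming it, so only the quote character itself is replaced.
    static String straighten(String text) {
        return text.replaceAll("(?<=\\d)\u201D", "\"")
                   .replaceAll("(?<=\\d)\u2019", "'");
    }

    public static void main(String[] args) {
        // Quotes after digits are straightened; the quotes around the word
        // are left curly because no digit precedes them.
        System.out.println(straighten("6\u2019 2\u201D tall, \u201Cquoted\u201D"));
    }
}
```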

    Well, I'm no Engineer, and I suppose it would require all the font
    manufacturers to rebuild the fonts to handle it the same as small
    No...given that All Caps, is handled by InDesign without special
    font support, this feature could work the same way. A Character
    panel option (like All Caps) that Forces Straight Quotes.
    Or maybe that was your point. (?)
    It can certainly be done.
    Question is merely whether it should be done.
    But how much worse can it get than all those airplanes falling from the sky
    How could you ask such a thing? Surely having the entire universe
    sucked into a rift and enveloped in eternal blackness without sunshine,
    light, or heat is far worse than a few aluminum chassis falling into the
    ocean and there being no power grid to run InDesign unless you purchased
    the hand-crank version!

  • How to parse out curly quotes from a string

    Hi,
    I am writing a web application, where people will be copying from a Word Document into a text area. Then I get a String from the parameter passed.
    How can I parse out curly quotes and mdashes from this String? Are there specific character codes that I can parse out to replace them with regular quote characters or html quote characters?
    Thanks,
    Gabe

    Interesting problem and one that we had to deal with a couple of years ago. You are talking about what MS products call smart quotes. They are not control characters in Unicode, but windows-1252 stores them in the byte range 0x80-0x9F (0x91 to 0x94 for the quotes, 0x96 and 0x97 for the en and em dashes), which ISO-8859-1 reserves for control characters. That is why they show up as squares in HTML unless properly dealt with. The Unicode charts list them as U+2018 and U+2019 (single quotes), U+201C and U+201D (double quotes), U+2013 (en dash) and U+2014 (em dash).
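
    A minimal sketch of both options asked about (plain replacements or HTML character references) might look like the following. The class and method names are hypothetical, and it assumes the request parameter has already been decoded correctly, so that the String really contains U+2018, U+2019, U+201C, U+201D, U+2013 and U+2014.

```java
// Hypothetical helper; the names are illustrative only.
final class SmartPunctuation {
    // Replaces curly quotes and dashes with plain ASCII equivalents.
    static String toAscii(String s) {
        return s.replace('\u2018', '\'')
                .replace('\u2019', '\'')
                .replace('\u201C', '"')
                .replace('\u201D', '"')
                .replace("\u2013", "-")
                .replace("\u2014", "--");
    }

    // Replaces the same characters with numeric HTML character references.
    static String toHtmlEntities(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\u2018': case '\u2019': case '\u201C': case '\u201D':
                case '\u2013': case '\u2014':
                    sb.append("&#").append((int) c).append(';');
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("\u201CTest\u201D \u2014 it\u2019s"));  // "Test" -- it's
        System.out.println(toHtmlEntities("\u201CTest\u201D"));           // &#8220;Test&#8221;
    }
}
```

    Writing the characters as Unicode escapes keeps the source independent of the encoding the compiler assumes for the file.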

  • CS5 and DW curly quotes

    I just upgraded from CS4 to CS5. Dreamweaver used to convert curly quotes to the proper UTF encoding when you typed them in. For example, if you typed the Option-Shift-leftbracket keys you would get &ldquo; in your code. This no longer happens with CS5. It properly converts straight quotes to &quot; but doesn't handle curly quotes or em dashes. Is there a preference or menu command that I don't know about to re-activate this feature? It would be a nightmare to have to convert all high-bit characters, especially since the app used to handle this automatically. I would consider this a major downgrade rather than an upgrade if this feature was lost.

    AdwizUSA wrote:
    I just upgraded from CS4 to CS5. Dreamweaver used to convert curly quotes to the proper UTF encoding when you typed them in. For example, if you typed the Option-Shift-leftbracket keys you would get &ldquo; in your code.
    That's not UTF-8 encoding. &ldquo; is an HTML entity that was necessary because the Western European (ISO-8859-1) encoding doesn't support curly quotes. UTF-8 does support curly quotes, so it's no longer necessary to convert them to HTML entities.
    If you switch your page encoding to Western European (ISO Latin 1), you'll get the HTML entities. If you use UTF-8, you'll get UTF-8 curly quotes.
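
    The difference between the two encodings can be checked from Java. This is a small sketch with hypothetical class and method names; it only uses `CharsetEncoder.canEncode` from the standard library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
final class CurlyQuoteEncodings {
    static boolean latin1CanEncode(char c) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(c);
    }

    static boolean utf8CanEncode(char c) {
        return StandardCharsets.UTF_8.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char leftDoubleQuote = '\u201C';
        System.out.println(latin1CanEncode(leftDoubleQuote)); // false
        System.out.println(utf8CanEncode(leftDoubleQuote));   // true
    }
}
```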

  • Pages '09 integrating straight and curly quotes within one file.

    I'm doing an edit in Pages and the document has a mix of straight and curly quotes. I can't seem to find a way to get them all the same. Find-and-replace turns some of the quotes backwards (as in 'em for them) Nothing in the archives addresses this.

    Look for patterns in search and replace.
    Search for a space + " to get the leading quote and " + space to get a trailing quote.
    Or get WordServices to fix all this.
    Peter

  • Handling smart/curly quotes in Java

    Hi - I want to know how to handle smart / curly quotes in Java. I need to replace them with actual quotes. I was trying something like below.
    xmlString = xmlString.replaceAll("‘", "&apos;");
    but this is not working. I just tried to print indexOf("‘") and it only returns -1. I was also trying to use the HTML equivalent value inside, i.e. &#8216 followed by a semicolon (the preview replaces it with the actual value).
    Please guide me on this. It's urgent!!
    -Thanks,
    Magesh
    Edited by: magesh_rathnam on Jan 31, 2010 7:09 PM
    Edited by: magesh_rathnam on Jan 31, 2010 7:10 PM

    I guess not then...
    Anyhow try this:
    public static String replaceSmartQuotes(String smartQuotedString) {
      return smartQuotedString.replaceAll("[“”]", "\"").replaceAll("[‘’]", "'");
    }
    Mel

  • How to convert characterset VN8VN3 into UTF8??

    Hi all,
    My database is using character set VN8VN3, I want to convert into UTF8. I use tool: csscan. Then:
    SVRMGR> connect sys as sysdba;
    SVRMGR> shutdown immediate
    SVRMGR> startup mount
    SVRMGR> alter system enable restricted session;
    SVRMGR> alter system set job_queue_processes = 0;
    SVRMGR> alter database open;
    SVRMGR> alter database character set utf8;
    but at the last command I get the error: "new character set is not superset old character set"
    So, what should I do now ??

    thx
    paste values from the edit menu works !

  • Curly quotes in titles in Premiere Pro CS4 - how to get them?

    How can I get the titler text tool to give me proper typographer's curly quotes instead of the inch marks? Is using Alt0147 and Alt0148 the only way?
    Thanks

    Use the character map.  You can see what you want to select.
    I use it all the time for special characters. 

  • Command for move all clips at right of deletion into the deletion gap ?

    Hi,
    Having used razor tool to cut out a portion of a video/audio, rather than then have to highlight all clips etc to the right and drag into this space to snap to last clip, is there a command to do this ?
    Such would save having to zoom out, until able to see all clips and timeline, select the lot, then zoom in again to be sure of dragging across the small gap created in my last edit.
    Envirographics

    Thanks Harm,
    not the most obvious sounding solution, sounds more like a deletion command ! and as for the word Ripple ? ! ...
    r/click 'refill from right' seems more user friendly but it does the job and many thanks to you.
    I shall remember it for the odd name.
    Envirographics

  • Pulling Stock quote into Numbers

    Hello all, after searching Google and Apple, I decided to just post a question. I am looking for a method of pulling a stock quote into Numbers. Anyone have any ideas? Thank you....

    The only solution is to copy and paste from the source web page.
    Yvan KOENIG (from FRANCE samedi 5 juillet 2008 10:39:37)

  • Insert value with quotes into the table

    Hi,
    How can I insert a value along with quotes into a table column of type varchar.
    Meaning..I want to insert the value 'TEST' including the single quotes.
    Regards,
    Murali Mohan

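    It's only a matter of using the correct number of quotes. Inside a string literal each embedded quote is doubled, and one more quote on each side delimits the literal, so to insert 'TEST' you use 3 quotes on each end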
    example: '''TEST'''
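
    When the statement is built from Java, the same doubling rule can be applied in code. The sketch below is illustrative only: the class name, the table `t` and the column `col` are hypothetical. With JDBC, binding the value through `PreparedStatement.setString` usually avoids manual quoting altogether.

```java
// Hypothetical helper showing the quote-doubling rule for SQL string literals.
final class SqlLiteral {
    // Doubles every embedded single quote and wraps the result in quotes.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        // Table and column names are made up for the example.
        System.out.println("INSERT INTO t (col) VALUES (" + quote("'TEST'") + ")");
        // prints: INSERT INTO t (col) VALUES ('''TEST''')
    }
}
```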

  • How to store double quote into a string?

    How to store double quote into a string?
    What I mean is:
    suppose I want to save the following sentence into string s:
    <a href="../jsp/Logout.jsp">
    What is the syntax?
    Thanks a lot!

    String s = "<a href=\"../jsp/Logout.jsp\">";
    check out this page
    http://java.sun.com/docs/books/tutorial/index.html
    Hope this helps
