Replacing Unicode characters in a String

I have some text in a String, written in a foreign language, in which characters sometimes appear in Unicode format, such as ö or Ö.
How can I convert this String so that these Unicode characters become readable characters?
Thank you for any lead...

:-) Interesting - twice in 10 minutes I'm recommending the use of Elliott Hughes' class. See http://elliotth.blogspot.com/2004/07/java-implementation-of-rubys-gsub.html .
You just have to write a regular expression to match the terms ("&#(\\d+);"), extract the number as a string (group(1)), then call Integer.parseInt() on the numeric string and cast the result to a char.
Edited by: sabre150 on Feb 9, 2012 9:53 AM
This is an example I wrote a couple of years ago (and published on the old Sun site forums) that does pretty much exactly what you need
import e.util.Rewriter;

public class Sabre20090919
{
    public static void main(String[] args) throws Exception
    {
        String title = "& #26412;& #26399;& #28136;& #21033;\n" +
                "& #22522;& #26412;& #27599;& #32929;& #30408;& #39192;\n" +
                "& #32380;& #32396;& #29151;& #26989;& #21934;& #20301;& #28136;& #21033;\n" +
                "& #26371;& #35336;& #21407;& #21063;& #35722;& #21205;& #32047;& #31309;& #24433;& #38911;& #25976;\n" +
                "& #26222;& #36890;& #32929;& #27599;& #32929;& #30408;& #39192;\n" +
                "& #31232;& #37323;& #27599;& #32929;& #30408;& #39192;\n";
        // Replace each decimal character reference with the character it encodes.
        Rewriter rewriter = new Rewriter("&\\s*#(\\d+);")
        {
            @Override
            public String replacement()
            {
                return Character.toString((char) Integer.parseInt(group(1)));
            }
        };
        title = rewriter.rewrite(title);
        System.out.println(title);
        // Dump the decoded text as escaped UTF-16 code units.
        System.out.print("Unicode :\"");
        for (char ch : title.toCharArray())
            System.out.printf("\\u%04x", (int) ch);
        System.out.println("\"");
        // Dump the same text as Big5-encoded bytes.
        byte[] asBytes = title.getBytes("big5");
        for (byte b : asBytes)
            System.out.printf("%02x ", (int) (b & 0xff));
        System.out.println();
    }
}
The rest is just a test harness.
Edited by: sabre150 on Feb 9, 2012 10:02 AM
Removed redundant code
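
If you would rather not depend on the Rewriter class, the same idea can be sketched with java.util.regex alone. This is only an illustration: the class name NumericEntityDecoder and the method name decode are made up for this example, and it handles decimal references only (no hexadecimal &#x...; forms and no named entities). It uses Character.toChars instead of a char cast so that code points above U+FFFF also work.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericEntityDecoder {
    // Matches decimal references such as &#246; and tolerates stray
    // whitespace after the ampersand, as in "& #26412;".
    private static final Pattern REFERENCE = Pattern.compile("&\\s*#(\\d+);");

    // Hypothetical helper: returns a new String with every decimal
    // reference replaced by the character it encodes.
    public static String decode(String text) {
        Matcher matcher = REFERENCE.matcher(text);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            int codePoint = Integer.parseInt(matcher.group(1));
            // toChars yields a surrogate pair for supplementary code points
            // and throws IllegalArgumentException for invalid ones.
            String decoded = new String(Character.toChars(codePoint));
            matcher.appendReplacement(result, Matcher.quoteReplacement(decoded));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("sch&#246;n, &#214;l, & #26412;& #26399;"));
    }
}
```

Whether the decoded characters display correctly still depends on the console encoding and font.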

Similar Messages

  • Replace multiple characters in a string

    Hi, I have some string fields that contain special characters such as ô and û and I want to replace them with ō and ū, and their upper case equivalents as well. How do I write a single formula that would find any of these characters in a string and replace them?
    Thanks,
    Will

    replace(replace(replace(replace(x,'ô','ō'),'û','ū'),'Ô','Ō'),'Û','Ū');
    where x is the string field.  I suggest using the Unicode code points rather than the literal characters.  Please check that the uppercase characters are the ones you want in your formula.

  • Need help in replacing special characters in a string

    Hi,
    Please let me know the best way to replace all the special characters in a string (everything other than letters and digits) with a space.
    with regards.
    sumanth.

    Sumanth Nag Kristam wrote:
    > Actually, I need to replace the hexadecimal character 0x1A in a string; that is 'substitute' as per the chart.
    > Any pointers?
    >
    > Check the link for the ASCII codes:
    > http://www.techonthenet.com/ascii/chart.php
    But a hexadecimal value contains no special characters?
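
The thread does not say which language is in use, so the following is only a Java sketch of the two requests: replacing everything except ASCII letters and digits with a space, and replacing just the SUB control character (0x1A). The class and method names are invented for illustration.

```java
public class SpecialCharCleaner {
    // Replaces every character that is not an ASCII letter or digit
    // with a single space (one space per replaced character).
    public static String keepAlphanumerics(String input) {
        return input.replaceAll("[^A-Za-z0-9]", " ");
    }

    // Replaces only the SUB control character (code 0x1A) with a space.
    public static String replaceSubstitute(String input) {
        return input.replace((char) 0x1A, ' ');
    }

    public static void main(String[] args) {
        System.out.println(keepAlphanumerics("ab#c-1!"));
        System.out.println(replaceSubstitute("a" + (char) 0x1A + "b"));
    }
}
```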

  • Regular expressions-how to replace [ and ] characters from a string

    Hi,
    My input String is "sdf938 [98033]". From this string, I would like to remove the characters occurring within square brackets, including the square brackets themselves.
    My output String needs to be "sdf938" in this case. How should I do it using regular expressions? I tried several possible combinations but didn't get the expected results.
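
As a rough Java sketch of one way to do this (the class and method names are made up for the example), String.replaceAll can be given a pattern that matches optional whitespace followed by a bracketed run of characters:

```java
public class BracketStripper {
    // Removes any "[...]" section together with the whitespace in front of it.
    // [^\]]+ matches one or more characters that are not a closing bracket.
    public static String stripBracketed(String input) {
        return input.replaceAll("\\s*\\[[^\\]]+\\]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripBracketed("sdf938 [98033]")); // prints "sdf938"
    }
}
```

Note that empty brackets ("[]") are left alone by this pattern, because + requires at least one character between them.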

    "\\s*\\[[^\\]]+\\]"

  • Replace Special Characters in a String

    Let's say someone copies the following list and pastes it into a JavaScript prompt box:
    302304
    305678
    245675
    How do I manipulate the string so it reads:
    302304 305678 245675
    In other words, how do I replace the manual line breaks in a string with spaces?
    Thanks!

    \r and \n are both \s characters.
    This is functionally equivalent to what you have...
    stringName = stringName.replace(/\s/g,' ');
    I highly recommend browsing this website (lots of useful info there):
    http://www.regular-expressions.info/
    Harbs

  • How to replace escape characters in a string ?

    Hi All,
    In my application I came across a problem where I want to replace a substring (contains escape characters also) with another string. The below shown code will replicate my problem :
    public class StringSearchAndReplace {
      public static void main(String args[])   {
        String stmt = " \\pntext\\bullet\\tab The question as to ";
        String newStmt = stmt.replaceAll("\\bullet\\tab",  "B");
        System.out.println("BEFORE: " + stmt + "\n");
        System.out.println("AFTER: " + newStmt);
      }
    }
    Here I want to replace "\\bullet\\tab" with "B". I am unable to move further. Please help or advise me in this regard.
    It's urgent.
    Thanks in advance.

    If the String you're trying to replace contains a backslash, you need four backslashes (double it for Java, then double it again because String.replaceAll takes a regular expression).
    So I think what you're looking for is:
    String newStmt = stmt.replaceAll("\\\\bullet\\\\tab",  "B");
    I haven't tested that though.
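
A small sketch comparing three ways of doing this replacement in Java; the class and method names are invented for the example. Besides the four-backslash regex, Pattern.quote lets the regex engine treat the text literally, and String.replace does not use regular expressions at all, so the search text needs only the usual Java escaping.

```java
import java.util.regex.Pattern;

public class BackslashReplace {
    // Regex form: each literal backslash is written as four in Java source.
    public static String viaRegex(String s) {
        return s.replaceAll("\\\\bullet\\\\tab", "B");
    }

    // Pattern.quote wraps the text so the regex engine matches it literally.
    public static String viaQuote(String s) {
        return s.replaceAll(Pattern.quote("\\bullet\\tab"), "B");
    }

    // String.replace performs a plain, non-regex replacement.
    public static String viaLiteral(String s) {
        return s.replace("\\bullet\\tab", "B");
    }

    public static void main(String[] args) {
        String stmt = " \\pntext\\bullet\\tab The question as to ";
        System.out.println(viaRegex(stmt));
        System.out.println(viaQuote(stmt));
        System.out.println(viaLiteral(stmt));
    }
}
```

All three calls return the same result for this input; the literal form is usually the easiest to read when no pattern matching is needed.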

  • Replace Special Characters in a string + XQUERY

    Hi All,
    I am using the following replace function to replace the special characters in my XQUERY.
    *replace($string1, '[&"-*;-`!|:,¿/{}@#$%^*~()_+-]', ' ')*
    Fortunately it replaces all the special characters. The only problem is that it also replaces capital letters with spaces, which I don't want.
    Please help me out!!

    Hi
    << What did you change? I don't see any change in the regex. >>
    If you look at the modified function, he just removed the (*) inside the replace function.
    Thanks
    Shankar AUNV
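
A likely cause of the capital-letter problem is that a hyphen between two characters inside [...] defines a range. In the pattern above, ;-` covers every character from ; to `, which includes A to Z, and "-* covers " through *. The Java sketch below illustrates the effect; it assumes that Java's character-class rules match XQuery's on this point and that the hyphens were meant literally. The class name and constants are made up, and \u00BF is the ¿ character.

```java
public class CharClassRange {
    // The class from the question: contains the accidental ranges "-* and ;-`
    public static final String ORIGINAL = "[&\"-*;-`!|:,\u00BF/{}@#$%^*~()_+-]";

    // Same characters listed individually, with the only hyphen placed last
    // so that it is literal. This no longer matches A-Z (or the other
    // characters that were only covered by the ranges).
    public static final String FIXED = "[&\"*;`!|:,\u00BF/{}@#$%^~()_+-]";

    public static void main(String[] args) {
        System.out.println("Hello World!".replaceAll(ORIGINAL, " "));
        System.out.println("Hello World!".replaceAll(FIXED, " "));
    }
}
```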

  • PHP: automatically replace alphanumeric characters in a string

    Quick question:
    How do I program a PHP script to replace all non-alphanumeric characters within a string with underscores?

    .oO(AngryCloud)
    >Yes, this does help, although it is only alphabetic.
    >
    >Will changing the line to this make it alphanumeric?:
    >
    >$new_string = preg_replace("/[^a-zA-Z0-9]/", "_", $string);
    Did you try it? ;-)
    You could also use this shorter pattern:
    /[^a-z\d]/i
    It should be equivalent (\d matches decimal digits and the /i modifier makes the entire pattern case-insensitive).
    Micha

  • A clever way of replacing some characters in a string

    I have a string that is passed between methods in different classes. Because I don't have all the data ready when the string is first constructed, I need to fill in a few sections of the string before I can use it. I am wondering whether there is a clever way to carry out this task.
    Here is a sample of such a string:
    <web site name>/book.html?bookid=<book id data + series data>&language=<language data>
    where <web site name> and <series data> come from the first method, and <book id data> and <language data> come from the second method.

    One way would be to pass a String array that contains the sections that need to have other pieces inserted between them:
    [ "/book.html?bookid=" , "&language=" ]
    That's not a very flexible solution, though. A better way would be to pass a Map<String, String> around, which holds the bookid, language, etc. Strings as keys, and lets the other methods map them to <book id data + series data>, <language data>, etc. values. Then, when they're all populated, construct the final URL from those values.
    That way, if the params of the string change, it's easier to change the methods that populate the values.
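
A minimal sketch of the Map-based idea in Java. The key names ("site", "series", "bookId", "language"), the class name and the sample values are all assumptions made for this illustration, and the values are concatenated as-is with no URL encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BookUrlBuilder {
    // First method: contributes the pieces that are known early.
    public static void addSiteData(Map<String, String> parts) {
        parts.put("site", "http://example.com");
        parts.put("series", "A");
    }

    // Second method: contributes the pieces that are known later.
    public static void addBookData(Map<String, String> parts) {
        parts.put("bookId", "42");
        parts.put("language", "en");
    }

    // Assembles the final URL once every piece has been populated.
    public static String buildUrl(Map<String, String> parts) {
        return parts.get("site")
                + "/book.html?bookid=" + parts.get("bookId") + parts.get("series")
                + "&language=" + parts.get("language");
    }

    public static void main(String[] args) {
        Map<String, String> parts = new LinkedHashMap<>();
        addSiteData(parts);
        addBookData(parts);
        System.out.println(buildUrl(parts));
    }
}
```

Because the template lives in one place (buildUrl), adding or renaming a parameter only touches that method and whichever method supplies the value.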

  • Replace characters in a string

    Hi,
    I need to replace all occurrences of control characters except space,newline,tabs in a string . I can give a replace statement for each of these characters but I want to avoid this by making use of regular expressions. Can anyone help me in this regard.
    I tried using the following replace statements with regular expression, but i am not getting the required results:
    replace all occurrences of REGEX '[[:cntrl:]]' in lv_char with space replacement count lv_count_r.
    ---> this replaces even the spaces
    replace all occurrences of REGEX '[[:cntrl:]][^[:space:]]' in lv_char with space replacement count lv_count_r.
    --> this replaced even some alpha numeric characters
    Thanks and Regards,
    Shankar

    > Is there any way to do this without using regular
    > expressions? Regular expressions are the last
    > solution for me.
    Remember that you can never really replace the characters of a String. Strings are immutable. Once created they cannot change.
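
The question above is written in ABAP, so the following Java sketch is only an analogue, not a fix for the ABAP statement. It uses a character-class intersection to match control characters other than tab, newline and carriage return, and it also shows the point about immutability: replaceAll returns a new String and leaves the original untouched. The class and method names are invented for the example.

```java
public class ControlCharCleaner {
    // \p{Cntrl} covers 0x00-0x1F and 0x7F; the && intersection removes
    // tab, newline and carriage return from that set. A space (0x20) is
    // not a control character here, so it is never matched.
    public static String clean(String input) {
        return input.replaceAll("[\\p{Cntrl}&&[^\\t\\n\\r]]", " ");
    }

    public static void main(String[] args) {
        String original = "a" + (char) 1 + "b\tc\nd" + (char) 0x7F + "e f";
        String cleaned = clean(original);
        // The original still contains its control characters.
        System.out.println(original.equals(cleaned)); // prints "false"
        System.out.println(cleaned);
    }
}
```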

  • Unicode characters not displayed in text property

    I am developing a web application with Flex Builder. I write the text for each label using a font called Dhivehi, which is written from left to right, and then copy the text and paste it into the label's text property.
    However, in the source code view the text property of the label shows
    text=""
    The issue is that when rendered the text is reversed. So I want to run a function once the application is loaded to reverse the text in the label, so that the text appears the way it was originally written.
    Any help will be very much appreciated.

    Hi,
    I have a strange problem here with Windows.Forms.RichTextBox: when I assign the .ToString() value of a StringBuilder to a RichTextBox's .Rtf property, the Unicode characters contained in the StringBuilder get converted to ???? symbols in the .Rtf property.
    Could you please let me know whether the RichTextBox's .Rtf property can hold Unicode characters? Or is there any other way to store Unicode characters in a RichTextBox?
    Thanks & Regards,
    Tabarak
    Hello,
    To clarify the issue and help you get a proper solution, I would recommend sharing an RTF string, or even a simple sample, that reproduces the issue.
    We will use that sample to help you.
    Regards,
    Carl

  • Unicode Characters in Label/JLabels

    Hi All,
    Does anyone know when Unicode escapes within a String get transformed into the characters they represent? I ask because I'm getting conflicting behaviour depending on whether the String is hard-coded or read from a file at runtime.
    For instance, the following code works fine and produces a label on the GUI containing the infinity character:
    String name = "100 to \u221E";
    JLabel label = new JLabel(name);
    However, if <name> is read from an XML file, the label produced shows "100 to \u221E" verbatim.
    Has anyone else seen this effect?
    Thanks in advance for any advice,
    Andy Chamberlain

    > Thanks for that. If I understand correctly, is it therefore the case that by the time the JLabel constructor gets called, the String object ("name", in this case) already has any unicode characters encoded within it?
    Exactly. The compiled .class file already has the Unicode characters in it; JLabel has nothing to do with it.
    > If so, then when debugging, any such characters must get decoded again back to ASCII when the value of "name" is inspected within the debugger environment (JDeveloper in this case).
    Depends on the Unicode-awareness of JDeveloper; I don't know anything about it.
    > And the finger would certainly then point to when the String was created by the XML parser (I'm using org.dom4j.io.SAXReader). I'll investigate this further.
    If you have a text editor that can save a file in UTF-8, you could try saving the XML with the infinity symbol as plain text and specify the encoding of the file with <?xml version='1.0' encoding='UTF-8'?>... Or does your parser accept the &#some-decimal-number; way?

  • Replace multiple characters to single character

    Hi friends,
    I would like to replace multiple characters in a string to a single character. The character may be a pipe (|) or a tilde (~).
    For example "|||asdf||123|xyz||" should be changed to "asdf|123|xyz".
    I use Oracle 11g2.
    Thanks in advance!

    Without regexp
    with testdata as (
        select '||asdf||||123|xyz|' str from dual union all
        select 'asdf|123|||||||xyz||||' from dual union all
        select '||||||asdf|||123||||||xyz||||||' from dual
    )
    select
    trim (both '|' from
        replace (
            replace (
                replace (str,'|','|'||chr(0))
                ,chr(0)||'|'
            )
            ,chr(0)
        )
    ) str
    from testdata
    STR
    "asdf|123|xyz"
    "asdf|123|xyz"
    "asdf|123|xyz"
    With regexp
    with testdata as (
        select '||asdf||||123|xyz|' str from dual union all
        select 'asdf|123|||||||xyz||||' from dual union all
        select '||||||asdf|||123||||||xyz||||||' from dual
    )
    select
    trim (both '|' from
        regexp_replace (
            str
            ,'\|+'
            ,'|'
        )
    ) str
    from testdata
    STR
    "asdf|123|xyz"
    "asdf|123|xyz"
    "asdf|123|xyz"
    Message was edited by: chris227 regexp added

  • Problem in replacing characters of a string ?

    Hello everybody,
    I want to replace a few characters with their corresponding unicode codepoint values.
    I have a userdefined method that gets the unicode codepoint value for a character.
    1. I want to know how to replace the characters and keep the accumulated result in a single string once the comparison in the for loop in my main method is over.
    Currently, I am able to replace each character, but I am not able to collect all the replacements in a single variable.
    The output of the code is
    e\u3006ame
    ena\u3005e
    But the output I require is
    e\u3006a\u3005e
    Please offer some help in this regard.
    import java.io.*;
    class Read1
    {
         public static void main(String s[])
         {
             String rp,snd;
             String tmp="ename";
             for(int i=0;i<tmp.length();i++)
             {
                 snd=getCodepoint(tmp.charAt(i));
                 if(snd!=null)
                 {
                    rp=replace(tmp,String.valueOf(tmp.charAt(i)),"\\u"+snd);
                    System.out.println(rp);
                 }
             }
         }
         public static String replace(String source, String pattern, String replace)
         {
             if (source!=null)
             {
                 final int len = pattern.length();
                 StringBuffer sb = new StringBuffer();
                 int found = -1;
                 int start = 0;
                 while( (found = source.indexOf(pattern, start) ) != -1)
                 {
                     sb.append(source.substring(start, found));
                     sb.append(replace);
                     start = found + len;
                 }
                 sb.append(source.substring(start));
                 return sb.toString();
             }
             else return "";
         }
         ...
    }
    Any help in this regard would be useful.
    Thanks
    khurram
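
One way to get all the replacements into a single result is to build the output character by character instead of calling replace on the unchanged input each time. The sketch below assumes a stand-in for the poster's getCodepoint method (its real body is not shown above): here it returns "3006" for 'n', "3005" for 'm' and null for everything else, which reproduces the values in the sample output. The class and method names are invented for the example.

```java
public class CodepointReplacer {
    // Hypothetical stand-in for the poster's getCodepoint method.
    public static String getCodepoint(char c) {
        switch (c) {
            case 'n': return "3006";
            case 'm': return "3005";
            default:  return null;
        }
    }

    // Walks the input once and appends either the escape text or the
    // original character, so every replacement lands in the same buffer.
    public static String escapeMapped(String source) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            String code = getCodepoint(c);
            if (code != null) {
                sb.append("\\u").append(code);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeMapped("ename"));
    }
}
```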

    This manual replacement thingy reminds me of quite an old technique, from when
    64KB of memory was considered enough for 20 users (at the same time, that is!)
    Suppose you have a buffer of, say, n characters. A region of chars starting at
    location i has to be swapped with the region of chars starting at location
    j >= i+l_i; the lengths of the two regions are l_i and l_j respectively.
    Suppose the following method is available:
    public void reverse(char[] buffer, int f, int l_f) {
       for (int t= f+l_f; --t > f; f++) {
          char tmp=buffer[f]; buffer[f]= buffer[t]; buffer[t]= tmp;
       }
    }
    i.e. the above method reverses a region of characters, starting at position f
    with length l_f. Given this simple method, the original problem can be solved
    using the following simple sequence (the first call reverses the whole span
    from i to the end of the second region, so its length is j+l_j-i):
    reverse(buffer, i, j+l_j-i);
    reverse(buffer, i, l_j);
    reverse(buffer, i+l_j, j-i-l_i);
    reverse(buffer, j+l_j-l_i, l_i);
    Of course, when replacing characters we don't need the last reversal.
    kind regards,
    Jos (dinosaurus)
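
The reversal trick can be put together as the following Java sketch. The method name swapRegions, the parameter names and the sample buffer are assumptions added for illustration; the reverse method is the one described in the reply.

```java
public class RegionSwap {
    // Reverses len characters of buffer in place, starting at index f.
    public static void reverse(char[] buffer, int f, int len) {
        for (int t = f + len; --t > f; f++) {
            char tmp = buffer[f];
            buffer[f] = buffer[t];
            buffer[t] = tmp;
        }
    }

    // Swaps the region [i, i+li) with the region [j, j+lj), where
    // j >= i+li, keeping the characters between them in order.
    public static void swapRegions(char[] buffer, int i, int li, int j, int lj) {
        reverse(buffer, i, j + lj - i);      // whole span, now back to front
        reverse(buffer, i, lj);              // second region, now first
        reverse(buffer, i + lj, j - i - li); // untouched middle part
        reverse(buffer, j + lj - li, li);    // first region, now last
    }

    public static void main(String[] args) {
        char[] buffer = "xxABCmmDEyy".toCharArray();
        swapRegions(buffer, 2, 3, 7, 2);     // swap "ABC" with "DE"
        System.out.println(new String(buffer)); // prints "xxDEmmABCyy"
    }
}
```

No extra buffer is needed: each character is moved by at most two swaps, which is why the technique suited machines with very little memory.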

  • Search and replace for Unicode characters

    Hello,
    I have a function which searches for and replaces characters in a file. It works with ASCII characters, but not when the string to be replaced contains Unicode characters ('á', 'ā', etc.). The source file is encoded as UTF-8.
    $file = "file.txt"
    $SearchReplace = @($file)
    #Process files by performing a search and replace
    foreach ($file in $SearchReplace)
    {
        #Select-Object -Skip 1 |
        (Get-Content $file) |
            Foreach-object { $_ -replace 'unicode_string' , ';'   } |
            out-file -encoding Unicode $file
    }
    How can I get the search (and replace) function working with Unicode characters?
    Thanks!

    No, it does not. I have verified that the script does not recognize the diacritic characters ('á', 'ā') even when I specify the utf-8/Unicode encoding in all the file operations.
