Removing non english characters from my string input source

Guys,
I have problem where I need to remove all non english (Latin) characters from a string, what should be the right API to do this?
One I'm using right now is:
s.replaceAll("[^\\x00-\\x7F]", "");//s is a string having chinese characters.
I'm looking for a standard Solution for such problems, where we deal with multiple lingual characters.
TIA
Nitin

Nitin_tiwari wrote:
I have a string which has Chinese as well as Japanese characters, and I only want to remove only Chinese characters.
What's the best way to go about it?Oh, I see!
Well, the problem here is that Strings don't have any information on the language. What you can get out of a String (provided you have the necessary data from the Unicode standard) is the script that is used.
A script can be used for multiple languages (for example English and German use mostly the same script, even if there are a few characters that are only used in German).
A language can use multiple scripts (for example Japanese uses Kanji, Hiragana and Katakana).
And if I remember correctly, then Japanese and Chinese texts share some characters on the Unicode plane (I might be wrong, 'though, since I speak/write neither of those languages).
These two facts make these kinds of detections hard to do. In some cases they are easy (separating latin-script texts from anything else) in others it may be much tougher or even impossible (Chinese/Japanese).

Similar Messages

  • Removing non-English characters from data.

    Ours is global system with some data with non-English characters. We want to download file by removing this non-English characters.
    Any suggestions how we can remove these non-English characters from file..?

    The FM u said
         Replace non-standard characters with standard characters
       Functionality
         SCP_REPLACE_STRANGE_CHARS processes a text so that it only contains
         simple characters. Special characters and national characters are
         replaced in such a way that the text remains reasonably legible.
         The character set 1146 is used by default. In this case the following
         replacements are made, for example:
          Æ ==> AE        (AE)
          Â ==> A         (Acircumflex)
          Ä ==> Ae        (Adieresis)
          £ ==> L         (sterling)
         Note that the new text can be longer than the old.
    So i dont think it ll be useful for eliminating the sp. chars.
    U have to check each and every alphabet with std 26 alphabets
    Thanks & Regards
    vinsee

  • Removing Non-Ascii Characters from a String

    Hi Everyone,
    I would like to remove all NON-ASCII characters from a large string. For example, I am taking text from websites and would like to remove all the strange arabic and asian characters. How can I accomplish this?
    Thank you in advance.

    I would like to remove all NON-ASCII characters from a large string. I don't know if its a good method but try this:
    str="\u6789gj";
    output="";
    for(char c:str.toCharArray()){
         if((c&(char)0xff00)==0){
              output=output+c;
    System.out.println(output);
    all the strange arabic and asian characters.Don't call them so.... I am an Indian Muslim ;-) ....
    Thanks!

  • Removing non-numeric characters from string

    Hi there,
    I need to have the ability to remove non-numeric characters from a string and I do not know how to do this.
    Does any one know a way?
    Example:
    Present String: (02)-2345-4607
    Required String: 0223454607
    Thanks in advance

    Dear NickM
    Try this this will work...........
    create or replace function char2num(mstring in varchar2) return integer
    is
    -- Function to remove Special characters and alphebets from phone no. string field
    -- Author - Valid Bharde.(India-Mumbai)
    -- Date :- 20 Sept 2006.
    -- This Function will return numeric representation.
    -- The Folowing program is gifted to NickM with respect to his post on oracle site regarding Removing non-numeric characters from string on the said date
    mstatus number :=0;
    mnum number:=0;
    mrefstring varchar2(50);
    begin
    mnum := length(mstring);
    for x in 1..mnum loop
    if (ASCII(substr(upper(mstring),x,1)) >= 48 and ASCII(substr(upper(mstring),x,1)) <= 57) then
    mrefstring := mrefstring || substr(mstring,x,1);
    end if;
    end loop;
    return mrefstring;
    end;
    copy the above program and use it at function for example
    SQL> select char2num('(022)-453452781') from dual;
    CHAR2NUM('(022)-453452781')
    22453452781
    Chao!!!

  • Removing non-english characters

    Hi,
    I'm trying to define a regular expression that helps me to replace non-english characters from a string.
    For example:
    BESANÇON
    and I need to get something like: BESANCON, or BESAN*ON.
    Could any one give me some hints?
    Max A.

    You can use the convert function:
    SELECT CONVERT('BESANÇON','US7ASCII')
    FROM dual;
    CONVERT(
    BESANCON
    1 row selected.

  • Removing Non-numeric characters from Alpha-numeric string

    Hi,
    I have one column in which i have Alpha-numeric data like
    COLUMN X
    +91 (876) 098 6789
    1-567-987-7655
    so on.
    I want to remove Non-numeric characters from above (space,'(',')',+,........)
    i want to write something generic (suppose some function to which i pass the column)
    thanks in advance,
    Mandip

    This variation uses the like operators pattern recognition to remove non alphanumeric characters. It also
    keeps decimals.
    Code Snippet
    CREATE FUNCTION dbo.RemoveChars(@Str varchar(1000))
    RETURNS VARCHAR(1000)
    BEGIN
    declare @NewStr varchar(1000),
    @i int
    set @i = 1
    set @NewStr = ''
    while @i <= len(@str)
    begin
    --grab digits or (| in regex) decimal
    if substring(@str,@i,1) like '%[0-9|.]%'
    begin
    set @NewStr = @NewStr + substring(@str,@i,1)
    end
    else
    begin
    set @NewStr = @NewStr
    end
    set @i = @i + 1
    end
    RETURN Rtrim(Ltrim(@NewStr))
    END
    GO
    Code to validate:
    Code Snippet
    declare @t table(
    TestStr varchar(100)
    insert into @t values ('+91 (8.76) \098 6789');
    insert into @t values ('1-567-987-7655');
    select dbo.RemoveChars(TestStr)
    from @t

  • Removing non printable characters from an excel file using powershell

    Hello,
    anyone know how to remove non printable characters from an excel file using powershell?
    thanks,
    jose.

    To add - Excel is a binary file.  It cannot be managed via external methods easily.  You can write a macro that can do this.  Post in the Excel forum and explain what you are seeing and get the MVPs there to show you how to use the macro facility
    to edit cells.  Outside of cell text "unprintable" characters are a normal part of Excel.
    ¯\_(ツ)_/¯

  • Remove all non-number characters from a string

    hi
    How i can remove all non-number characters from a column ? for example , i have a column that contains data like
    'sd3456'
    'gfg87s989'
    '45/45fgfg'
    '4354-df4456'
    and i want to convert it to
    '3456'
    '87989'
    '4545'
    '43544456'
    thx in adv

    Or in 9i,
    Something like this ->
    satyaki>
    satyaki>with vat
      2  as
      3    (
      4      select 'sd3456' cola from dual
      5      union all
      6      select 'gfg87s989' from dual
      7      union all
      8      select '45/45fgfg' from dual
      9      union all
    10      select '4354-df4456' from dual
    11    )
    12  select translate(cola,'abcdefghijklmnopqrstuvwxyz-/*#$%^&@()/?,<>;:{}[]|\`"',' ') res
    13  from vat;
    RES
    3456
    87989
    4545
    43544456
    Elapsed: 00:00:00.00
    satyaki>
    {code}
    I checked this with minimum test cases. It will be better if you checked it with other cases.
    Regards.
    Satyaki De.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   

  • PDF generation for Non English Characters from ADF

    Hi
    We are using below piece of code to generate pdf from ADF Managed bean. It works fine. However for non English Characters(eg. Japanese,Vietnamese,Arabic)  it puts
    I got few blogs
    https://blogs.oracle.com/BIDeveloper/entry/non-english_characters_appears
    However we are not using BI Publisher product . We are using its API's
    Can anyone tell where do we need to setup fonts within ADF or Weblogic or Server ?
    Input Parameters are
    a)xml Data
    b)InputStream  ie rtf Template
    import oracle.apps.xdo.XDOException;
    import oracle.apps.xdo.template.FOProcessor;
    import oracle.apps.xdo.template.RTFProcessor;
        public static byte[] genPdfRep(String pOutFileType,byte[] pXmlOut ,InputStream pTemplate)
            byte[] dataBytes = null;
            try {
                //Process RTF template to convert to XSL-FO format
                RTFProcessor rtfp = new RTFProcessor(pTemplate);
                ByteArrayOutputStream xslOutStream = new ByteArrayOutputStream();
                rtfp.setOutput(xslOutStream);
                rtfp.process();
                //Use XSL Template and Data from the VO to generate report and return the OutputStream of report
                ByteArrayInputStream xslInStream = new ByteArrayInputStream(xslOutStream.toByteArray());
                FOProcessor processor = new FOProcessor();
                ByteArrayInputStream dataStream = new ByteArrayInputStream((byte[])pXmlOut);  
                processor.setData(dataStream);
                processor.setTemplate(xslInStream);
                ByteArrayOutputStream pdfOutStream = new ByteArrayOutputStream();
                processor.setOutput(pdfOutStream);
                byte outFileTypeByte = FOProcessor.FORMAT_PDF;
                processor.setOutputFormat(outFileTypeByte); //FOProcessor.FORMAT_HTML
                processor.generate();
                dataBytes = pdfOutStream.toByteArray();
            } catch (XDOException e) {
                e.printStackTrace();
            return dataBytes;
    Appreciate your help.
    Thanks,
    Abhijit

    Fonts are defined in the template you use to generate the pdf. Your application add the data and both is processed yb the FOP processor. Now there are two possible causes of the '???' :
    1. the data you sent to the template contains the '???' already
    2. the template can't digest the data (the special characters) and puts '???' in the pdf.
    Before going on you have to find out which one is your problem. The 2nd is the problem you better ask this in a FOP forum as you have to solve it by changing the template.
    Timo

  • How to retrieve non-english characters from a query

    Hello,
    My apologies if this post is not in its proper place, but I was a bit confused where to add it.
    I'm running a query using SQL Developer on a table which contains several companies names from many different countries, and one of the checks I need to make to ensure data consistency is to search for all rows which the name of company contains special or non-english characters (like ç, ã, ä as example).
    I don't know what can I use to do this. I tried to collate using NLS_SORT but it didn't work.
    Is there someway to select only the rows that contain these special or non-english characters, excluding from the results the rows that only have english characters? Please have in mind that we have many languages in this table.
    The field I would like to make the conditions on is VARCHAR2.
    Please let me know if there is any extra information I should provide you so that you can help me.
    Thank you in advance for the help.
    Regards,
    Luís

    Hi Luis,
    My apologies if this post is not in its proper place, but I was a bit confused where to add it.This is the Forum for the SQL Developer Data Modeler product.
    I suggest you try using the SQL and PL/SQL Forum: PL/SQL
    David

  • Formula to show non english characters from clob in crystal report

    Hi
    I am using the database Oracle 11g with a field clob in one of the table that i want to show in the crystal report
    But the problem is when i put the clob field in crystal report it is outputting the results perfectly for English characters but not for the Arabic ones and returning the string like (¿¿¿¿ ¿¿¿¿ ¿¿¿ ¿¿¿¿ ¿¿¿¿ ¿¿¿ ¿ ¿¿¿¿¿ ¿¿¿)
    so is there any way to show the arabic (non english) characters perfectly on crystal report with clob field

    Hi Azeem,
    Make sure the arabic font should install in your system.
    Try this:
    Create a text field in your Crystal Report (a label).
    Place Arabic characters into that field (just by typing them into it on the report definition).
    Run the report. If they display correctly then its probably not Crystal, but rather would point to an issue in the data retrieval and supply to Crystal via your dataset (or whatever datasource you are using).
    If they don't display - then its definitely Crystal.

  • Parsing Non English characters from OPEN DATASET

    Hi Team,
    I'm try to download the .txt file using the OPEN DATASET in the windows environment from application server, system language is set up for English.
    The problem is, in the .txt file some times I will be getting the non-English characters (Spanish). When I try to download the data into SAP it's not downloading properly eg: ñ when it downloaded into sap it took something like A+_.
    My goal is I need to download the Spanish characters properly into sap.
    Please advice how to solve this problem.
    Thanks,
    Selvaraj

    Hi Selvaraj!
    I haven't checked the situation for 4.5B, nor can I predict exact behavior, but you can give following statement a try:
    SET LOCALE LANGUAGE lg.
    This should even change content of sy-lange for whole role area (internal session), so change the value back afterwards.
    Regards,
    Christian

  • How can I remove non-numeric characters from a cell?

    I have a file an rtf file that I can open in Numbers. It puts each line in a separate cell. Each cell contains non-numeric and numeric characters. I'd like to delete the non-numeric characters so that I can add the numbers together. Is there a way to do this easily in Numbers that doesn't require doing it manually?
    Thanks,
    David

    Ok, David,
    This solution will work for vlaues up to 99,000 and if there is a space in front of your amount. There are two parts for clarity but you could wrap them up into one formula if you wanted to.
    B2 =FIND(" ",A2,LEN(A2)−9)
    C2 =MID(A2,B2,10)
    If there is a return before your amount (certain cells in your screenshot got me wondering) then the formula in column B
    =FIND("
    ",A2,1)
    It looks funny because it is finding the return.
    Let me know if this works for you.
    quinn

  • Download none-English characters from tables

    Hi guys,
    I use apex 4.1.1 multilingual - with oracle xe 11g R2.. with listener 1.1.3 deployed on Glassfish 3.1.1
    I have Arabic data stored in my tables. if I go to Object Browser and try to download it, I get question marks instead of the letters....
    I have the same problem if I want to download the data of IR...
    Can you help please ??
    Regards,
    Fateh

    The FM u said
         Replace non-standard characters with standard characters
       Functionality
         SCP_REPLACE_STRANGE_CHARS processes a text so that it only contains
         simple characters. Special characters and national characters are
         replaced in such a way that the text remains reasonably legible.
         The character set 1146 is used by default. In this case the following
         replacements are made, for example:
          Æ ==> AE        (AE)
          Â ==> A         (Acircumflex)
          Ä ==> Ae        (Adieresis)
          £ ==> L         (sterling)
         Note that the new text can be longer than the old.
    So i dont think it ll be useful for eliminating the sp. chars.
    U have to check each and every alphabet with std 26 alphabets
    Thanks & Regards
    vinsee

  • Identify Non English Character in a String

    All,
    We have a requirement to Identify the Non English Characters from the User Key In data and return an error message saying only valid English, Numeric and some special characters are allowed.
    For Example, If the User enters data like "This is a Test data" then the return value should be true. or if he enters something like "My Native Language is inglés" then it should return false. Similarly any Chinese, russian or japansese character entryies should also return false.
    How can we achieve this?
    Thanks,
    Nagarajan.

    Hi Nagarajan,
    You could use Unicode character blocks or simply craft a regular expression that contains all the characters you need. The latter is easy to understand and gives you full control over which characters you want to allow. E.g. I assume you might want something like this:
    if(!"This is a proper input string".matches("[\\s\\w\\p{Punct}]+")) {
      // Issue error message and re-get input string
    The String method matches() takes a regular expression as input parameter. If you haven't dealt with regular expressions before, check out the Java API help for class java.util.regex.Pattern. Here's a short breakdown of the pattern I used:
    <ol>
    <li>The square brackets [] enclose a list of allowed characters; here you can explicitly list all allowed characters.</li>
    <li>You can specify ranges like a-z as a character class, list individual characters like ;:| or utilize predefined character classes (\s for any whitespace character, \w for all letters a-z and A-Z, underscore and 0-9 and the posix class \p for a list of punctuation symbols). For a complete list check Java API help on java.util.regex.Pattern.
    <li>The + at the end indicates that the characters listed can occur once or more.</li>
    </ol>
    There's other ways to achieve what you want, but I think this might be an easy way to start with.
    Cheers, harald

Maybe you are looking for

  • HOW TO: From BK7 discharge to 700 in 24 months or less!

    The logic behind it is for recon purposes. IME, the CSRs are slammed Monday from all the apps/problems over the weekend and sometimes that bleeds over in to Tuesday morning. If you app Tuesday afternoon/Wednesday, you have a better chance of getting

  • IOS 8.1. Does it solve wifi/ Bluetooth/ cell signal?

    I Thought I'd start a new thread specifically for this...couldn't see one, other than comments updating on 8.0.2, which is useful but not specific for those just searching for the first time. I'm too nervous to update as I'm still on 8.0 with a few w

  • Envelope Schema Issue : Routing failure :No Subscriber is found

    Hi, This is my envelope schema: This is my original Schema: My scenario is as follows: I have two orchestration : In orchestration A,I am getting a input schema,map that input schema to above envelope schema.Then doing de-batching with the help of ca

  • Smart Business : Analyze Sentiment

    Hi All, We have installed and configured Analyze sentiment app in our landscape as per link http://help.sap.com/fiori_bs2013/helpdata/en/f9/956752127bf55fe10000000a423f68/content.htm We have configured Web dispatcher and when  we try to access applic

  • How to run executable Java app from command line: No Main( ) method

    This is the opposite question from what most users ask. I have the Java source code and a running Java executable version of the program in a windows environment. I don't want to run it in the normal way by double-clicking on it or using the executab