Removing unicode control characters from string

Hi.
I have a web service that returns an object (containing some strings) to the client. The data is read from a database, and the strings sometimes contain characters that are invalid in XML (such as Unicode 0x13). This causes an error when the client parses the response.
Is there an easy way to set up a filter that checks whether a string contains characters outside the range valid for XML (below Unicode 0x20, etc.) and removes them or replaces them with a different character?

If you have to get rid of the control chars then:
        String someText = "a\nb\nc\td\re\r\nf";
        // U+0000 to U+001F are the C0 control characters; U+0020 is the space and is kept
        String someTextWithoutControlChars = someText.replaceAll("[\\u0000-\\u001F]", "");
        System.out.println(someTextWithoutControlChars);
but like kaj says, some control chars are valid.
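
Since tab, line feed and carriage return are legal in XML 1.0, a filter can keep those and drop only what the XML 1.0 "Char" production excludes. The following is a minimal sketch under that assumption; the names `XmlCharFilter`, `isValidXmlChar` and `stripInvalidXmlChars` are hypothetical and do not come from the thread.

```java
class XmlCharFilter {
    // True if the code point is allowed by the XML 1.0 "Char" production.
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the input with every disallowed code point removed.
    static String stripInvalidXmlChars(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            if (isValidXmlChar(cp)) {
                sb.appendCodePoint(cp);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The string contains U+0013 (invalid in XML 1.0) and a tab (valid).
        String dirty = "ab\u0013cd\tef";
        System.out.println(stripInvalidXmlChars(dirty));
    }
}
```

To replace rather than remove, the `if` branch could append a substitute character in an `else` clause; which substitute is appropriate depends on the client.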

Similar Messages

  • Removing the Control Characters from a text file

    Hi,
    I am using the java.util.regex.* package to remove the control characters from a text file. I got the program below from the java.sun site.
    The file compiles successfully, but when I try to run it I get this error:
    ------------------------------------------------------------------------
    D:\Debi\datamigration>java Control
    Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal repetition
    {cntrl}
    at java.util.regex.Pattern.error(Pattern.java:1472)
    at java.util.regex.Pattern.closure(Pattern.java:2473)
    at java.util.regex.Pattern.sequence(Pattern.java:1597)
    at java.util.regex.Pattern.expr(Pattern.java:1489)
    at java.util.regex.Pattern.compile(Pattern.java:1257)
    at java.util.regex.Pattern.<init>(Pattern.java:1013)
    at java.util.regex.Pattern.compile(Pattern.java:760)
    at Control.main(Control.java:24)
    Please help me on this issue.
    Thanks&Regards
    Debi
    import java.util.regex.*;
    import java.io.*;
    public class Control {
        public static void main(String[] args) throws Exception {
            // Create file objects with the file names
            File fin = new File("fileName1");
            File fout = new File("fileName2");
            // Open an input and an output stream
            FileInputStream fis = new FileInputStream(fin);
            FileOutputStream fos = new FileOutputStream(fout);
            BufferedReader in = new BufferedReader(new InputStreamReader(fis));
            BufferedWriter out = new BufferedWriter(new OutputStreamWriter(fos));
            // The pattern matches control characters
            Pattern p = Pattern.compile("{cntrl}");
            Matcher m = p.matcher("");
            String aLine = null;
            while ((aLine = in.readLine()) != null) {
                m.reset(aLine);
                // Replaces control characters with an empty string
                String result = m.replaceAll("");
                out.write(result);
                out.newLine();
            }
            in.close();
            out.close();
        }
    }

    Hi,
    I used the code below with the \p, but I wasn't able to compile the file. It gave me an error:
    D:\Debi\datamigration>javac Control.java
    Control.java:24: illegal escape character
    Pattern p = Pattern.compile("\p{cntrl}");
    ^
    1 error
    Please help me on this issue.
    Thanks&Regards
    Debi
    // The pattern matches control characters
    Pattern p = Pattern.compile("\p{cntrl}");
    Matcher m = p.matcher("");
    String aLine = null;
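
Both errors above come from how the pattern is written. Without the `\p` prefix, the leading `{` is read as the start of a repetition quantifier, hence "Illegal repetition". With a single backslash, the Java compiler rejects `\p` as a string escape, so the backslash has to be doubled in the source. The class name documented for `java.util.regex.Pattern` is `Cntrl`, with a capital C. A minimal sketch, using the hypothetical names `ControlStripper` and `stripControl`:

```java
import java.util.regex.Pattern;

class ControlStripper {
    // The POSIX control class: U+0000 to U+001F plus U+007F.
    // In Java source the regex backslash is written twice.
    private static final Pattern CONTROL = Pattern.compile("\\p{Cntrl}");

    // Returns the line with all control characters removed.
    static String stripControl(String line) {
        return CONTROL.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        // Contains a tab and U+0007 (bell).
        System.out.println(stripControl("a\tb\u0007c"));
    }
}
```

Because the file is read with `readLine()`, line terminators never reach the pattern, so removing every control character within a line does not merge lines.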

  • Removing non-numeric characters from string

    Hi there,
    I need to have the ability to remove non-numeric characters from a string and I do not know how to do this.
    Does any one know a way?
    Example:
    Present String: (02)-2345-4607
    Required String: 0223454607
    Thanks in advance

    Dear NickM
    Try this; it will work.
    create or replace function char2num(mstring in varchar2) return integer
    is
      -- Function to remove special characters and alphabets from a phone no. string field
      -- Author - Valid Bharde.(India-Mumbai)
      -- Date :- 20 Sept 2006.
      -- This function will return the numeric representation.
      -- The following program is gifted to NickM with respect to his post on the Oracle site regarding removing non-numeric characters from a string on the said date
      mstatus number := 0;
      mnum number := 0;
      mrefstring varchar2(50);
    begin
      mnum := length(mstring);
      for x in 1..mnum loop
        if (ASCII(substr(upper(mstring),x,1)) >= 48 and ASCII(substr(upper(mstring),x,1)) <= 57) then
          mrefstring := mrefstring || substr(mstring,x,1);
        end if;
      end loop;
      return mrefstring;
    end;
    Copy the above program and use it as a function, for example:
    SQL> select char2num('(022)-453452781') from dual;
    CHAR2NUM('(022)-453452781')
    22453452781
    Chao!!!
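
For comparison, the same filter can be sketched in Java as a single regular-expression replacement. The names `DigitsOnly` and `keepDigits` are hypothetical. The method returns a string, so a leading zero such as the one in the requested output `0223454607` survives, whereas the sample output of the integer-returning function above shows `(022)` reduced to `22`.

```java
class DigitsOnly {
    // Removes every character that is not an ASCII digit 0-9.
    static String keepDigits(String s) {
        return s.replaceAll("[^0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(keepDigits("(02)-2345-4607")); // prints 0223454607
    }
}
```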

  • Removing last 2 characters from string field

    I am trying to remove the last 2 characters of a string field.
    There is no consistent length in the field:
    316R1
    12364R1
    I want to remove everything after R.
    I tried InStrRev but that didn't do it.
    Is there a way to say: start at position 1 and go to the R?
    Thanks

    Use a formula:
    left(field, length_formula) is the solution,
    where length_formula is the number of characters to keep from the left,
    e.g.
    left(field, length(field) - 2)
    left(field, InStrRev(field, "R"))
    Of course a combination with Right is also possible.
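
The two approaches give different results when the suffix after the R is not exactly one character long. The same logic, sketched in Java rather than in Crystal syntax, with the hypothetical names `FieldTrim`, `dropLast` and `beforeLast`:

```java
class FieldTrim {
    // Drops the last n characters; returns "" if the string has n or fewer.
    static String dropLast(String s, int n) {
        return s.length() <= n ? "" : s.substring(0, s.length() - n);
    }

    // Keeps everything before the last occurrence of marker;
    // returns the string unchanged if the marker is absent.
    static String beforeLast(String s, char marker) {
        int idx = s.lastIndexOf(marker);
        return idx < 0 ? s : s.substring(0, idx);
    }

    public static void main(String[] args) {
        System.out.println(dropLast("316R1", 2));       // prints 316
        System.out.println(beforeLast("12364R1", 'R')); // prints 12364
    }
}
```

Searching for the marker is the safer choice when a value such as `316R12` can occur, since a fixed count of two would then leave the R in place.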

  • Remove unicode control character

    hey, so i've asked this around a few places and received no answers, but the arch community seems like a bunch of smart people and i've recently switched over from ubuntu.
    is there a way to remove input methods and unicode control characters from the context (right-click) menu? i've searched and searched, but i've had no luck with this.
    there's a way to remove it in gnome with gconf, but i've tried that, and no luck.
    i'm using openbox, and i think i found something about my gtkrc file along these lines
    gtk-show-input-method-menu =
    gtk-show-unicode-menu =
    both are "gboolean" which i don't know what it is, or if this is even what i want.
    any help would be appreciated, as i never use these menus.

    U+2415, SYMBOL FOR NEGATIVE ACKNOWLEDGE is not a control character. It is a normal symbol character, which is a substitute display character used when the control character U+0015 NEGATIVE ACKNOWLEDGE is to be displayed instead of being interpreted.
    You need to refine your question.
    In general, you can remove any particular character or characters from a string using the SQL functions TRANSLATE or REPLACE. You can use CHR or UNISTR to encode characters that you cannot enter from a keyboard. You can use REGEXP_REPLACE with POSIX character classes to remove broader ranges of characters.
    -- Sergiusz

  • Removing Non-numeric characters from Alpha-numeric string

    Hi,
    I have one column in which i have Alpha-numeric data like
    COLUMN X
    +91 (876) 098 6789
    1-567-987-7655
    so on.
    I want to remove Non-numeric characters from above (space,'(',')',+,........)
    i want to write something generic (suppose some function to which i pass the column)
    thanks in advance,
    Mandip

    This variation uses the LIKE operator's pattern matching to remove non-numeric characters. It also keeps decimal points.
    Code Snippet
    CREATE FUNCTION dbo.RemoveChars(@Str varchar(1000))
    RETURNS VARCHAR(1000)
    BEGIN
        declare @NewStr varchar(1000),
                @i int
        set @i = 1
        set @NewStr = ''
        while @i <= len(@Str)
        begin
            -- keep the character if it is a digit or a decimal point
            if substring(@Str,@i,1) like '[0-9.]'
            begin
                set @NewStr = @NewStr + substring(@Str,@i,1)
            end
            set @i = @i + 1
        end
        RETURN Rtrim(Ltrim(@NewStr))
    END
    GO
    Code to validate:
    Code Snippet
    declare @t table(
        TestStr varchar(100)
    )
    insert into @t values ('+91 (8.76) \098 6789');
    insert into @t values ('1-567-987-7655');
    select dbo.RemoveChars(TestStr)
    from @t

  • How to stop control characters from being tokenized?

    In Oracle 11.2.0.3
    I have documents which have embedded Arabic, so there is a control character \u202b to indicate a right-to-left string. How do I stop these kinds of control characters from being tokenized? Doing a search with CONTAINS(search_string,' \u202b')>0 finds documents. Do I just add the '\u202b' as a stopword?

    What lexer are you using, Amin?
    I assume from what you say that the characters are getting indexed as whole tokens?  If so, adding them as stopwords should work, but I'm surprised they're getting indexed at all - sounds like a fault in the "alphanumeric identification" code to me.
    I'm going to take a guess this is the World Lexer.  Am I right?
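
One detail that may matter here: Unicode classifies U+202B (RIGHT-TO-LEFT EMBEDDING) as a format character (general category Cf), not as a control character (Cc), so a filter aimed at control characters alone would not touch it. If cleaning the text before it is indexed is an option, which the thread does not discuss, a Java sketch could strip the bidirectional embedding and override characters U+202A to U+202E. The names `BidiControlFilter` and `stripBidiControls` are hypothetical.

```java
class BidiControlFilter {
    // Removes the bidirectional embedding/override characters U+202A to U+202E.
    static String stripBidiControls(String text) {
        return text.replaceAll("[\\x{202A}-\\x{202E}]", "");
    }

    public static void main(String[] args) {
        char rle = (char) 0x202B;
        // General category of U+202B: FORMAT (Cf), not CONTROL (Cc).
        System.out.println(Character.getType(rle) == Character.FORMAT);
        System.out.println(stripBidiControls(rle + "abc"));
    }
}
```

Removing these characters can change how mixed-direction text is displayed, so this only makes sense on a copy used for indexing.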

  • Removing non printable characters from an excel file using powershell

    Hello,
    anyone know how to remove non printable characters from an excel file using powershell?
    thanks,
    jose.

    To add - Excel is a binary file.  It cannot be managed via external methods easily.  You can write a macro that can do this.  Post in the Excel forum and explain what you are seeing and get the MVPs there to show you how to use the macro facility to edit cells.  Outside of cell text "unprintable" characters are a normal part of Excel.
    ¯\_(ツ)_/¯

  • Remove unrecognized characters from a field of a table.

    Is there any way to remove unrecognized characters from the content of a field in a table? Please help.
    Thx

    If you know the characters that you want removed, you may have success using the TRANSLATE function.
    The key here, as stated in the SQL Language Reference, is:
    "the extra characters at the end of from_string have no corresponding characters in to_string. If these extra characters appear in char, then they are removed from the return value."
    The other useful function to remove unwanted junk is REGEXP_REPLACE, but you must be on 10g or later for regular expression support. TRANSLATE is supported at least as far back as 8i, perhaps further.

  • Removing unwanted control characters in exported text files

    I am currently evaluating Crystal Reports 2008 to determine applicability to our requirements. I need to export data files to continuous text to be read by other application software. I have successfully created the files but have what I believe to be page feed or end-of-page control characters (small rectangles) in the output. Can someone enlighten me as to how I can suppress or remove these control characters?

    In the export to text options enter 0 for the number of lines per page. This will produce an unpaginated text document without the page control markers.

  • Removing non english characters from my string input source

    Guys,
    I have a problem where I need to remove all non-English (i.e. non-Latin) characters from a string. What would be the right API for this?
    The one I'm using right now is:
    s.replaceAll("[^\\x00-\\x7F]", ""); // s is a string containing Chinese characters
    I'm looking for a standard solution for such problems, where we deal with multilingual characters.
    TIA
    Nitin

    Nitin_tiwari wrote:
    > I have a string which has Chinese as well as Japanese characters, and I want to remove only the Chinese characters.
    > What's the best way to go about it?
    Oh, I see!
    Well, the problem here is that Strings don't have any information on the language. What you can get out of a String (provided you have the necessary data from the Unicode standard) is the script that is used.
    A script can be used for multiple languages (for example English and German use mostly the same script, even if there are a few characters that are only used in German).
    A language can use multiple scripts (for example Japanese uses Kanji, Hiragana and Katakana).
    And if I remember correctly, Japanese and Chinese texts share some characters in Unicode (I might be wrong, though, since I speak/write neither of those languages).
    These two facts make these kinds of detections hard to do. In some cases they are easy (separating latin-script texts from anything else) in others it may be much tougher or even impossible (Chinese/Japanese).
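
The point about scripts can be made concrete with `Character.UnicodeScript`, available since Java 7. The sketch below assumes Java 8 or later for `String.codePoints()`; the names `ScriptInspector`, `scriptsOf` and `removeScript` are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

class ScriptInspector {
    // Collects the Unicode scripts of all code points in the text.
    static Set<Character.UnicodeScript> scriptsOf(String text) {
        Set<Character.UnicodeScript> scripts =
                EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints().forEach(cp -> scripts.add(Character.UnicodeScript.of(cp)));
        return scripts;
    }

    // Removes every code point that belongs to the given script.
    static String removeScript(String text, Character.UnicodeScript script) {
        StringBuilder sb = new StringBuilder();
        text.codePoints()
            .filter(cp -> Character.UnicodeScript.of(cp) != script)
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Latin "a", HIRAGANA LETTER A (U+3042) and the Han character U+6F22.
        String mixed = "a" + (char) 0x3042 + (char) 0x6F22;
        System.out.println(scriptsOf(mixed));
        System.out.println(removeScript(mixed, Character.UnicodeScript.HAN).length());
    }
}
```

Removing the `HAN` script strips Chinese characters, but it equally strips the Kanji in Japanese text, which is exactly the ambiguity described above. Digits, spaces and most punctuation are reported as `COMMON`, not as `LATIN`.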

  • Removing Non-Ascii Characters from a String

    Hi Everyone,
    I would like to remove all NON-ASCII characters from a large string. For example, I am taking text from websites and would like to remove all the strange arabic and asian characters. How can I accomplish this?
    Thank you in advance.

    > I would like to remove all NON-ASCII characters from a large string.
    I don't know if it's a good method but try this:
    String str = "\u6789gj";
    StringBuilder output = new StringBuilder();
    for (char c : str.toCharArray()) {
        // keep only ASCII, i.e. U+0000 to U+007F
        if (c < 0x80) {
            output.append(c);
        }
    }
    System.out.println(output);
    > all the strange arabic and asian characters.
    Don't call them so.... I am an Indian Muslim ;-) ....
    Thanks!

  • Removing number of characters from end of string

    Hi all....
    Dooza very kindly helped me with trimming a string in an earlier post, but now i want to remove the last four characters from a string.
    I really should know how to do this and will have to do some bedtime reading :-|
    But, for now, could someone help!
    Thanks
    Andy

    Ah Dooza - Thank You.
    To the rescue again :-D
    You're helping me to see the logic...
    Thanks Again
    Andy
    "Dooza" <[email protected]> wrote in message
    news:gbafel$ra3$[email protected]..
    > Andy wrote:
    >> Hi all....
    >> Dooza very kindly helped me with trimming a string in an earlier post,
    >> but now i want to remove the last four characters from a string.
    >> I really should know how to do this and will have to do some bedtime
    >> reading :-|
    >>
    >> But, for now, could someone help!
    >
    > Hi Andy,
    > Try something like this:
    > <%
    > myStr = "this is my really long string"
    > Response.Write(LEFT(myStr,LEN(myStr) -4))
    > %>
    >
    > Dooza

  • Removing characters from string after a certain point

    I'm very new to Java programming, so if this is a really stupid question, please forgive me.
    I want to remove all the characters in my string that come after a certain character, such as a backslash. E.g. if I have a string called "myString" and it contains this:
    109072\We Are The Champions\GEMINI
    205305\We Are The Champions\Queen
    4416\We Are The Champions\A Knight's Tale
    a00022723\We Are The Champions\GREEN DAY
    I would like to remove all the characters after the first backslash (109072\We...), leaving the string as "109072".
    The main problem is that the number of characters before and after is not always the same, and there are often letters in the part I want, so it can't be separated by removing all letters and leaving the numbers.
    Is there any way that this can be done?
    Thanks in advance for all help.

    You must learn to use the Javadoc for the standard classes. You can download it or reference it online. For example http://java.sun.com/javase/6/docs/api/java/lang/String.html.
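
The two `String` methods from that Javadoc page that fit this question are `indexOf` and `substring`. A minimal sketch, with the hypothetical names `PrefixCutter` and `beforeFirst`:

```java
class PrefixCutter {
    // Returns the part of s before the first occurrence of delimiter,
    // or s unchanged if the delimiter does not occur.
    static String beforeFirst(String s, char delimiter) {
        int idx = s.indexOf(delimiter);
        return idx < 0 ? s : s.substring(0, idx);
    }

    public static void main(String[] args) {
        // A backslash is written as \\ in Java string and char literals.
        String myString = "109072\\We Are The Champions\\GEMINI";
        System.out.println(beforeFirst(myString, '\\')); // prints 109072
    }
}
```

This works regardless of how many characters come before or after the backslash, and it does not care whether the prefix contains letters.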

  • How to remove end of line from string?

    Hello,
    I'd like to remove ends of line from the string. I've tried:
        static final Pattern END_LINE_PATTERN = Pattern.compile("$+");
        strBuf.append(input);
        Matcher m = END_LINE_PATTERN.matcher(input);
        int startIndex = -1;
        int endIndex;
        while (m.find()) {
            startIndex = m.start();
            endIndex = m.end();
            if (endIndex == strBuf.length() - 1) break;
        }
        if (startIndex > -1) {
            strBuf.setLength(startIndex);
        }
    For strings "hello\n" and "hello\r" it works properly, but for the string "hello\n\n" I get the first occurrence at index 6 (at the second \n), so as the result I get "hello\n". For the string "hello\r\n" I get the first occurrence at index 5 (which is OK), but the end index is 5 as well, and I get the next occurrence at index 7, which doesn't make any sense to me.
    Hope somebody can help me.
    Agata

    What you're trying to do is remove one or more line separators from the end of a string ("\n", "\r", and "\r\n" each count as one line separator, but "\n\n" is two line separators). This is all you need to do: str = str.replaceAll("[\r\n]+$", ""); "$" doesn't match any characters, line separators or otherwise; it matches the position at the end of the string. In MULTILINE mode, it also matches the position before a line separator, but it still doesn't match the separator itself.
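
The one-line fix from the answer can be wrapped and tried against the strings from the question. A minimal sketch, with the hypothetical names `TrailingNewlines` and `stripTrailingLineSeparators`:

```java
class TrailingNewlines {
    // Removes all \r and \n characters at the end of the string.
    // Line separators elsewhere in the string are left alone.
    static String stripTrailingLineSeparators(String s) {
        return s.replaceAll("[\r\n]+$", "");
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingLineSeparators("hello\n\n").length());  // prints 5
        System.out.println(stripTrailingLineSeparators("hello\r\n").length());  // prints 5
    }
}
```

The character class consumes the separators themselves, and `$` only anchors the match to the end of the input, which is why the zero-width matches seen in the question do not occur.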
