Removing unicode control characters from string

Hi.
I have a webservice where I return an object (with some strings) back to the client. The information is read from a database, and the string can sometimes contain invalid xml characters (like unicode 0x13). This results in an error when parsing the information at the client side.
Is there an easy way to set up a filter or something that checks whether the string contains characters outside the valid range specified for XML (lower than unicode 0x20 etc.), and removes them or replaces them with a different character?

If you have to get rid of the control chars then
        String someText = "a\nb\nc\td\re\r\nf";
        // 0x00-0x1F are the C0 control characters; 0x20 is the space, so it is left out of the range
        String someTextWithoutControlChars = someText.replaceAll("[\\u0000-\\u001F]", "");
        System.out.println(someTextWithoutControlChars);
but like kaj says, some control chars are valid.
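
Since some control chars (tab, line feed, carriage return) are valid in XML, a filter that follows the XML 1.0 "Char" ranges may be closer to what the original poster needs. This is only a minimal sketch: the class and method names are made up for illustration, and it assumes the client parses XML 1.0 rather than XML 1.1.

```java
final class XmlCharFilter {

    // True if the code point is allowed by the XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the input in which every disallowed code point is
    // replaced by the given replacement (pass "" to drop them).
    static String sanitize(String input, String replacement) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            if (isValidXmlChar(cp)) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(replacement);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical value read from the database, containing 0x13.
        String fromDb = "name" + (char) 0x13 + "\tvalue";
        System.out.println(sanitize(fromDb, ""));
    }
}
```

Calling sanitize on each string just before the object is returned from the webservice keeps the filtering in one place; whether to drop or replace the offending characters is a choice for the application.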

Similar Messages

Removing the Control Characters from a text file

Hi,
I am using the java.util.regex.* package to remove the control characters from a text file. I got the program below from the java.sun site.
I am able to compile the file successfully, but when I try to run it I get this error:
D:\Debi\datamigration>java Control
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal repetition
{cntrl}
at java.util.regex.Pattern.error(Pattern.java:1472)
at java.util.regex.Pattern.closure(Pattern.java:2473)
at java.util.regex.Pattern.sequence(Pattern.java:1597)
at java.util.regex.Pattern.expr(Pattern.java:1489)
at java.util.regex.Pattern.compile(Pattern.java:1257)
at java.util.regex.Pattern.<init>(Pattern.java:1013)
at java.util.regex.Pattern.compile(Pattern.java:760)
at Control.main(Control.java:24)
Please help me on this issue.
Thanks&Regards
Debi
import java.util.regex.*;
import java.io.*;
public class Control {
    public static void main(String[] args)
            throws Exception {
        //Create a file object with the file name
        //in the argument:
        File fin = new File("fileName1");
        File fout = new File("fileName2");
        //Open an input and output stream
        FileInputStream fis =
            new FileInputStream(fin);
        FileOutputStream fos =
            new FileOutputStream(fout);
        BufferedReader in = new BufferedReader(
            new InputStreamReader(fis));
        BufferedWriter out = new BufferedWriter(
            new OutputStreamWriter(fos));
        // The pattern matches control characters
        Pattern p = Pattern.compile("{cntrl}");
        Matcher m = p.matcher("");
        String aLine = null;
        while((aLine = in.readLine()) != null) {
            m.reset(aLine);
            //Replaces control characters with an empty
            //string.
            String result = m.replaceAll("");
            out.write(result);
            out.newLine();
        }
        in.close();
        out.close();
    }
}

Hi,
I used the code below with the \p, but I wasn't able to compile the file. It gave me an error:
D:\Debi\datamigration>javac Control.java
Control.java:24: illegal escape character
Pattern p = Pattern.compile("\p{cntrl}");
^
1 error
Please help me on this issue.
Thanks&Regards
Debi
// The pattern matches control characters
Pattern p = Pattern.compile("\p{cntrl}");
Matcher m = p.matcher("");
String aLine = null;
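
For what it's worth, both errors above look like they come from the pattern string itself: in "{cntrl}" the leading brace is read as a repetition count with nothing to repeat, and in "\p{cntrl}" the single backslash is an illegal escape inside a Java string literal. A minimal sketch of the escaped form, assuming the documented POSIX class name \p{Cntrl} (the class and method names are illustrative only):

```java
import java.util.regex.Pattern;

final class ControlCharStripper {

    // In Java source the backslash must itself be escaped, so the regex
    // \p{Cntrl} is written as "\\p{Cntrl}". It matches 0x00-0x1F and 0x7F.
    private static final Pattern CONTROL = Pattern.compile("\\p{Cntrl}");

    // Removes every control character from one line of text.
    static String strip(String line) {
        return CONTROL.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        String line = "ab" + (char) 0x07 + "cd" + (char) 0x1B + "ef";
        System.out.println(strip(line)); // prints abcdef
    }
}
```

Because readLine() already strips the line terminator, applying strip to each line and then calling newLine() keeps the line structure of the file while dropping the remaining control characters, tabs included.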

Removing non-numeric characters from string

Hi there,
I need to have the ability to remove non-numeric characters from a string and I do not know how to do this.
Does any one know a way?
Example:
Present String: (02)-2345-4607
Required String: 0223454607
Thanks in advance

Dear NickM
Try this, it will work:
create or replace function char2num(mstring in varchar2) return varchar2
is
-- Function to remove special characters and alphabets from a phone no. string field
-- Author - Valid Bharde.(India-Mumbai)
-- Date :- 20 Sept 2006.
-- This function returns the digits as a string, so that leading zeros are kept.
-- The following program is gifted to NickM with respect to his post on oracle site regarding Removing non-numeric characters from string on the said date
mstatus number :=0;
mnum number:=0;
mrefstring varchar2(50);
begin
mnum := length(mstring);
for x in 1..mnum loop
if (ASCII(substr(upper(mstring),x,1)) >= 48 and ASCII(substr(upper(mstring),x,1)) <= 57) then
mrefstring := mrefstring || substr(mstring,x,1);
end if;
end loop;
return mrefstring;
end;
Copy the above program and use it as a function, for example
SQL> select char2num('(022)-453452781') from dual;
CHAR2NUM('(022)-453452781')
022453452781
Chao!!!
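
The same idea in Java, as a rough sketch: drop everything that is not an ASCII digit and keep the result as a string, so that a leading zero survives. The class and method names are illustrative only.

```java
final class DigitsOnly {

    // Removes every character that is not one of the ASCII digits 0-9.
    static String digitsOnly(String s) {
        return s.replaceAll("[^0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("(02)-2345-4607")); // prints 0223454607
    }
}
```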

Removing last 2 characters from string field

I am trying to remove the last 2 characters of a string field.
There is no consistent length in the field:
316R1
12364R1
I want to remove everything after the R.
I tried InStrRev but that didn't do it.
Is there a way to say
start at position 1 and go to the R?
thanks

A formula,
left (field, length_formula), is the solution.
The length_formula is the number of chars from the left,
e.g.
left (field, length (field) - 2)
left (field, InstrRev (field, "R"))
Of course a combination with Right is also possible.
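
For readers more at home in Java than in report formulas, the two approaches can be sketched as below. This is only an illustration of the logic, with made-up class and method names; it is not Crystal Reports syntax.

```java
final class TrimField {

    // Keeps everything up to and including the last occurrence of marker.
    // If the marker is absent the field is returned unchanged.
    static String keepThroughLast(String field, char marker) {
        int pos = field.lastIndexOf(marker);
        return pos < 0 ? field : field.substring(0, pos + 1);
    }

    // Drops the last n characters; fields shorter than n become empty.
    static String dropLast(String field, int n) {
        return field.length() <= n ? "" : field.substring(0, field.length() - n);
    }

    public static void main(String[] args) {
        System.out.println(keepThroughLast("12364R1", 'R')); // prints 12364R
        System.out.println(dropLast("316R1", 2));            // prints 316
    }
}
```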

Remove unicode control character

hey, so i've asked this around a few places and received no answers, but the arch community seems like a bunch of smart people and i've recently switched over from ubuntu.
is there a way to remove input methods and unicode control characters from the context (right-click) menu? i've searched and searched, but i've had no luck with this.
there's a way to remove it in gnome with gconf, but i've tried that, and no luck.
i'm using openbox, and i think i found something about my gtkrc file along these lines
gtk-show-input-method-menu =
gtk-show-unicode-menu =
both are "gboolean" which i don't know what it is, or if this is even what i want.
any help would be appreciated, as i never use these menus.

U+2415, SYMBOL FOR NEGATIVE ACKNOWLEDGE is not a control character. It is a normal symbol character, which is a substitute display character used when the control character U+0015 NEGATIVE ACKNOWLEDGE is to be displayed instead of being interpreted.
You need to refine your question.
In general, you can remove any particular character or characters from a string using the SQL functions TRANSLATE or REPLACE. You can use CHR or UNISTR to encode characters that you cannot enter from a keyboard. You can use REGEXP_REPLACE with POSIX character classes to remove broader ranges of characters.
-- Sergiusz

Removing Non-numeric characters from Alpha-numeric string

Hi,
I have one column in which i have Alpha-numeric data like
COLUMN X
+91 (876) 098 6789
1-567-987-7655
so on.
I want to remove Non-numeric characters from above (space,'(',')',+,........)
i want to write something generic (suppose some function to which i pass the column)
thanks in advance,
Mandip

This variation uses the LIKE operator's pattern matching to remove non-numeric characters. It also
keeps decimal points.
Code Snippet
CREATE FUNCTION dbo.RemoveChars(@Str varchar(1000))
RETURNS VARCHAR(1000)
BEGIN
declare @NewStr varchar(1000),
@i int
set @i = 1
set @NewStr = ''
while @i <= len(@str)
begin
--keep digits and the decimal point ([0-9.] is a LIKE character set, not a regex, so no | is needed)
if substring(@str,@i,1) like '[0-9.]'
begin
set @NewStr = @NewStr + substring(@str,@i,1)
end
else
begin
set @NewStr = @NewStr
end
set @i = @i + 1
end
RETURN Rtrim(Ltrim(@NewStr))
END
GO
Code to validate:
Code Snippet
declare @t table(
TestStr varchar(100)
)
insert into @t values ('+91 (8.76) \098 6789');
insert into @t values ('1-567-987-7655');
select dbo.RemoveChars(TestStr)
from @t

How to stop control characters from being tokenized?

In Oracle 11.2.0.3
I have documents which have embedded Arabic, and so there is a control character \u202b to indicate a right-to-left string. How do I stop these kinds of control characters from being tokenized? Doing a search with CONTAINS(search_string,' \u202b')>0 finds documents. Do I just add the '\u202b' as a stopword?

What lexer are you using, Amin?
I assume from what you say that the characters are getting indexed as whole tokens? If so, adding them as stopwords should work, but I'm surprised they're getting indexed at all - sounds like a fault in the "alphanumeric identification" code to me.
I'm going to take a guess this is the World Lexer. Am I right?

Removing non printable characters from an excel file using powershell

Hello,
anyone know how to remove non printable characters from an excel file using powershell?
thanks,
jose.

To add - Excel is a binary file. It cannot be managed via external methods easily. You can write a macro that can do this. Post in the Excel forum and explain what you are seeing and get the MVPs there to show you how to use the macro facility to edit cells. Outside of cell text "unprintable" characters are a normal part of Excel.
¯\_(ツ)_/¯

Remove un recognized characters from a field of a table.

Is there any way to remove unrecognized characters from the content of a field in a table? Please help.
Thx

If you know the characters that you want removed, you may have success using the TRANSLATE function.
The key here, as stated in the SQL Language Reference, is:
"the extra characters at the end of from_string have no corresponding characters in to_string. If these extra characters appear in char, then they are removed from the return value."
The other useful function to remove unwanted junk is REGEXP_REPLACE - but you must be on 10g or later for regular expression support. Translate is supported at least as far back as 8i, perhaps further.

Removing unwanted control characters in exported text files

I am currently evaluating Crystal Reports 2008 to determine applicability to our requirements. I need to export data files to continuous text to be read by other application software. I have successfully created the files but have what I believe to be page feed or end-of-page control characters (small rectangles) in the output. Can someone enlighten me as to how I can suppress or remove these control characters?

In the export to text options enter 0 for the number of lines per page. This will produce an unpaginated text document without the page control markers.

Removing non english characters from my string input source

Guys,
I have problem where I need to remove all non english (Latin) characters from a string, what should be the right API to do this?
One I'm using right now is:
s.replaceAll("[^\\x00-\\x7F]", "");//s is a string having chinese characters.
I'm looking for a standard Solution for such problems, where we deal with multiple lingual characters.
TIA
Nitin

Nitin_tiwari wrote:
> I have a string which has Chinese as well as Japanese characters, and I only want to remove only Chinese characters.
> What's the best way to go about it?
Oh, I see!
Well, the problem here is that Strings don't have any information on the language. What you can get out of a String (provided you have the necessary data from the Unicode standard) is the script that is used.
A script can be used for multiple languages (for example English and German use mostly the same script, even if there are a few characters that are only used in German).
A language can use multiple scripts (for example Japanese uses Kanji, Hiragana and Katakana).
And if I remember correctly, then Japanese and Chinese texts share some characters on the Unicode plane (I might be wrong, 'though, since I speak/write neither of those languages).
These two facts make these kinds of detections hard to do. In some cases they are easy (separating latin-script texts from anything else) in others it may be much tougher or even impossible (Chinese/Japanese).
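
To make the script-versus-language point concrete, here is a small sketch using Character.UnicodeScript (available since Java 7). The class and method names are made up for illustration. It removes every code point of one script, which also shows the limitation described above: removing HAN strips the kanji from Japanese text as well as Chinese characters.

```java
final class ScriptFilter {

    // Removes every code point that belongs to the given Unicode script.
    static String removeScript(String s, Character.UnicodeScript script) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeScript.of(cp) != script) {
                sb.appendCodePoint(cp);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Latin letters, two Han characters (U+6F22, U+5B57),
        // then two Hiragana characters (U+304B, U+306A).
        String mixed = "abc\u6F22\u5B57\u304B\u306A";
        String withoutHan = removeScript(mixed, Character.UnicodeScript.HAN);
        // The Latin and Hiragana parts remain; the Han characters are gone,
        // whether they came from a Chinese or a Japanese text.
        System.out.println(withoutHan.length()); // prints 5
    }
}
```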

Removing Non-Ascii Characters from a String

Hi Everyone,
I would like to remove all NON-ASCII characters from a large string. For example, I am taking text from websites and would like to remove all the strange arabic and asian characters. How can I accomplish this?
Thank you in advance.

> I would like to remove all NON-ASCII characters from a large string.
I don't know if it's a good method but try this:
String str = "\u6789gj";
String output = "";
for (char c : str.toCharArray()) {
    // ASCII is 0x00-0x7F, so keep only chars below 0x80
    if (c < 0x80) {
        output = output + c;
    }
}
System.out.println(output);
> all the strange arabic and asian characters.
Don't call them so.... I am an Indian Muslim ;-) ....
Thanks!

Removing number of characters from end of string

Hi all....
Dooza very kindly helped me with trimming a string in an earlier post, but now i want to remove the last four characters from a string.
I really should know how to do this and will have to do some bedtime reading :-|
But, for now, could someone help!
Thanks
Andy

Ah Dooza - Thank You.
To the rescue again :-D
You're helping me to see the logic...
Thanks Again
Andy
"Dooza" <[email protected]> wrote in message
news:gbafel$ra3$[email protected]..
> Andy wrote:
>> Hi all....
>> Dooza very kindly helped me with trimming a string in an earlier post,
>> but now i want to remove the last four characters from a string.
>> I really should know how to do this and will have to do some bedtime
>> reading :-|
>>
>> But, for now, could someone help!
>
> Hi Andy,
> Try something like this:
> <%
> myStr = "this is my really long string"
> Response.Write(LEFT(myStr,LEN(myStr) -4))
> %>
>
> Dooza

Removing characters from string after a certain point

I'm very new to Java programming, so if this is a really stupid question, please forgive me.
I want to remove all the characters in my string that are after a certain character, such as a backslash. E.g. if I have a string called "myString" and it contains one of these:
109072\We Are The Champions\GEMINI
205305\We Are The Champions\Queen
4416\We Are The Champions\A Knight's Tale
a00022723\We Are The Champions\GREEN DAY
I would like to remove all the characters from the first backslash onward (109072\We...), leaving the string as "109072".
The main problem is that the number of characters before and after is not always the same, and there are often letters in the string I want, so it can't be separated by removing all letters and leaving the numbers.
Is there any way that this can be done?
Thanks in advance for all help.

You must learn to use the Javadoc for the standard classes. You can download it or reference it online. For example http://java.sun.com/javase/6/docs/api/java/lang/String.html.
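
As a pointer into that Javadoc: String.indexOf and String.substring are probably the two methods to look at. A minimal sketch, with illustrative class and method names, assuming the separator is a single character:

```java
final class BeforeSeparator {

    // Returns the part of s before the first occurrence of sep.
    // If sep does not occur, s is returned unchanged.
    static String beforeFirst(String s, char sep) {
        int pos = s.indexOf(sep);
        return pos < 0 ? s : s.substring(0, pos);
    }

    public static void main(String[] args) {
        // A backslash in a Java literal is written as two backslashes.
        String myString = "109072\\We Are The Champions\\GEMINI";
        System.out.println(beforeFirst(myString, '\\')); // prints 109072
    }
}
```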

How to remove end of line from string?

Hello,
I'd like to remove ends of line from the string. I've tried:
    static final Pattern END_LINE_PATTERN = Pattern.compile("$+");
    strBuf.append(input);
    Matcher m = END_LINE_PATTERN.matcher(input);
    int startIndex = -1;
    int endIndex;
    while (m.find()) {
     startIndex = m.start();
     endIndex = m.end();
     if (endIndex == strBuf.length() - 1) break;
    }
    if (startIndex > -1) {
     strBuf.setLength(startIndex);
    }
For strings "hello\n" and "hello\r" it works properly, but for string "hello\n\n" I get the first occurrence at index 6 (at the second \n), so as the result I get "hello\n". For the string "hello\r\n" I get the first occurrence at index 5 (it's OK), but the end index is 5 as well, and the next occurrence I get at index 7, which doesn't make any sense to me.
Hope somebody can help me.
Agata

What you're trying to do is remove one or more line separators from the end of a string ("\n", "\r", and "\r\n" each count as one line separator, but "\n\n" is two line separators). This is all you need to do:
    str = str.replaceAll("[\r\n]+$", "");
"$" doesn't match any characters, line separators or otherwise; it matches the position at the end of the string. In MULTILINE mode, it also matches the position before a line separator, but it still doesn't match the separator itself.
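
A small runnable sketch of that one-liner, wrapped in a hypothetical helper so the cases from the question can be checked (the class and method names are illustrative only):

```java
final class TrailingNewlines {

    // Removes all line separators at the very end of the string.
    // Separators in the middle of the string are left alone.
    static String stripTrailingLineSeparators(String s) {
        return s.replaceAll("[\r\n]+$", "");
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingLineSeparators("hello\n\n")); // prints hello
        System.out.println(stripTrailingLineSeparators("hello\r\n")); // prints hello
    }
}
```

The character class consumes the separators themselves and "$" only anchors the match to the end, which is why the pattern "$+" from the question never removed anything.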
