Stripping non text characters

I am generating a fixed-format file from data in an MSSQL database (to which I have read-only access). The file is imported by another program (whose source code I also cannot access). My problem is that some of the data contains non-text characters, which cause the importing program to throw
java.io.IOException: java.lang.IllegalArgumentException: The char '0x19' in 'whatEver String that has a bad character' is not a valid XML character
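My solution was
     StringBuilder sb = new StringBuilder();
     for (int i = 0; i < myString.length(); i++) {
          // Compare the char itself; casting to byte would wrap values above 127
          char c = myString.charAt(i);
          // Keep printable ASCII only (0x20 to 0x7E)
          if (c > 31 && c < 127) {
               sb.append(c);
          }
     }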

Better solution: Get the encoding right. At some point your code (or some other application) does some conversion from binary (byte[]/InputStream/OutputStream/...) to text (char[]/Reader/Writer/String/...). That place does so incorrectly.
Find out the encoding of the binary data and apply that when transforming.
Read this excellent introduction to Unicode: http://www.joelonsoftware.com/articles/Unicode.html
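
If the encoding cannot be fixed at the source, a narrower filter than "printable ASCII only" is possible. The following is a minimal sketch, assuming the importing program validates against the XML 1.0 character range (tab, LF, CR, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF). The class and method names are made up for illustration.

```java
final class XmlCharFilter {
    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Copies the input, dropping every code point XML 1.0 does not allow. */
    static String stripInvalidXmlChars(String in) {
        StringBuilder sb = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            if (isValidXmlChar(cp)) {
                sb.appendCodePoint(cp);
            }
            // Advance by one or two chars depending on the code point
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

Unlike the ASCII-only loop, this keeps accented and non-Latin text and removes only characters such as 0x19 that an XML 1.0 parser rejects.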

Similar Messages

  • Safari "tel:" phone number links stripping non-numeric characters

    After a week of owning my iPhone (two weeks after it launched) I made myself a little web interface that contain various iPhone-helpful links. Four of these were AT&T specific telephone links:
    *646# - check how many minutes remaining
    *729 - pay your bill
    etc etc..
    (e.g. <a href="tel:*729">click here to pay your bill</a>)
    Well, needless to say this was a very convenient feature, however these links no longer function properly. It seems that the iPhone is now stripping any/all non-numeric characters from the link (although I haven't done extensive testing to see exactly which characters .. only with * and $ in particular).
    I've even tried HTML entities but they were stripped too.
    Now I know Apple had a security fix a few patches back, one that fixed a vulnerability involving "tel:" links (CVE-2007-3755), but I haven't found any documentation regarding special characters now being ignored/stripped.
    So, my question is does anyone know if there is a workaround for this, or did the patch inadvertently and permanently destroy the ability to send these very helpful text messages via a web interface?
    Thanks,
    Marc

    I'm having a similarly annoying behavior with "tel:" links. I'm doing the same thing -- writing a web page to store some convenient numbers, but for me, it's for teleconference numbers. You dial into a toll-free number, then after a pause, dial in a passcode.
    On the iPhone's contact list, you can program in the pauses as the letter "p."
    However, in a tel: link, the p's change to 7's. Interesting!
    What I've discovered is that, for me (FW 1.1.2), alphabetical characters are translated into their numerical equivalents. So 1-800-COMCAST (a number I've had to dial all too much recently) gets rendered to 18002662278.
    This is a difference between Safari 3 and the iPhone version. On Safari 3 on the Mac, both the HTML and a mouseover shows that the tel link contains the letters. On the iPhone, though, clicking the link brings up a dialog box with the letters "translated."
    So, interesting -- but now I have a problem. How do I enter a pause into a tel link?
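
The letter-to-digit translation described above matches the standard telephone keypad layout. Here is a small sketch of that mapping, purely to reproduce the observed behavior; it is not Apple's implementation, and the class name is hypothetical.

```java
final class KeypadDigits {
    // Letters printed on keys 2 through 9 of a standard telephone keypad
    private static final String[] KEYS =
            {"ABC", "DEF", "GHI", "JKL", "MNO", "PQRS", "TUV", "WXYZ"};

    /** Maps letters to keypad digits, keeps digits, drops everything else. */
    static String toDigits(String number) {
        StringBuilder sb = new StringBuilder();
        for (char raw : number.toCharArray()) {
            char c = Character.toUpperCase(raw);
            if (c >= '0' && c <= '9') {
                sb.append(c);
                continue;
            }
            for (int k = 0; k < KEYS.length; k++) {
                if (KEYS[k].indexOf(c) >= 0) {
                    sb.append((char) ('2' + k));
                    break;
                }
            }
            // Dashes, spaces, * and # fall through and are dropped
        }
        return sb.toString();
    }
}
```

With this mapping, 1-800-COMCAST becomes 18002662278 and a pause marker "p" becomes 7, which is consistent with what the post reports.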

  • Need help in removing non printable characters

    hi
    I am having an issue with non-printable characters in a web service. The web service serves XML to my clients' programs in B2B communication. Due to data corruption in Oracle (I don't know who is creating the bad data), the XML generated from the database contains non-printable characters, and this is what our customers receive. Since the data is updated every day, it is impossible to fix the data each time. I need to write a very efficient method to strip non-printable characters from the strings in the XML, because this method could potentially be called many times. Can someone please help with this? I am using JDK 1.3.1 and Oracle 8i.
    Any help will be appreciated
    Thanks
    Ashok Pappu

    At some point your existing program is probably converting String data to the XML bytes through a CharsetEncoder, probably inside a java.io.Writer.
    Perhaps your best approach would be to write your own java.nio.charset.CharsetEncoder which deals with the bad characters as you see fit.
    You can register a new java.nio.charset.Charset as a private character set type. Because this should simply replace a standard CharsetEncoder with a non-standard one, the overhead should hopefully be low.
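
A lighter variant of the same idea is to filter at the Writer level rather than inside a custom CharsetEncoder. The sketch below is not the approach from the reply; it wraps any java.io.Writer in a FilterWriter that drops control characters below U+0020 other than tab, LF and CR. The class name and the `filter` helper are illustrative, and the code is written for a current JDK (annotations and UncheckedIOException would have to go on something as old as 1.3.1).

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class ControlCharStrippingWriter extends FilterWriter {
    ControlCharStrippingWriter(Writer out) {
        super(out);
    }

    // Keep everything from U+0020 up, plus tab, LF and CR
    private static boolean keep(int c) {
        return c >= 0x20 || c == '\t' || c == '\n' || c == '\r';
    }

    @Override
    public void write(int c) throws IOException {
        if (keep(c)) {
            out.write(c);
        }
    }

    @Override
    public void write(char[] buf, int off, int len) throws IOException {
        int start = off;
        for (int i = off; i < off + len; i++) {
            if (!keep(buf[i])) {
                // Flush the clean run before the bad char, then skip it
                out.write(buf, start, i - start);
                start = i + 1;
            }
        }
        out.write(buf, start, off + len - start);
    }

    @Override
    public void write(String s, int off, int len) throws IOException {
        write(s.toCharArray(), off, len);
    }

    /** Convenience helper for trying the filter on a single string. */
    static String filter(String s) {
        StringWriter sink = new StringWriter();
        try (Writer w = new ControlCharStrippingWriter(sink)) {
            w.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }
}
```

Because clean runs are passed through in bulk, the per-call cost is a single scan of the buffer, which matters if the method is called very often.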

  • How to Identify non-english characters in a Text

    Hi Experts,
    I have a text coming from KNA1-NAME1 which at times contains non-English characters. I want to identify them in my code so that I can skip them.
    Can you please point me to a command or function module that helps identify these non-English characters?
    Regards,
    Nirmal

    Hi,
    I am fine with English characters A-Z, a-z, 0-9 and special characters. But the text contains some Chinese, Japanese or other non-English characters which I don't want.
    The logic you explained above would require me to list all the valid characters. It would also be a performance constraint. Hence I wanted something like a function module or standard procedure. Can we use ASCII somehow?
    Regards,
    Nirmal
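
On the "can we use ASCII" idea: the check itself is just a range test per character, with no list of valid characters needed. The sketch below shows the logic in Java for illustration only; an ABAP version would look different, and the class name is made up.

```java
final class AsciiCheck {
    /** True if every character is in the 7-bit ASCII range (U+0000 to U+007F). */
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }
}
```

Note that this also rejects accented Latin letters such as ü, so it is stricter than "not Chinese or Japanese".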

  • Predictive text non-English characters to be made ...

    I just filed an enhancement request for this feature, please vote for it HERE:
    Predictive text non-English characters to be made optional
    The story is: when using predictive text for non-English languages (Polish in my case), the dictionary words are grammatically correct and include special characters such as ą, ę, ć, ś, ż, ź, ó, ł etc. For texting (SMS), operators count each of these as 3 characters, making a message much longer than it looks. As a result, hardly anyone uses these characters while texting; people type the plain English letters a, e, c, s, z, o, l instead, which makes predictive text useless for a language like Polish.
    I'd like to have an option to switch these non-English characters off for predictive text. The result is not grammatically correct, but in real life that's how people type.
    So basically, if there were an option to disable language-specific characters, I would get a suggestion of 'Prosze' instead of the grammatically correct 'Proszę'. 'Prosze' is a 6-character word, while 'Proszę' counts as 5+3=8 characters. Considering a single SMS message is 300 chars, then it really makes a difference.
    A simple solution would be to replace every ą with a, ć with c, ó with o etc. in each suggested word for users who have this option enabled.
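
The replacement described above can be sketched in Java with Unicode normalization: decompose each letter into base letter plus combining mark, then drop the marks. This is only an illustration of the idea, not how any phone implements it. The letter ł has no decomposition, so it is handled explicitly; the class name is hypothetical.

```java
import java.text.Normalizer;

final class Diacritics {
    /** Replaces accented letters with their unaccented base letters. */
    static String strip(String s) {
        // NFD splits e.g. "ę" into "e" followed by a combining ogonek
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        String noMarks = decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        // The stroke in ł/Ł is not a combining mark, so map it by hand
        return noMarks.replace('\u0142', 'l').replace('\u0141', 'L');
    }
}
```

Applied to 'Proszę', this yields 'Prosze'.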

    Hi,
    You can write code in the PAI of the main screen. There, using LOOP AT SCREEN, you can make that field editable or disabled.
    Code sample:
    loop at screen.
    ****condition for value check
    if screen-name = 'TEXT_EDIT_NAME'.
    screen-output = 1.
    screen-input = 0.
    modify screen.
    endif.
    endloop.
    Hope this will help you.

  • Text to speech non english characters

    I like Alex a lot.  And I like how I can highlight text, press a hotkey, and Alex will start reading away.
    However, I read material that includes some non English characters.  It's -very- annoying when Alex announces what language and alphabet he's reading before saying the sound.  It would be so much better if he just said the sound.  Or just skipped it.  I could live with the latter just fine.  Are there any options anywhere to adjust this?

    see this article:
    http://homepage.mac.com/thgewecke/iwebchars.html
    max

  • Non printable characters in a text file..

    hi,
    How can I find blank lines and non-printable characters
    and remove them from a text file being uploaded from the application server?
    thanks,
    Anil.

    Take a look at the constants in cl_abap_char_utilities. A simpler solution would be to ask for a file without such characters...

  • Remove all text/non-number characters

    Odd question, but is there a way to remove all non-number characters from a column of cells? For example, if I have:
    6dfasfads
    12Randomletters.
    Is there a way to get rid of the letters?

    Here is a script doing the trick with no extraneous table or even column.
    Barry's response is a clever one. I'm a bit annoyed that I missed that approach.
    --[SCRIPT cleaner]
    (*
    Save the script as a Script: cleaner.scpt
    Move the newly created file into the folder:
    <startup Volume>:Users:<yourAccount>:Library:Scripts:Applications:Numbers:
    You may have to create the Numbers folder, and even the Applications folder, yourself.
    Select the range of cells whose content must be cleaned.
    Go to the Scripts menu, choose Numbers, then choose "cleaner".
    Cells whose content contains non-numeric characters will have them removed.
    --=====
    The Finder's Help explains:
    To make the Script menu appear:
    Open the AppleScript utility located in Applications/AppleScript.
    Select the "Show Script Menu in menu bar" checkbox.
    --=====
    Yvan KOENIG (VALLAURIS, France)
    2010/07/09
    *)
    --=====
    property alloweds : "0123456789.,-"
    on run
    set {dName, sname, tname, rname, rowNum1, colNum1, rowNum2, colNum2} to my getSelParams()
    tell application "Numbers" to tell document dName to tell sheet sname to tell table tname
    repeat with c from colNum1 to colNum2
    tell column c
    repeat with r from rowNum1 to rowNum2
    set maybe to value of cell r
    try
    maybe * 1
    on error
    (* The cell doesn't contain a true number so it must be cleaned *)
    set clean to {}
    repeat with k in maybe
    set k to k as text
    if k is in alloweds then copy k to end of clean
    end repeat
    set value of cell r to my recolle(clean, "")
    end try
    end repeat -- r
    end tell -- column c
    end repeat -- c
    end tell --Numbers
    end run
    --=====
    (* set {rowNum1, colNum1, rowNum2, colNum2} to my getCellsAddresses(dname,s_name,t_name,arange) *)
    on getCellsAddresses(d_Name, s_Name, t_Name, r_Name)
    local two_Names, row_Num1, col_Num1, row_Num2, col_Num2
    tell application "Numbers"
    set d_Name to name of document d_Name (* useful if we passed a number *)
    tell document d_Name
    set s_Name to name of sheet s_Name (* useful if we passed a number *)
    tell sheet s_Name
    set t_Name to name of table t_Name (* useful if we passed a number *)
    end tell -- sheet
    end tell -- document
    end tell -- Numbers
    if r_Name contains ":" then
    set two_Names to my decoupe(r_Name, ":")
    set {row_Num1, col_Num1} to my decipher(d_Name, s_Name, t_Name, item 1 of two_Names)
    if item 2 of two_Names = item 1 of two_Names then
    set {row_Num2, col_Num2} to {row_Num1, col_Num1}
    else
    set {row_Num2, col_Num2} to my decipher(d_Name, s_Name, t_Name, item 2 of two_Names)
    end if
    else
    set {row_Num1, col_Num1} to my decipher(d_Name, s_Name, t_Name, r_Name)
    set {row_Num2, col_Num2} to {row_Num1, col_Num1}
    end if -- r_Name contains…
    return {row_Num1, col_Num1, row_Num2, col_Num2}
    end getCellsAddresses
    --=====
    (* set { dName, sName, tName, rname, rowNum1, colNum1, rowNum2, colNum2} to my getSelParams() *)
    on getSelParams()
    local r_Name, t_Name, s_Name, d_Name
    set {d_Name, s_Name, t_Name, r_Name} to my getSelection()
    if r_Name is missing value then
    if my parleAnglais() then
    error "No selected cells"
    else
    error "Il n'y a pas de cellule sélectionnée !"
    end if
    end if
    return {d_Name, s_Name, t_Name, r_Name} & my getCellsAddresses(d_Name, s_Name, t_Name, r_Name)
    end getSelParams
    --=====
    (* set {rowNumber, columnNumber} to my decipher(docName,sheetName,tableName,cellRef)
    apply to named row or named column ! *)
    on decipher(d, s, t, n)
    tell application "Numbers" to tell document d to tell sheet s to tell table t to ¬
    return {address of row of cell n, address of column of cell n}
    end decipher
    --=====
    (* set { d_Name, s_Name, t_Name, r_Name} to my getSelection() *)
    on getSelection()
    local _, theRange, theTable, theSheet, theDoc, errMsg, errNum
    tell application "Numbers" to tell document 1
    repeat with i from 1 to the count of sheets
    tell sheet i
    set x to the count of tables
    if x > 0 then
    repeat with y from 1 to x
    try
    (selection range of table y) as text
    on error errMsg number errNum
    set {_, theRange, _, theTable, _, theSheet, _, theDoc} to my decoupe(errMsg, quote)
    return {theDoc, theSheet, theTable, theRange}
    end try
    end repeat -- y
    end if -- x>0
    end tell -- sheet
    end repeat -- i
    end tell -- document
    return {missing value, missing value, missing value, missing value}
    end getSelection
    --=====
    on parleAnglais()
    local z
    try
    tell application "Numbers" to set z to localized string "Cancel"
    on error
    set z to "Cancel"
    end try
    return (z is not "Annuler")
    end parleAnglais
    --=====
    on decoupe(t, d)
    local oTIDs, l
    set oTIDs to AppleScript's text item delimiters
    set AppleScript's text item delimiters to d
    set l to text items of t
    set AppleScript's text item delimiters to oTIDs
    return l
    end decoupe
    --=====
    on recolle(l, d)
    local oTIDs, t
    set oTIDs to AppleScript's text item delimiters
    set AppleScript's text item delimiters to d
    set t to l as text
    set AppleScript's text item delimiters to oTIDs
    return t
    end recolle
    --=====
    --[/SCRIPT]
    Yvan KOENIG (VALLAURIS, France) Friday 9 July 2010 11:11:17
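
For comparison, the filtering step of the script (keep only the characters listed in `alloweds`, i.e. digits, period, comma and minus) can be sketched as a single regular-expression replacement. This Java version is an illustration outside Numbers, with a hypothetical class name; it does not replace the AppleScript.

```java
final class NumericCleaner {
    /** Removes every character that is not a digit, '.', ',' or '-'. */
    static String clean(String cell) {
        // Inside the character class, '.' and ',' are literals and the trailing '-' is a literal minus
        return cell.replaceAll("[^0-9.,-]", "");
    }
}
```

As with the script, punctuation in the allowed set survives, so "12Randomletters." becomes "12." rather than "12".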

  • Detecting non printables characters in a text file

    Hi,
    I need to remove some non-printable characters such as tabs, carriage returns, line feeds, and so on.
    I want to do something like
    aString.replaceAll(<the non-printable char>, "");

    str = str.replaceAll("\\P{Print}+", "");
    From http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html: \p{Print} matches printable characters, and \P is the negation of \p.
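
One caveat worth showing: in java.util.regex, the POSIX class \p{Print} covers printable US-ASCII only (unless the UNICODE_CHARACTER_CLASS flag is set), so \P{Print} also removes accented and non-Latin letters. If the goal is only to drop tabs, carriage returns, line feeds and similar, \p{Cntrl} is the narrower choice. A small sketch comparing the two, with illustrative names:

```java
final class ControlCharRegex {
    /** Removes ASCII control characters (0x00 to 0x1F and 0x7F) only. */
    static String stripControls(String s) {
        return s.replaceAll("\\p{Cntrl}+", "");
    }

    /** Removes everything that is not printable US-ASCII, including non-ASCII letters. */
    static String stripNonPrintableAscii(String s) {
        return s.replaceAll("\\P{Print}+", "");
    }
}
```

For the input "a\tb\r\ncé", the first method keeps the é and the second drops it.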

  • Non-English characters processed correctly by XML Parser 2 XSLT?

    I'm trying to transform an XML document (parsed as an XMLDocument) using an XSL stylesheet (parsed as an XSLStylesheet) and the XSLProcessor class in Java, and I encounter the following problem:
    Non-US characters such as German umlauts, stored in the XML in &#xxx; style, are not processed properly. "ü" (&#252;), for example, comes out as "Ã¼". Is this a bug in the XSLProcessor class or am I doing something wrong? I'm using this stylesheet declaration:
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/1999/xhtml1/strict">
    Or should I mess with the encoding attribute of the ?xml ...? PI?
    tia
    John Smith

    I have not specified any encoding.
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="completeproduct.xsl"?>
    <PRODUCT connection="demosample" xmlns:xsql="urn:oracle-xsql">
    <xsql:query>
    select * from products
    </xsql:query>
    Should I specify an encoding?

  • Non English characters in BIP email

    Hi, my report contains Japanese characters. When I view the output in HTML format, it is displayed properly. But when I click the Send button, enter email parameters (To, Cc, Bcc, Subject, etc.) and send it, the Japanese characters in the mail I receive are not displayed properly. The same problem occurs for Spanish and Portuguese text, and in general for all non-English characters. I am using Oracle Business Intelligence Publisher Release 10.1.3.4. If someone has faced a similar issue, kindly help. Thanks in advance.

    Suggestions
    1) Try with NLS_LANG as
    SWEDISH_SWEDEN.WE8DEC
    2) Make a paramform and enter via paramform (unencoded)
    (This is just for testing purpose)
    3) Change machine locale to swedish and try
    4) Which reports version is this ?
    Please see
    BUG 2713695 - NLS CHARACTERS FOR PARAMETERS CHANGE TO QUESTION MARKS WHEN PASSED ON URL BAR
    Get in touch with Support to see if this is the issue and if "yes" get a one-off patch.
    [    All Docs for all versions    ]
    http://otn.oracle.com/documentation/reports.html
    [     Publishing reports to web  - 10G  ]
    http://download.oracle.com/docs/html/B10314_01/toc.htm (html)
    http://download.oracle.com/docs/pdf/B10314_01.pdf (pdf)
    [   Building reports  - 10G ]
    http://download.oracle.com/docs/pdf/B10602_01.pdf (pdf)
    http://download.oracle.com/docs/html/B10602_01/toc.htm (html)
    [   Forms Reports Integration whitepaper  9i ]
    http://otn.oracle.com/products/forms/pdf/frm9isrw9i.pdf
    ---------------------------------------------------------------------------------

  • Non ascii characters being sent from a parameter in a form

    Hi!
    I have seen many topics posted on passing non-ASCII characters through parameters from one servlet to another and converting them into whatever format is necessary.
    However, I have not seen anyone answer the following question. I have a JSP page (HTML) with the character encoding set to UTF-8. The user inputs some data into a text field inside a form. The data could contain non-ASCII characters such as Hebrew or Arabic. This form is then sent to another JSP, where I try to retrieve the data from the text field. No matter what I do, I cannot get the data presented correctly. It shows up as either question marks or other weird symbols.
    I have tried every permutation of the encoding of the actual HTML page, the encoding of the string from request.getParameter etc., but it still is not presented correctly on the new HTML page.
    Can anyone help??
    Spencer

    Ok, I solved the problem.
    I had to put at the top request.setCharacterEncoding("utf-8");
    Spencer
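
Why that one call fixes it can be shown without a servlet container. The browser submits the form as UTF-8 bytes; if the receiving side decodes them with a different charset, the text is garbled. The sketch below assumes the container falls back to ISO-8859-1 when no request encoding is set; the class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

final class FormEncodingDemo {
    /** What the receiver sees if it decodes UTF-8 form bytes as ISO-8859-1. */
    static String decodedWrongly(String typed) {
        byte[] onTheWire = typed.getBytes(StandardCharsets.UTF_8);
        return new String(onTheWire, StandardCharsets.ISO_8859_1);
    }

    /** What the receiver sees once it decodes the same bytes as UTF-8. */
    static String decodedCorrectly(String typed) {
        byte[] onTheWire = typed.getBytes(StandardCharsets.UTF_8);
        return new String(onTheWire, StandardCharsets.UTF_8);
    }
}
```

A four-letter Hebrew word occupies eight bytes in UTF-8, so the wrong decode yields eight unrelated Latin-1 characters, while the correct decode returns the original text.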

  • Replacing non-ASCII characters with HTML character references

    Hi All,
    In Oracle 10g or greater is there a built-in function that will convert a string with non-ASCII characters like this
    a b č 뮼
    into an ASCII string with HTML character references like this?
    a b & # x 0 1 0 D ; & # x B B B C ;
    (note I had to include spaces between each character in the sample code for message to prevent the forum software from converting my text)
    I tried using
    utl_i18n.escape_reference( val, 'us7ascii' )
    but for some reason it returns
    a b c & # x B B B C ;
    Note how it converted the Western European character "č" to its unaccented counterpart "c", not "& # x 0 1 0 D ;" (is this a bug?).
    I also tried a custom solution using regexp_replace and asciistr (which I can't include here because the forum software chokes on it) but it only returns the correct result for values <=4000 characters long. Unfortunately asciistr doesn't appear to accept CLOB values larger than 4000 characters. It returns an error message like
    (ORA-22835: Buffer too small for CLOB to CHAR or BLOB to RAW conversion (actual: 30251, maximum: 4000) ).
    I'm looking for a solution that works on CLOB data of any size.
    Thanks in advance for any insight you can provide.
    Joe Fuda
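
To make the requested transformation concrete, here is a sketch of it in Java. It is not an Oracle built-in and does not answer whether one exists; it only shows the intended mapping (ASCII passes through, everything else becomes a hexadecimal character reference with at least four digits). The class name is hypothetical.

```java
final class HtmlCharRefs {
    /** Replaces every non-ASCII code point with a hexadecimal character reference. */
    static String escapeNonAscii(CharSequence in) {
        StringBuilder sb = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = Character.codePointAt(in, i);
            if (cp < 0x80) {
                sb.append((char) cp);
            } else {
                // Zero-pad to four hex digits; longer code points keep all their digits
                sb.append("&#x").append(String.format("%04X", cp)).append(';');
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

Because it walks the input code point by code point, the approach has no inherent length limit and could be applied to a CLOB streamed in chunks, provided chunks are not split in the middle of a surrogate pair.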

    So with that (UTF8) in mind, let's take another look.....
    As shown below, I used a AL32UTF8 database.
    Note: I did not use a unicode capable tool for querying. So I set console mode code page to 1250 just to have č displayed properly (instead of posing as an è).
    Also, as a result of using windows-1250 for client character set, in the val column and in the second select's ncr column (iso8859-1), è (00e8) has been replaced with e through character set conversion going from server back to client.
    Running the same code on a database with a db character set such as we8mswin1252, that doesn't define the č (latin small c with caron) character, would yield results with a c in the ncr column.
    C:\>chcp 1250
    Aktuell teckentabell: 1250
    C:\>set nls_lang=.ee8mswin1250
    C:\>sqlplus test/test
    SQL*Plus: Release 11.1.0.6.0 - Production on Fri May 23 21:25:29 2008
    Copyright (c) 1982, 2007, Oracle.  All rights reserved.
    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
    With the OLAP option
    SQL> select * from nls_database_parameters where parameter like '%CHARACTERSET';
    PARAMETER              VALUE
    NLS_CHARACTERSET       AL32UTF8
    NLS_NCHAR_CHARACTERSET AL16UTF16
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'us7ascii') NCR from dual;
    VAL  NCR
    č e  c e
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'we8iso8859p1') NCR from dual;
    VAL  NCR
    č e  &# x10d; e     <- "è"
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'ee8iso8859p2') NCR from dual;
    VAL  NCR
    č e  č &# xe8;
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'cl8iso8859p5') NCR from dual;
    VAL  NCR
    č e  &# x10d; &# xe8;
    In the US7ASCII case, where it should be possible for all non-ASCII characters to be escaped, it seems as if the actual escape step is skipped over.
    Hope this helps to understand whether utl_i18n is usable or not in your case.
    Message was edited by:
    orafad
    Fixed replaced character references :)

  • Encoding non english characters with utf 8 on jsp (Critical!!)

    I am inserting hebrew characters from JSP into oracle db and everything is fine until this point. But when I try to retrieve the information from the database, the characters are not displayed properly (I get some garbage characters). I am sure that the data stored in the database is correct, but not sure why there is a problem in displaying the data in the JSP.
    I came across a thread on TSS
    http://www.theserverside.com/discussions/thread.tss?thread_id=28944
    and followed the suggestions given there like having
    <%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
    <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
    and also this
    <%
    //Some JDBC and sql statement query UTF-8 data and then ...
    String str = rs.getString("utf8_data");
    str = new String(str.getBytes("ISO-8859-1"),"UTF-8");
    %>
    <%= str %>
    Now the data being displayed is partly correct; that is, some characters are still coming out as squares.
    Any ideas will be of great help.

    I also suspect the database charset for this issue. But what I don't understand is why only certain Hebrew characters are stored properly while others are corrupted.
    Also, can anyone let me know how I can view the non-English characters present in the database directly, as TOAD is not able to display them?
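
One possible reason such a repair only partly works: the `getBytes("ISO-8859-1")` round trip recovers the original bytes only if the earlier, mistaken decode was itself lossless. The sketch below illustrates this with two charsets from the JDK; it is a demonstration of the principle, not a diagnosis of this particular database, and the names are made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RoundTripDemo {
    /** Simulates a mistaken decode, then applies the ISO-8859-1/UTF-8 repair trick. */
    static String repair(String original, Charset wronglyAssumed) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        // The mistaken decode somewhere in the pipeline
        String garbled = new String(utf8, wronglyAssumed);
        // The repair from the post: back to bytes via ISO-8859-1, then decode as UTF-8
        byte[] recovered = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(recovered, StandardCharsets.UTF_8);
    }
}
```

If the mistaken charset was ISO-8859-1, every byte survives and the text comes back intact. If it was a charset that cannot represent some of the bytes (US-ASCII in this sketch), those bytes were already replaced during the first decode and no later conversion can restore them.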

  • Support issue for non-English characters (in html forms)

    Hi group!
    I just want to post an issue here and see if anyone else has the same problem. First off, I'm running Windows XP MCE, but the French version (not the English version). This may help find out where the problem really is.
    Second, I know a bit of HTML and such, and I'm referring to HTML character entities in this thread; there's a fairly complete list here for reference: http://www.faqs.org/docs/htmltut/characterentitiesfamsupp69.html
    I noticed that some, but not all, non-English characters written in a textarea (which is, basically, a multi-line input box) are passed badly or not at all to the server when the form is sent from Safari. Most of the time, the content of the textarea is truncated at the first accented character.
    The most common French accents (&eacute;, &agrave;) are usually interpreted correctly by Safari (but may, once in a while, produce the bug too), while &ocirc; and &icirc; fare worse.
    Oddly, this bug doesn't happen all the time and doesn't fail in the same manner every time.
    So I started a thread just to see if anyone else is having issues with non-English characters, mostly in forms. Flash/Shockwave probably works, but I'm not sure; I have not tested it yet.
    Acer Aspire 5044   Windows XP   Turion 1.8GHz, 1Gb SDRam, ATI 200M xpress

    Yes, it is a known issue. I also noticed that it sometimes works, but most of the time it does not. It will hopefully be solved in the future. According to http://www.apple.com/safari/download/ changes that will come include:
    # Support for International users
    # International text input methods
    # Advanced text (contextual forms, international scripts)
    Sony Vaio   Windows XP  
