Replacing non-ASCII characters with HTML character references

Hi All,
In Oracle 10g or greater is there a built-in function that will convert a string with non-ASCII characters like this
a b č 뮼
into an ASCII string with HTML character references like this?
a b & # x 0 1 0 D ; & # x B B B C ;
(Note: I had to put spaces between the characters in the sample text above to prevent the forum software from converting it.)
I tried using
utl_i18n.escape_reference( val, 'us7ascii' )
but for some reason it returns
a b c & # x B B B C ;
Note how it converted the Western European character "č" to its unaccented counterpart "c", not "& # x 0 1 0 D ;" (is this a bug?).
I also tried a custom solution using regexp_replace and asciistr (which I can't include here because the forum software chokes on it) but it only returns the correct result for values <=4000 characters long. Unfortunately asciistr doesn't appear to accept CLOB values larger than 4000 characters. It returns an error message like
(ORA-22835: Buffer too small for CLOB to CHAR or BLOB to RAW conversion (actual: 30251, maximum: 4000) ).
I'm looking for a solution that works on CLOB data of any size.
Thanks in advance for any insight you can provide.
Joe Fuda
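
For reference, the transformation being asked for can be sketched outside the database. This is only an illustration in Python, not Oracle code: the names `to_hex_ncr` and `convert_stream` and the chunk size are assumptions made up for the example. It processes the text in fixed-size chunks, which is the same idea one would need for a CLOB larger than 4000 characters.

```python
import io

def to_hex_ncr(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal character reference."""
    return "".join(
        ch if ord(ch) < 128 else "&#x{:04X};".format(ord(ch))
        for ch in text
    )

def convert_stream(reader, writer, chunk_chars: int = 4000) -> None:
    """Convert text of any size by reading it in chunks (chunk size is an assumption)."""
    while True:
        chunk = reader.read(chunk_chars)
        if not chunk:
            break
        writer.write(to_hex_ncr(chunk))

# The sample string from the question: a, b, U+010D, U+BBBC.
source = io.StringIO("a b \u010d \ubbbc")
target = io.StringIO()
convert_stream(source, target, chunk_chars=2)  # tiny chunks to exercise the loop
print(target.getvalue())
```

Python strings are sequences of code points, so a chunk boundary can never split a character here. A chunked PL/SQL version would need to check that separately if the data can contain supplementary characters.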

So with that (UTF8) in mind, let's take another look.....
As shown below, I used an AL32UTF8 database.
Note: I did not use a Unicode-capable tool for querying. So I set the console code page to 1250 just to have č displayed properly (instead of posing as an è).
Also, as a result of using windows-1250 for client character set, in the val column and in the second select's ncr column (iso8859-1), è (00e8) has been replaced with e through character set conversion going from server back to client.
Running the same code on a database with a db character set such as we8mswin1252, that doesn't define the č (latin small c with caron) character, would yield results with a c in the ncr column.
C:\>chcp 1250
Aktuell teckentabell: 1250
C:\>set nls_lang=.ee8mswin1250
C:\>sqlplus test/test
SQL*Plus: Release 11.1.0.6.0 - Production on Fri May 23 21:25:29 2008
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the OLAP option
SQL> select * from nls_database_parameters where parameter like '%CHARACTERSET';
PARAMETER              VALUE
NLS_CHARACTERSET       AL32UTF8
NLS_NCHAR_CHARACTERSET AL16UTF16
SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'us7ascii') NCR from dual;
VAL  NCR
č e  c e
SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'we8iso8859p1') NCR from dual;
VAL  NCR
č e  &# x10d; e     <- "è"
SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'ee8iso8859p2') NCR from dual;
VAL  NCR
č e  č &# xe8;
SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'cl8iso8859p5') NCR from dual;
VAL  NCR
č e  &# x10d; &# xe8;
In the US7ASCII case, where it should be possible for all non-ASCII characters to be escaped, it seems as if the actual escape step is skipped.
Hope this helps you judge whether utl_i18n is usable in your case.
Message was edited by:
orafad
Fixed replaced character references :)
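
The rule the session above demonstrates (escape only what the target character set cannot represent) can be reproduced with Python's codecs. This is an analogy, not Oracle's implementation: `escape_for_charset` is a made-up helper, the Python codec names stand in for the Oracle character set names, and Python emits decimal rather than hexadecimal references.

```python
def escape_for_charset(text: str, charset: str) -> bytes:
    """Encode text, turning characters the charset lacks into &#NNN; references."""
    return text.encode(charset, errors="xmlcharrefreplace")

sample = "\u010d \u00e8"  # c with caron, space, e with grave

for charset in ("ascii", "iso8859_1", "iso8859_2", "iso8859_5"):
    print(charset, escape_for_charset(sample, charset))
```

With a 7-bit ASCII target both characters are escaped here, which is the result one would have expected from the US7ASCII call.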

Similar Messages

  • Replacing non-ascii characters in String

    I have a site where the user enters data in a rich text
    editor (ktml4) that gets stored into a database (mysql). There are
    non ascii characters getting into the data, I'm assuming that they
    are copying and pasting from Word. Unfortunately in this situation,
    changing that process isn't an option.
    Currently, this is the only character that is causing me
    problems:
    http://www.zvon.org/other/charSearch/PHP/search.php?request=ffa0&searchType=3
    I would just like to replace the non-ascii characters with a
    space when I read them from the database. Something like:
    #Replace(result.column, '\xffa0', ' ')#
    However, I believe that code looks for the string "\xffa0",
    not the character \xffa0.
    Is there any way to do this?

    Dan Bracuk wrote:
    rereplace might work.

    BuckLemke wrote:
    Can you give an example of how to pass a non-ascii character
    to REReplace?

    Regular expressions are not my strength, but the approach I
    was considering was, "if it's not an ascii character, make it a
    space". Then you pass the entire string at once.
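
    The "if it's not an ASCII character, make it a space" idea, sketched in Python rather than ColdFusion. The function name is made up for the example; the character class is the part that carries over to other regex engines, assuming they accept \x escapes.

```python
import re

# Anything outside the 7-bit range 0x00-0x7F counts as non-ASCII.
NON_ASCII = re.compile(r"[^\x00-\x7F]")

def blank_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with a single space."""
    return NON_ASCII.sub(" ", text)

print(blank_non_ascii("caf\u00e9\uffa0ok"))
```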

  • Replace Non-Numeric Characters with a Numeric Character in a String

    Hi Guys,
    I need to replace all the non-numeric characters (including embedded blanks & hyphens) in a string with the numeric character '1'.
    The trailing blanks should not be replaced.
    e.g. "P22233344455566" should be changed to "122233344455566"
    &    "49-1234567           " should be changed to "4911234567          "
    Please help.

    Use [replace|http://help.sap.com/abapdocu_70/en/ABAPREPLACE_IN_PATTERN.htm] with a regular expression to translate any non-numeric character (i.e. any character not between 0 and 9) to 1:
      REPLACE ALL OCCURRENCES OF REGEX '[^0-9]' IN value WITH '1'.
    Cheers, harald
    p.s.: In older releases [translate|http://help.sap.com/abapdocu_70/en/ABAPTRANSLATE.htm] would also do the trick, but is more lengthy, because one would need to specify each individual character that should be replaced, e.g.:
      TRANSLATE value TO UPPER CASE.
      TRANSLATE value USING
          ' 1_1-1a1b1c1d1e1f1g1h1i1j1k1l1m1n1o1p1q1r1s1t1u1v1w1x1y1z1'.
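
    The regex above would also turn trailing blanks into '1', which the question asked to avoid. A rough sketch of the full requirement, in Python rather than ABAP and with a made-up function name: set the trailing blanks aside, replace in the rest, then put the blanks back.

```python
import re

def digits_only(value: str, filler: str = "1") -> str:
    """Replace non-digits with filler, leaving trailing blanks untouched."""
    body = value.rstrip(" ")
    padding = value[len(body):]  # the trailing blanks, kept as they were
    return re.sub(r"[^0-9]", filler, body) + padding

print(repr(digits_only("P22233344455566")))
print(repr(digits_only("49-1234567   ")))
```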

  • Replace non ascii characters

    I have a person that is called "josée".
    I need "JOSEE" to be inserted into the database, not "JOSéE".
    I also have a person called "jürgen".
    I need "JURGEN" to be inserted into the database.
    Is there a way to do this?
    (i.e. replace each non-ASCII character with its corresponding ASCII character)

    Thanks ebrian, I tried your suggestion:
    SELECT VALUE, REGEXP_REPLACE(VALUE, '^[:ascii:]') FROM AUDIT_TAB_COLUMNS WHERE VALUE = '' ;
    It still leaves in the square, unfortunately.
    Dave
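
    For the "josée" to "JOSEE" question, one common approach is to decompose each accented letter into base letter plus combining mark and drop the marks. A sketch in Python (the function name is made up); it only helps for characters that actually decompose this way.

```python
import unicodedata

def to_ascii_upper(name: str) -> str:
    """Strip accents via Unicode decomposition, then upper-case the result."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.upper()

print(to_ascii_upper("jos\u00e9e"))
print(to_ascii_upper("j\u00fcrgen"))
```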

  • How to replace non-alphanumeric characters with " "  in a String?

    Hi,
    Anyone can help with this?
    I guess I should use the replaceAll-method??

    I need to keep characters that are generally ok to use in
    sentences, like ".", ",", "!", and "-" and also all
    digits and letters (numbers and alphabetic characters).

    Add those characters to the pattern in that case. Add
    them just before the ], placing the '-' char last in the list.
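
    What that advice amounts to, sketched in Python instead of Java's replaceAll (the function name is made up). The '-' sits last in the class so it is taken literally rather than as a range. Spaces are not in the allowed set, so they are "replaced" by a space and stay as they are.

```python
import re

# Everything except ASCII letters, digits, '.', ',', '!' and '-' is disallowed.
DISALLOWED = re.compile(r"[^A-Za-z0-9.,!-]")

def keep_sentence_chars(text: str) -> str:
    """Replace every disallowed character with a space."""
    return DISALLOWED.sub(" ", text)

print(keep_sentence_chars("Hi, there! a-b #1"))
```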

  • Replace non numerics characters with 0

    Hi,
    Can someone help me out with this?
    I basically need to check my column value.
    In case there is an occurrence of any non-numeric character, I need to replace it with a 0.
    By non numeric I mean any character other than [0-9].
    For example 'rte$36^r'
    I would like to convert it to '00003600'.
    Thanks in advance.

    There are several options, like TRANSLATE or REGEXP_REPLACE (10g and later only):
    WITH t AS (SELECT 'rte$36^r' col1
                 FROM dual)
    SELECT t.col1
         , REGEXP_REPLACE(t.col1, '[^0-9]', '0') col1_new1
         , TRANSLATE(t.col1, '0123456789' || t.col1, '0123456789' || LPAD('0', LENGTH(t.col1), '0')) col1_new2
      FROM t;
    COL1     COL1_NEW1            COL1_NEW2
    rte$36^r 00003600             00003600
    C.
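
    The same two approaches (a regex replace and a character-by-character translation) look roughly like this in Python. The function names are made up; this is only meant to show that both routes give the same result.

```python
import re

def zero_fill_regex(value: str) -> str:
    """Replace every non-digit with '0' using a regular expression."""
    return re.sub(r"[^0-9]", "0", value)

def zero_fill_translate(value: str) -> str:
    """Same result via a translation table built from the value's own characters."""
    table = {ord(ch): "0" for ch in set(value) if ch not in "0123456789"}
    return value.translate(table)

print(zero_fill_regex("rte$36^r"), zero_fill_translate("rte$36^r"))
```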

  • Issue with Download and Loss of Non-ASCII Characters

    I have a need to allow my user to download the contents of an HTML Region as a file. This region contains some Greek letters, i.e. non-ASCII, used with some common finance formulas.
    I am able to copy the contents off this region using JavaScript without any issue.
    Moreover, I can copy the contents from JavaScript into a Page Item and then render the region with PL/SQL. Again, this works without an issue.
    However, when I try to download the region, the Greek letters are lost in the downloaded document. Instead they are replaced with this weird series of characters: (Δ
    I've created a sample app to demonstrate this problem at apex.oracle.com:
    URL: http://apex.oracle.com/pls/apex/f?p=34765:1
    UID: GUEST_DEV
    PWD: greeksgone
    Click the button labeled "Copy HTML Via JS" and you will see the statically populated region copied into the second region.
    Click the button labeled "Copy HTML Via APEX" and you will see the statically populated region copied into the third region. This is achieved by copying the HTML into a Page Item and then submitting the page. When the page returns, the value of this Page Item is then used to populate the third region. As you can see, the Greek letters are there as normal.
    However, if you click the "Download HTML" button you will see that the Greek letters are not present in the resulting file.
    -Joe

    Joe Upshaw wrote:
    I am just totally stuck here.
    This is what the document looks like without the required meta tag:
    <HTML>
    <BODY>
    <STYLE>
    <DIV>
    div.riskScenarioMatrixDiv
    overflow:auto;
    ....
    This version does not display the Greek letters.
    If I could simply add this one meta tag in, everything would work beautifully:
    <HTML>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <BODY>
    <STYLE>
    <DIV>
    div.riskScenarioMatrixDiv
    overflow:auto;
    ....
    However, I have tried every combination I can think of in the code block but, any time that I add that meta tag, I get a *404 Page Not Found* error.
    The only thing standing between what we have and what we need is getting that meta tag in the output but, I just can't seem to find a way to do this. Actually, we'd really like to have, within the head tags; the meta tag, the style and the title but, not being able to get that meta tag in is the difference between acceptable and broken. It works with the others in the body.
    DECLARE
    ls_RiskMatrixTitle  VARCHAR2(32767);
    ls_RiskMatrixHTML   VARCHAR2(32767);
    ls_DefaultFileName  VARCHAR2(512);
    BEGIN
    ls_RiskMatrixHTML   := :P1_HTML;
    ls_DefaultFileName  := 'TestMe.html';         
    ls_RiskMatrixTitle  := 'Test of Download';        
    OWA_UTIL.MIME_HEADER( 'text/html',  False, 'UTF8' );
    HTP.P( 'Content-Disposition: attachment; filename=' || ls_DefaultFileName );
    --HTP.META( 'Content-Type',  null, 'text/html; charset=utf-8' );          
    --HTP.TITLE( ls_RiskMatrixTitle ); 
    OWA_UTIL.HTTP_HEADER_CLOSE;
    HTP.HTMLOPEN;   
    HTP.BODYOPEN;
    HTP.STYLE('<DIV>' || :P1_MATRIX_STYLE || '</DIV>');
    HTP.P(ls_RiskMatrixTitle);
    HTP.P(ls_RiskMatrixHTML);
    APEX_APPLICATION.G_UNRECOVERABLE_ERROR := True;
    END;
    You appear to be confusing HTTP and HTML.
    The HTTP header != HTML head element.
    HTP.META( 'Content-Type',  null, 'text/html; charset=utf-8' );
    HTP.TITLE( ls_RiskMatrixTitle );
    This generates HTML content. It does not go in the HTTP header. You should be generating an HTML head element containing this (and the style element) between HTP.HTMLOPEN and HTP.BODYOPEN.
    Also note that these web toolkit methods generate really obsolete HTML, therefore I never use them (and nor does APEX these days).
    Don't have time to get more into this now...
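
    To make the header/head distinction concrete, here is a sketch in Python, not PL/SQL, with a made-up function name and return shape. HTTP headers are kept as one list, the HTML document as a separate payload, and the meta, title and style elements live inside the document's head.

```python
from html import escape

def build_download(title: str, style: str, body_html: str, filename: str):
    """Return (http_headers, html_bytes); the meta tag belongs to the HTML, not the headers."""
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Disposition", 'attachment; filename="{}"'.format(filename)),
    ]
    document = "\n".join([
        "<html>",
        "<head>",
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">',
        "<title>{}</title>".format(escape(title)),
        "<style>{}</style>".format(style),
        "</head>",
        "<body>",
        body_html,
        "</body>",
        "</html>",
    ])
    return headers, document.encode("utf-8")

headers, payload = build_download(
    "Test of Download", "div { overflow: auto; }", "<div>\u0394</div>", "TestMe.html"
)
print(headers)
print(payload.decode("utf-8"))
```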

  • Cannot rename file with non-ASCII characters when using the

    My application moves files from one directory to another by calling File[] srcFiles = srcDir.listFiles() to get a list of files in the source directory, and then calling srcFiles[i].renameTo(destFile) to rename each file.
    This does not work (renameTo returns false and the file is not moved) under the following circumstances:
    - the file's leaf name contains non-ASCII characters, for example "�"
    - the OS is Solaris 9
    - the LANG and LC_* environment variables are unset, i.e. the C locale is being used
    If I set the LANG environment variable to, for example, en_GB.UTF-8 then the rename succeeds.
    I have tried calling srcFiles[index].getName().getBytes("UTF-8") and the non-ASCII characters are being replaced with ? (0x3f) characters when LANG is unset.
    Is this a bug in the JRE? I would argue that since my code does not actually manipulate the filename (I just use the File object that File.listFiles() gives me) then the rename should succeed. Of course I would not expect the file name to be displayed correctly if I printed it out.
    I have reproduced this behaviour with JDK 1.4.2_05 and 1.5.0_04 on Solaris 9.
    Francis

    Thanks for the info Alan.
    I considered setting the locale in the environment (this sounds like the "correct" fix to me and we might implement it later), but this application shares a WebLogic server with many other applications so we would have to do a huge amount of testing to make sure that the locale change wouldn't break the other apps. In the end I worked around the problem by making the code that generates the filenames in the first place strip out any non-ASCII characters (the names of the files are not critically important).
    Looking forward to JSR-203, in the meantime perhaps a note about this behaviour in the java.io.File javadoc would be useful.
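
    The two effects described in this thread, sketched in Python rather than Java: forcing a name through an ASCII encoder replaces non-ASCII characters with '?', and the workaround of dropping them when the name is generated. The function name `ascii_only_name` is made up, and this does not model the JRE's actual locale handling.

```python
def ascii_only_name(name: str) -> str:
    """Drop every character that cannot be represented in ASCII."""
    return name.encode("ascii", errors="ignore").decode("ascii")

original = "r\u00e9sum\u00e9.txt"

# What an ASCII-only conversion does to the name: each accented letter becomes '?'.
print(original.encode("ascii", errors="replace"))

# The workaround: generate names without the non-ASCII characters to begin with.
print(ascii_only_name(original))
```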

  • Filling clob with non ascii characters

    Hello,
    I have had some problems with CLOBs and the use of German
    umlauts. I wasn't able to insert or update
    strings containing umlauts in combination with string
    binding. After inserting or updating, the umlaut
    characters were replaced by strange (Spanish) '?'
    characters which were upside down.
    However, it worked when I did not use string binding.
    I tried various things, and after some time I tracked
    the problem down to oracle.toplink.queryframework.SQLCall.java. In
    prepareStatement(...) you find something
    like
    ByteArrayInputStream inputStream = new ByteArrayInputStream(((String) parameter).getBytes());
    // Binding starts with a 1 not 0.
    statement.setAsciiStream(index + 1, inputStream,((String) parameter).getBytes().length);
    I replaced the usage of ByteArrayInputStream with CharArrayReader:
    // TH changed, 26.11.2003, Umlaut will not work with this.
    CharArrayReader reader = new CharArrayReader(((String) parameter).toCharArray());     
    statement.setCharacterStream(index + 1, reader, ((String) parameter).length() );
    and this worked.
    Is there any other way of achieving this? Did anyone
    get CLOBs with non-ASCII characters to work?
    Regards -- Tobias
    (Toplink 9.0.3, Clob was mapped to String, Driver was Oracle OCI)
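
    Why a byte stream labelled "ASCII" is a poor fit for umlauts can be shown without any database. A small Python sketch, not TopLink or JDBC code: the byte count differs from the character count, and the text cannot be expressed in ASCII at all. Passing characters instead of bytes, as in the fix above, avoids both issues.

```python
text = "M\u00fcller"

utf8_bytes = text.encode("utf-8")
char_count = len(text)        # number of characters
byte_count = len(utf8_bytes)  # number of bytes after encoding

try:
    text.encode("ascii")
    ascii_safe = True
except UnicodeEncodeError:
    ascii_safe = False        # the umlaut has no ASCII representation

round_trip = utf8_bytes.decode("utf-8")  # only the matching decoder restores the text

print(char_count, byte_count, ascii_safe, round_trip == text)
```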

  • Cannot login with password containing non-ascii characters

    Hello,
    I have web application, form based login. UTF-8 is specified "everywhere".
    And it works, except for passwords.
    If a user registers with a password containing non-ASCII characters, it is correctly written to the database, but both programmatic login and normal form-based login fail.
    If the password is only ASCII, it works.
    The username can be ASCII or non-ASCII, it doesn't matter, both work.
    I'm using Sun Java Application Server 9.1.
    JDBC realm.
    I'm not hashing passwords, just storing them in clear text (for now).
    I tried configuring the realm with Charset: UTF8 as a last chance, but it doesn't work either.
    The problem is only with non-ASCII characters in the password.
    Any help is very much appreciated.
    Thanks a lot

    hi,
    I know all that, but that's not the case. My app uses preparedStatements, everything is properly configured, in all pages, utf-8 is going from user to db and back without any problems.
    The only problem is with password field. As I am using form based login, with jdbc realm configured (again, nicely working when only ascii characters), I have very little chance to do something bad through the login phase.
    I'm not talking about special characters, I'm talking about non-ASCII characters, let's say Chinese, Arabic, the Russian alphabet etc.
    When user registers (my code), the fields are properly written to db. I have checked that, trust me.
    But the Sun app server realm seems to have some problems with the password field.
    (realm uses jdbc connection to mysql, the url contains all extra parameters to be sure about utf8. there is nothing more what can be configured...)
    If I use another alphabet in the login name and ASCII in the password, it works. But as soon as I use another alphabet in the password too, it doesn't work anymore.
    My only idea is, that I could try MD5 to create ascii only characters (I hope it works that way) on the client with javascript and then set Digest to MD5 in realm configuration. But still, it seems very strange. The clear way storage should also function? (now set Digest to 'none')
    Is it a bug of Sun App Server?
    thanks
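
    On the "MD5 to create ASCII-only characters" idea: a hex digest is indeed plain ASCII whatever alphabet the password uses. A Python sketch with a made-up function name; MD5 appears only because the post names it, and whether the realm expects hex or some other encoding of the digest is an assumption that would need checking.

```python
import hashlib

def md5_hex(password: str) -> str:
    """Hash the UTF-8 bytes of the password and return 32 hex characters."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()

digest = md5_hex("\u043f\u0430\u0440\u043e\u043b\u044c")  # a Cyrillic sample password
print(digest)
```

    Both sides have to encode the password to bytes the same way (UTF-8 here) before hashing, otherwise the digests will not match.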

  • Problems with non-ASCII characters on Linux Unit Test Import

    I found a problem with non-ASCII characters in the Unit Test Import for Linux.  This problem does not appear in the Unit Test Import for Windows.
    I have attached a Unit Test export called PROC1.XML  It tests a procedure that is included in another attachment called PROC1.txt. The unit test includes 2 implementations.  Both implementations pass non-ASCII characters to the procedure and return them unchanged.
    In Linux, the unit test import will change the non-ASCII characters in the XML file to xFFFD. If I copy/paste the the non-ASCII characters into the Unit Test after the import, they will be stored and executed correctly.
    Amazon Ubuntu 3.13.0-45-generic / lubuntu-core
    Oracle 11g Express Edition - AL32UTF8
    SQL*Developer 4.0.3.16 Build MAIN-16.84
    Java(TM) SE Runtime Environment (build 1.7.0_76-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 24.76-b04, mixed mode)
    In Windows, the unit test will import the non-ASCII characters unchanged from the XML file.
    Windows 7 Home Premium, Service Pack 1
    Oracle 11g Express Edition - AL32UTF8
    SQL*Developer 4.0.3.16 Build MAIN-16.84
    Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
    If SQL*Developer is coded the same between Windows and Linux, The JVM must be causing the problem.
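
    One way xFFFD can appear is when UTF-8 bytes are read with a decoder that does not match, for example a different platform default encoding. Whether that is what happens in the Linux import is an assumption; the Python sketch below only shows the mechanism.

```python
xml_bytes = "<value>\u00e9\u00df</value>".encode("utf-8")

# Decoding with the matching charset restores the text.
correct = xml_bytes.decode("utf-8")

# Decoding with a mismatched charset turns each undecodable byte into U+FFFD.
wrong = xml_bytes.decode("ascii", errors="replace")

print(correct)
print(wrong.encode("unicode_escape"))
```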

    Set the System property "mail.mime.decodeparameters" to "true" to enable the RFC 2231 support.
    See the javadocs for the javax.mail.internet package for the list of properties.
    Yes, the FAQ entry should contain those details as well.

  • [SOLVED] KDEmod - problem with mounting b/c of non-ASCII characters

    Hi guys!
    I finally set aside a few gigabytes for Archlinux - it is no longer in a virtual machine. So far I managed to configure everything with the excellent wiki. It's runnin' and kickin'. I ran across only one problem:
    When I insert a CD with a label that has non-ASCII characters (some Polish ones in my case) and I click on its icon in Konqueror I get the message that "file such-and-such doesn't exist" - and the Polish characters are clearly misspelled (it is not a font problem - I double checked). I can access the folder either via console or via Konqueror if I go to the /media folder, though.
    Any ideas how I can fix it? If you need more info, let me know.
    Last edited by JeremyTheWicked (2008-05-31 14:46:07)

    You're welcome. Now it's advisable for you to edit the title of your initial post: add [SOLVED]. Perhaps clearer wording would be in order, too, for the benefit of the search engine. The problem seems to be a trifle in retrospect, but somehow it takes some effort to find the solution, doesn't it?

  • Problems with password including non-ASCII characters

    I am a German language user with a German keyboard but an English OS as main language. Therefore my passwords (simple user and admin) include non-ASCII characters used in German, French and Spanish, which increases security. This works fine in the majority of login scenarios. There are, however, 3 scenarios where neither my non-ASCII simple user PW nor my non-ASCII admin PW is accepted:
    1) running "sudo" in Terminal;
    2) When I try to shut down and another user account is still open. Doing this brings up a login window asking for the PW of the other user that does not accept non-ASCII;
    3) Using Leopard/SnowLeopard CacheCleaner. Upon opening, this app asks for an admin PW, but does not recognize non-ASCII.
    Am I right in assuming that this has to do with non-ASCII PWs? I thought ASCII times were gone given the remarkable language flexibility of Mac OS over the years. I know this stupid problem only from Win XP. There it is even worse.
    Is there a way to overcome this problem without always temporarily changing my PW? Thanks.

    I think the problem is with the applications themselves and should be reported to the developer. Although some non-ASCII characters are acceptable for an admin password, in my experience most Unix systems don't like non-ASCII characters in passwords. It may be easier to avoid them if you can.
    OS X should simply request your admin password to shut down when another user account is open. An alert dialog usually appears warning that the other user is still logged in and giving you the option to log the other account out then shut down. But in my experience the only authorization needed is for your admin account.

  • Problem with non-ASCII characters on TTY

    Although I'm not a native speaker, I want my system language to be English (US), since that's what I'm used to. However I have a lot of files which have German language in their file names.
    My /etc/locale.gen has en_US.UTF-8 and de_DE.UTF-8 enabled. My /etc/locale.conf contains only the line
    LANG=en_US.UTF-8
    The German file names show up fine within Dolphin and Konsole (ls -a). But they look weird on either of the TTYs (the "console" you get to by pressing e.g. ctrl+alt+F1). They have other characters like '>>' or the paragraph symbol where non-ASCII characters should be. Is it possible to fix this?

    I don't think the console font is the problem. I use Lat2-Terminus16 because I read the Beginner's Guide on the wiki while installing the system.
    My /etc/vconsole.conf:
    KEYMAP=de
    FONT=Lat2-Terminus16
    showconsolefont even shows me the characters missing in the file names; e.g.: Ö, Ä, Ü

  • Non ascii characters being sent from a parameter in a form

    Hi!
    I have seen many topics posted on passing non ascii characters through parameters from one servlet to another and converting them into whatever format is necessary.
    However, I have not seen anyone answer the following question. I have a JSP page (HTML) with the character encoding set to utf-8. The user inputs some data into a text field which is inside a form. The data could be in non-ASCII characters such as Hebrew or Arabic. This form is then sent to another JSP where I try to retrieve the data from the text field. No matter what I do, I cannot get the data presented correctly. It is either question marks or other weird symbols.
    I have tried every permutation of the encoding of the actual HTML page, the encoding of the string from request.getParameter etc. but it still is not presented correctly on the new HTML page.
    Can anyone help??
    Spencer

    Ok, I solved the problem.
    I had to put at the top request.setCharacterEncoding("utf-8");
    Spencer
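
    Why the fix works: the browser sends the form value as percent-encoded UTF-8 bytes, and the receiving side has to decode them with the same charset. A Python sketch of that mechanism using the standard urllib functions (not servlet code; the sample word is arbitrary).

```python
from urllib.parse import quote, unquote

original = "\u05e9\u05dc\u05d5\u05dd"  # a Hebrew sample word

wire = quote(original)                          # what travels in the request
decoded_ok = unquote(wire, encoding="utf-8")    # matching charset: text restored
decoded_bad = unquote(wire, encoding="latin-1") # wrong charset: one character per byte

print(wire)
print(decoded_ok == original, decoded_bad == original)
```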
