Ignoring non-ASCII characters in XML file.

Hello all who read this,
I got the following problem. Trying to read an XML file which contains UTF-8 characters. My database doesnt support UTF-8 char set .... obviouselly.
Below you will find my procedure for parsing XML file ... now i was wondering if there is a way (and how ) to ignore those "special" characters in my XML file.
Thank you for you help!
PROCEDURE xml_import is
BEGIN
DECLARE
  l_bfile   BFILE;
  l_clob    CLOB;
  l_parser  dbms_xmlparser.Parser;
  l_doc     dbms_xmldom.DOMDocument;
  l_nl      dbms_xmldom.DOMNodeList;
  l_n       dbms_xmldom.DOMNode;
  l_temp    VARCHAR2(1000);
  FILER     VARCHAR2(1000) := 'xml.xml';  
  TYPE tab_type IS TABLE OF X_XML%ROWTYPE;
  t_tab  tab_type := tab_type();
BEGIN
  l_bfile := BFileName('XML_DIR',FILER);
  dbms_lob.createtemporary(l_clob, cache=>FALSE);
  dbms_lob.open(l_bfile, dbms_lob.lob_readonly);
  dbms_lob.loadFromFile(dest_lob => l_clob,
                        src_lob  => l_bfile,
                        amount   => dbms_lob.getLength(l_bfile));
  dbms_lob.close(l_bfile);
  dbms_session.set_nls('NLS_DATE_FORMAT','''DD-MON-YYYY''');
  l_parser := dbms_xmlparser.newParser;
  dbms_xmlparser.parseClob(l_parser, l_clob);
  l_doc := dbms_xmlparser.getDocument(l_parser);
  dbms_lob.freetemporary(l_clob);
  dbms_xmlparser.freeParser(l_parser);
  l_nl :=  dbms_xslprocessor.selectNodes(dbms_xmldom.makeNode(l_doc),'/Doc/Upk');
  FOR cur_emp IN 0 .. dbms_xmldom.getLength(l_nl) - 1 LOOP
    l_n := dbms_xmldom.item(l_nl,cur_emp);
    t_tab.extend;
    dbms_xslprocessor.valueOf(l_n,'@id',t_tab(t_tab.last).id);
    dbms_xslprocessor.valueOf(l_n,'line1',t_tab(t_tab.last).vr);
    dbms_xslprocessor.valueOf(l_n,'line2',t_tab(t_tab.last).used);
    dbms_xslprocessor.valueOf(l_n,'line3',t_tab(t_tab.last).got);
  END LOOP;
  FORALL i IN t_tab.first .. t_tab.last
    INSERT INTO X_XML VALUES t_tab(i);
  COMMIT;
  dbms_xmldom.freeDocument(l_doc);
--EXCEPTION
--  WHEN OTHERS THEN
--    dbms_lob.freetemporary(l_clob);
--    dbms_xmlparser.freeParser(l_parser);
--    dbms_xmldom.freeDocument(l_doc);
END;
END;
XML file structure:
<?xml version="1.0" encoding="UTF-8" ?>
- <AjpesDokument>
- <Ident vrsta="KOMP">
  <Datum>2011-06-17</Datum>
  <Ura>09:33:10</Ura>
  </Ident>
- <Dolznik id="88888888">
  <DatumObdelave>2011-06-10</DatumObdelave>
  <Trr>56546654545</Trr>
  <PopolnoIme>Firm d.o.o.</PopolnoIme>
  <Ulica>Kičečeva ul</Ulica>
  <HisnaStev>026</HisnaStev>
  <PostaId>5421</PostaId>
  <Posta>MELJE</Posta>
  <IzpSif>00</IzpSif>
  <IzpNaziv>MELJE</IzpNaziv>
  <IzpNaslov>ČUPAVA</IzpNaslov>
  <IzpNaslov2>MELJE</IzpNaslov2>
- <Upnik davcna="10232293">
  <Naziv>CMC GALVANIKA d.o.o.</Naziv>
  <Kraj>LEŠČE</Kraj>
  <VrstaPobota>2</VrstaPobota>
  <Prijavljeno>3227.12</Prijavljeno>
  <Pobotano>3227.12</Pobotano>
  </Upnik>
- <Upnik davcna="11212191">
  <Naziv>AŽERO  d.o.o.</Naziv>
  <Kraj>MELJE</Kraj>
  <VrstaPobota>2</VrstaPobota>
  <Prijavljeno>219.30</Prijavljeno>
  <Pobotano>219.30</Pobotano>
  </Upnik>
  </Dolznik>
</AjpesDokument>
Edited by: 869620 on 1.7.2011 2:04

869620 wrote:
Hello all who read this,
I got the following problem. Trying to read an XML file which contains UTF-8 characters. My database doesnt support UTF-8 char set .... obviouselly.It's not obvious as Oracle can certainly be configured for UTF-8.
How about giving us some example XML and what you expect as output as there can be easier ways of extracting data from XML than using the dbms_xmldom package.
The dbms_xmldom package is one way, but it's not necessarily the best, as I would tend to use that for processing XML based structure that I don't actually know (I have used it myself, for example, for parsing XML Schemas (XSD) to auto-generate code for producing XML to match that schema), but for extracting date from a known XML structure, I would probably use SQL's XMLTABLE functionality.
And when you post code/data, please use the {noformat}{noformat} tags as detailed in the FAQ: {message:id=9360002}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       

Similar Messages

  • Cannot rename file with non-ASCII characters when using the

    My application moves files from one directory to another by calling File[] srcFiles = srcDir.listFiles() to get a list of files in the source directory, and then calling srcFiles.renameTo(destFile) to rename each file.
    This does not work (renameTo returns false and the file is not moved) under the following circumstances:
    - the file's leaf name contains non-ASCII characters, for example "�"
    - the OS is Solaris 9
    - the LANG and LC_* environment variables are unset, i.e. the C locale is being used
    If I set the LANG environment variable to, for example, en_GB.UTF-8 then the rename succeeds.
    I have tried calling srcFiles[index].getName().getBytes("UTF-8") and the non-ASCII characters are being replaced with ? (0x3f) characters when LANG is unset.
    Is this a bug in the JRE? I would argue that since my code does not actually manipulate the filename (I just use the File object that File.listFiles() gives me) then the rename should succeed. Of course I would not expect the file name to be displayed correctly if I printed it out.
    I have reproduced this behaviour with JDK 1.4.2_05 and 1.5.0_04 on Solaris 9.
    Francis

    Thanks for the info Alan.
    I considered setting the locale in the environment (this sounds like the "correct" fix to me and we might implement it later), but this application shares a WebLogic server with many other applications so we would have to do a huge amount of testing to make sure that the locale change wouldn't break the other apps. In the end I worked around the problem by making the code that generates the filenames in the first place strip out any non-ASCII characters (the names of the files are not critically important).
    Looking forward to JSR-203, in the meantime perhaps a note about this behaviour in the java.io.File javadoc would be useful.

  • Problems with non-ASCII characters on Linux Unit Test Import

    I found a problem with non-ASCII characters in the Unit Test Import for Linux.  This problem does not appear in the Unit Test Import for Windows.
    I have attached a Unit Test export called PROC1.XML  It tests a procedure that is included in another attachment called PROC1.txt. The unit test includes 2 implementations.  Both implementations pass non-ASCII characters to the procedure and return them unchanged.
    In Linux, the unit test import will change the non-ASCII characters in the XML file to xFFFD. If I copy/paste the the non-ASCII characters into the Unit Test after the import, they will be stored and executed correctly.
    Amazon Ubuntu 3.13.0-45-generic / lubuntu-core
    Oracle 11g Express Edition - AL32UTF8
    SQL*Developer 4.0.3.16 Build MAIN-16.84
    Java(TM) SE Runtime Environment (build 1.7.0_76-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 24.76-b04, mixed mode)
    In Windows, the unit test will import the non-ASCII characters unchanged from the XML file.
    Windows 7 Home Premium, Service Pack 1
    Oracle 11g Express Edition - AL32UTF8
    SQL*Developer 4.0.3.16 Build MAIN-16.84
    Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
    Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
    If SQL*Developer is coded the same between Windows and Linux, The JVM must be causing the problem.

    Set the System property "mail.mime.decodeparameters" to "true" to enable the RFC 2231 support.
    See the javadocs for the javax.mail.internet package for the list of properties.
    Yes, the FAQ entry should contain those details as well.

  • Non ascii characters are padding without reason

    Hi all,
    here is my issue :
    I have description field in my jsp page, when user enters some non ASCII characters as shown below :
    This is user had entered the input string is :
    "Hello ������� Hello" (without quotes)
    When he edits for the first time without change anything in the the string and save, then it became :
    "Hello Ã?Â?Ã?Â?Ã?Â?Ã?Â?Ã?Â?Ã?Â?Ã?Â?Ã? Hello" (without quotes)
    Then again one more time, it became :
    "Hello Ã?Â?Ã?Â?Ã?Â?Ã?Â?Ã?Â?Ã?Â?Ã?Â?Ã?Â?Ã?Â?Ã?Â?Ã?Â?Ã?Â?Ã?Â?Ã?Â?Ã?Â? Hello" (without quotes)
    Like wise it is getting increased its size. I'm saving the data to Oracle 10g database from jsp.
    Please suggest what to do to rectify the problem.
    Thanks & regards,
    Achchayya

    But I didnt changed anything from server.xml except two changes
    1) from : <?xml version='1.0' encoding='iso-8851-1'?>
        to : <?xml version='1.0' encoding='UTF-8'?>
    2)
    from : <Connector port="8888" protocol="HTTP/1.1"
                   connectionTimeout="20000"
                   redirectPort="8443" />
    to : <Connector port="8888" protocol="HTTP/1.1"
                   connectionTimeout="20000"
                   redirectPort="8443"
                   URIEncoding="UTF-8"/>here is my JSP :
    <%@ page pageEncoding="UTF-8" %>
    <html>
    <head>
    <title>The servlet example </title>
    <meta http-equiv="content-type" content="text/html; charset=utf-8">
    </head>
    <body>
    <h1>A simple web application</h1>
    <form method="POST" action="HelloWorld">
    <label for="name">Enter your name </label>
    <input type="text" id="name" name="name"/><br><br>
    <input type="submit" value="Submit Form"/>
    <input type="reset" value="Reset Form"/>
    </form>
    </body>
    </html>Servlet Code :
    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    public class HelloWorld extends HttpServlet {
    protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    * Get the value of form parameter
         System.out.println("response char set "+response.getCharacterEncoding());
         System.out.println("request char set "+request.getCharacterEncoding());
    response.setCharacterEncoding("UTF-8");
    String name = request.getParameter("name");
    String welcomeMessage = "Welcome "+name;
    System.out.println("Name : "+name);
    * Set the content type(MIME Type) of the response.
    response.setContentType("text/html");
    PrintWriter out = response.getWriter();
    * Write the HTML to the response
    out.println("<html>");
    out.println("<head>");
    out.println("<title> A very simple servlet example</title>");
    out.println("</head>");
    out.println("<body>");
    out.println("<h1>"+welcomeMessage+"</h1>");
    out.println("</body>");
    out.println("</html>");
    out.close();
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
         response.setCharacterEncoding("UTF-8");
         doPost(request, response);
    } web.xml :
    <?xml version="1.0" encoding="UTF-8"?>
    <web-app version="2.4"
    xmlns="http://java.sun.com/xml/ns/j2ee"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee
    http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd">
    <servlet>
    <servlet-name>HelloWorld</servlet-name>
    <servlet-class>HelloWorld</servlet-class>
    </servlet>
    <servlet-mapping>
    <servlet-name>HelloWorld</servlet-name>
    <url-pattern>/HelloWorld</url-pattern>
    </servlet-mapping>
    <welcome-file-list>
    <welcome-file>
    HelloWorld.html
    </welcome-file>
    </welcome-file-list>
    <context-param>
         <param-name>PARAMETER_ENCODING</param-name>
         <param-value>UTF-8</param-value>
    </context-param>
    </web-app>please have a look at my files and suggest me BalusC.
    Thanks for your patience.
    regards,
    Achchayya
    Edited by: achayya on Oct 16, 2009 7:26 AM

  • Problem with non-ASCII characters on TTY

    Although I'm not a native speaker, I want my system language to be English (US), since that's what I'm used to. However I have a lot of files which have German language in their file names.
    My /etc/locale.conf has en_US.UTF-8 and de_DE.UTF-8 enabled. My /etc/locale.conf contains only the line
    LANG=en_US.UTF-8
    The German file names show up fine within Dolphin and Konsole (ls -a). But they look weird on either of the TTYs (the "console" you get to by pressing e.g. ctrl+alt+F1). They have other characters like '>>' or the paragraph symbol where non-ASCII characters should be. Is it possible to fix this?

    I don't think the console font is the problem. I use Lat2-Terminus16 because I read the Beginner's Guide on the wiki while installing the system.
    My /etc/vconsole.conf:
    KEYMAP=de
    FONT=Lat2-Terminus16
    showconsolefont even shows me the characters missing in the file names; e.g.: Ö, Ä, Ü

  • Validation for non-ASCII characters

    Hi all,
    Requirement: I have to apply a validation on on fields like Name and Address in applicationdefination.xml. When a user types non-ASCII characters and navigates to next page then it should display the error message. Thus, I have to restrict my user to ASCII values only.
    Present Situation: I'm using regular expression for this problem. In Jheadstart there is an option regular expression under the heading Validation. I have written following values in regular expression and Regular Expression Error Message options.
    Regular Expression
    ^\s*[\w\.\,\-\_\(\)\#\'\/\\\ u0022\u0026\*\;\:\s]+\s*$
    Regular Expression Error Message
    It is important to note that foreign characters are not accepted on our system. Please ensure only standard English letters are entered
    Since, i was getting error in jspx page due to double quotes(") and ampercent(&), So i have replaced the double quotes(") and amprecent(&) by their unicodes. Thus, the expression has become like ^\s*[\w\.\,\-\_\(\)\#\'\/\\\u0022\u0026\*\;\:\s]+\s*$.
    This expression is validating many characters like Ã,µ,Ç,Ï,Ö,§,¥,{,} but not all non ASCII characters like ѓ є ѕ ї Њ Щ Ώ Ω Ϊ Ά Ή Θ Λ Ξ Π τ ẫ ờ Ỡ Ứ Ỷ ự Ẁ ỹ ị Ọ ň ũ ť ţ Έ Ϊ ﻍ. Thus, its not fulfilling the requirement.
    Please suggest some valid solution to this problem. It’s very urgent.

    Hi,
    The validation seems to be performed in Java or Javascript depending on the layout (I'm sorry I can't remember the exact details). The expression suggested above by theEternalStudent works very well in Java, but not in Javascript.
    We came up with an expression which works in both. It rejects strings which contain &# by doing a lookahead before the main pattern - you might want to expand this to look for &#nnn; but for our purposes &# is enough.
    Here is the "platform neutral" solution:
    (?!.*\u0026#.*)^[\w\.\,\-\_\(\)\#\'\/\\\u0022\u0026\*\;\:\s]+$
    I think in future we will write a javascript function and amend the templates to call it directly.
    thanks,
    Michael

  • [SOLVED] KDEmod - problem with mounting b/c of non-ASCII characters

    Hi guys!
    I finally set aside a few gigabites for Archlinux - it is no more in a virtual machine So far I managed to configure everything with the excellent wiki. It's runnin' and kickin'. I run accross only one problem:
    When I insert a CD with a label that has non-ASCII characters (some Polish ones in my case) and I click on it's icon in Konqueror I get the message that "file such-and-such doesn't exist" - and the Polish characters are clearly misspelled (it is not a fonts' problem - I double checked). I can access the folder either via console or via konqueror if I go to the /media folder, though.
    Any ideas how I can fix it? If you need more info, let me know.
    Last edited by JeremyTheWicked (2008-05-31 14:46:07)

    You're welcome . Now it's advisable for you to edit the title of your initial post: add [SOLVED]. Perhaps more clear wording would be in order, too, for the benefit of the search engine. The problem seems to be a trifle in retrospect, but somehow it takes some effort to find the solution, doesn't it ?

  • Search for users and non-ASCII characters

    I am having a little issue with the "Accounts - Find Users" functionality. The search breaks on what I assume is non-ASCII characters (we use the following three up here in Denmark: �, �, �). To be precise, I have a user with the first name "J�rgen". Searching for first names starting with "J" works just fine but "J�" returns zero matches.
    My setup is with two machines, one (A) holding the MySQL database and one (B) serving Identity Manager on top of tomcat.
    Both A and B are RHEL boxes, and both have da_DK.UTF-8 as default locale.
    MySQL's /etc/my.cnf file has the following entry (as recommended in create_waveset_tables.mysql):
    [mysqld]
    default-character-set=utf8
    default-collation=binFor clarity, some functionality works just fine in Identity Manager with these non-ASCII characters such as adding a user whose name contains non-ASCII characters (not only ��� but also � for example). At the moment, it appears to be the search functionality which is not working correctly as I would expect it to. I'm still on the fence concerning whether I've missed something in terms of configuration, or whether this is a limitation.
    Does anyone know whether this problem is on my side or the software's side?

    I am having a little issue with the "Accounts - Find Users" functionality. The search breaks on what I assume is non-ASCII characters (we use the following three up here in Denmark: �, �, �). To be precise, I have a user with the first name "J�rgen". Searching for first names starting with "J" works just fine but "J�" returns zero matches.
    My setup is with two machines, one (A) holding the MySQL database and one (B) serving Identity Manager on top of tomcat.
    Both A and B are RHEL boxes, and both have da_DK.UTF-8 as default locale.
    MySQL's /etc/my.cnf file has the following entry (as recommended in create_waveset_tables.mysql):
    [mysqld]
    default-character-set=utf8
    default-collation=binFor clarity, some functionality works just fine in Identity Manager with these non-ASCII characters such as adding a user whose name contains non-ASCII characters (not only ��� but also � for example). At the moment, it appears to be the search functionality which is not working correctly as I would expect it to. I'm still on the fence concerning whether I've missed something in terms of configuration, or whether this is a limitation.
    Does anyone know whether this problem is on my side or the software's side?

  • CMSDK Non-ASCII Characters and WebFolders

    Hi,
    i have the follow problems with the CMSDK and Microsoft.
    windows-explorer:
    It is impossible to enter a folder that contains non-ascii characters in the name.(the clientrequest will never send.)
    After the doubleclick on the foldername, i get a errormessage an then the url in the editbox is ISO-8859-1 encoded, but this url will never send to the sever.
    Other operations like create, rename, ... have no problems with non-ascii chars.
    with the IE, i can enter without a problem.
    MS Word:
    I can't save any file with non-ascii chars in the name.
    Only this methods are send:
    PROPFIND
    PROPFIND
    GET
    GET
    But never a "PUT", without a non-ascii char in the name the traffic looks like this:
    PROPFIND
    PROPFIND
    GET
    GET
    PROPFIND
    LOCK
    PUT
    It is also impossible to enter a folder containing non-ascii characters in the name, form the word filesave-dialog.
    URL UTF-8 encoding is enabled in the IE options and other operations (MOVE,COPY) are send correctly UTF-8 encoded.
    is there any solution?
    thanks
    Maik

    Have you set the following DAV Server configuration property:
    IFS.SERVER.PROTOCOL.DAV.Webfolders.DefaultCharset
    for your domain?
    You have to configure it for the character set that you want your clients to use when connecting to iFS via WebDAV.
    (You can use the web admin tool to change this property.)
    The reason for this is that the Microsoft WebFolders client software does not transmit the client's character set to the server, so the server has no way of knowing what to expect.

  • Filling clob with non ascii characters

    Hello,
    I have had some problems with clobs and usage of german
    umlauts (����). I was'nt able to insert or update
    strings containing umlaute in combination with string
    binding. After inserting or updating the umlaut
    characters were replaced by strange (spanish) '?'
    which were upside down.
    However, it was working when I did not use string bindung.
    I tried varios things, after some time I tracked
    the problem down to to oracle.toplink.queryframework.SQLCall.java. In the
    prepareStatement(...) you find something
    like
    ByteArrayInputStream inputStream = new ByteArrayInputStream(((String) parameter).getBytes());
    // Binding starts with a 1 not 0.
    statement.setAsciiStream(index + 1, inputStream,((String) parameter).getBytes().length);
    I replaced the usage of ByteArrayInputStram with CharArrayReader:
    // TH changed, 26.11.2003, Umlaut will not work with this.
    CharArrayReader reader = new CharArrayReader(((String) parameter).toCharArray());     
    statement.setCharacterStream(index + 1, reader, ((String) parameter).length() );
    and this worked.
    Is there any other way achieving this? Did anyone
    get clobs with non ascii characters to work?
    Regards -- Tobias
    (Toplink 9.0.3, Clob was mapped to String, Driver was Oracle OCI)

    I don't think the console font is the problem. I use Lat2-Terminus16 because I read the Beginner's Guide on the wiki while installing the system.
    My /etc/vconsole.conf:
    KEYMAP=de
    FONT=Lat2-Terminus16
    showconsolefont even shows me the characters missing in the file names; e.g.: Ö, Ä, Ü

  • A Download servlet: non-ASCII characters not working

    This is my servlet used for file download:
    public void doPost(HttpServletRequest request, HttpServletResponse response) {
      String filepath = request.getParameter("filepath");
      String filename = request.getParameter("filename");
      response.setContentType("application/zip");
      response.setHeader("Content-Disposition", "attachment;filename=\""+filename+"\";");
      ServletOutputStream sos = null;
      BufferedInputStream bis = null;
      try {
        sos = response.getOutputStream();
        bis = new BufferedInputStream(new FileInputStream(source));
        byte buffer[] = new byte[2048];
        int c;
        while((c = bis.read(buffer)) != -1)
          sos.write(buffer, 0, c);
      } catch(Exception e) {
      } finally {
        bis.close();
        sos.close();
    }It does not work when the filename contains non-ASCII characters (e.g., extended ASCII, CJK ...)
    What do I fix this? Thanks!

    One possiblitiy that occurs to me is you have too many encoding things going on and you are sorta "over-encoding" things, as it were....
    All I can think to do is give you this sample JSP page that I created when I was trying to figure all this web encoding stuff with forms back in the day. So perhaps you can use this as a basis for your own page.
    // _lang.jsp
    <%@ page language="java" contentType="text/html; charset=UTF-8" %>
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    <html>
    <head>
         <title></title>
         <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    </head>
    <body bgcolor="#ffffff" background="" text="#000000" link="#ff0000" vlink="#800000" alink="#ff00ff">
    <%
    request.setCharacterEncoding("UTF-8");
    String str = "\u7528\u6237\u540d";
    String name = request.getParameter("name");
    %>
    req enc: <%= request.getCharacterEncoding() %><br />
    rsp enc: <%= response.getCharacterEncoding() %><br />
    str: <%= str %><br />
    name: <%= name %><br />
    <br />
    <a href="_lang.jsp?name=<%= java.net.URLEncoder.encode(str, "UTF-8") %>">as link</a>
    <br />
    <br />
    <form method="GET" action="_lang.jsp" encoding="UTF-8">
    Name: <input type="text" name="name" value="" >
    <input type="submit" name="submit" value="GET Submit" />
    </form>
    <form method="POST" action="_lang.jsp" encoding="UTF-8">
    Name: <input type="text" name="name" value="" >
    <input type="submit" name="submit" value="POST Submit" />
    </form>
    </body>
    </html>

  • Non ascii characters being sent from a parameter in a form

    Hi!
    I have seen many topics posted on passing non ascii characters through parameters from one servlet to another and converting them into whatever format is necessary.
    However, I have not seen anyone answer the following question. I have a jsp page (html) with the character encoding set to utf-8. The user inputs some data in to a text field which is inside a form. The data could be in non ascii characters such as hebrew or arabic. This form is then sent to another jsp where i try to retreive the data from teh text field. No matter what i do, i cannot get the data presented correctly. It is either question marks or other wierd symbols.
    I have tried every permetation of encoding of the actual html page, the ecoding of the string from request.getParameter etc but it still is not presented on the new html page correctly.
    Can anyone help??
    Spencer

    Ok, I solved the problem.
    I had to put at the top request.setCharacterEncoding("utf-8");
    Spencer

  • Replacing non-ASCII characters with HTML charcter references

    Hi All,
    In Oracle 10g or greater is there a built-in function that will convert a string with non-ASCII characters like this
    a b č 뮼
    into an ASCII string with HTML character references like this?
    a b & # x 0 1 0 D ; & # x B B B C ;
    (note I had to include spaces between each character in the sample code for message to prevent the forum software from converting my text)
    I tried using
    utl_i18n.escape_reference( val, 'us7ascii' )
    but for some reason it returns
    a b c & # x B B B C ;
    Note how it converted the Western European character "č" to its unaccented counterpart "c", not "& # x 0 1 0 D ;" (is this a bug?).
    I also tried a custom solution using regexp_replace and asciistr (which I can't include here because the forum software chokes on it) but it only returns the correct result for values <=4000 characters long. Unfortunately asciistr doesn't appear to accept CLOB values larger than 4000 characters. It returns an error message like
    (ORA-22835: Buffer too small for CLOB to CHAR or BLOB to RAW conversion (actual: 30251, maximum: 4000) ).
    I'm looking for a solution that works on CLOB data of any size.
    Thanks in advance for any insight you can provide.
    Joe Fuda

    So with that (UTF8) in mind, let's take another look.....
    As shown below, I used a AL32UTF8 database.
    Note: I did not use a unicode capable tool for querying. So I set console mode code page to 1250 just to have č displayed properly (instead of posing as an è).
    Also, as a result of using windows-1250 for client character set, in the val column and in the second select's ncr column (iso8859-1), è (00e8) has been replaced with e through character set conversion going from server back to client.
    Running the same code on a database with a db character set such as we8mswin1252, that doesn't define the č (latin small c with caron) character, would yield results with a c in the ncr column.
    C:\>chcp 1250
    Aktuell teckentabell: 1250
    C:\>set nls_lang=.ee8mswin1250
    C:\>sqlplus test/test
    SQL*Plus: Release 11.1.0.6.0 - Production on Fri May 23 21:25:29 2008
    Copyright (c) 1982, 2007, Oracle.  All rights reserved.
    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
    With the OLAP option
    SQL> select * from nls_database_parameters where parameter like '%CHARACTERSET';
    PARAMETER              VALUE
    NLS_CHARACTERSET       AL32UTF8
    NLS_NCHAR_CHARACTERSET AL16UTF16
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'us7ascii') NCR from dual;
    VAL  NCR
    č e  c e
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'we8iso8859p1') NCR from dual;
    VAL  NCR
    č e  &# x10d; e     <- "è"
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'ee8iso8859p2') NCR from dual;
    VAL  NCR
    č e  č &# xe8;
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'cl8iso8859p5') NCR from dual;
    VAL  NCR
    č e  &# x10d; &# xe8;In the US7ASCII case, where it should be possible for all non-ascii characters to be escaped, it seems as if the actual escape step is skipped over.
    Hope this helps to understand whether utl_i8n is usable or not in your case.
    Message was edited by:
    orafad
    Fixed replaced character references :)

  • Cannot login with password containing non-ascii characters

    Hello,
    I have web application, form based login. UTF-8 is specified "everywhere".
    And it works, except for passwords.
    If user register itself with password containing non-ascii characters, it is correctly written in database, but when doing either programmatic login or normal form based login, if fails.
    If the password is only ascii, it works.
    Username of login could be ascii or non-ascii, it doesn't matter, both works.
    I'm using sun java application server 9.1.
    jdbc realm.
    I'm not using hashing passwords, just clean (now)
    I tried configure realm Charset: UTF8 as last chance, but it doesn't work either.
    The problem is only with non-ascii characters in password.
    Any help very appreciated
    Thanks a lot

    hi,
    I know all that, but that's not the case. My app uses preparedStatements, everything is properly configured, in all pages, utf-8 is going from user to db and back without any problems.
    The only problem is with password field. As I am using form based login, with jdbc realm configured (again, nicely working when only ascii characters), I have very little chance to do something bad through the login phase.
    I'm not talking about special characters, I'm talking about non-ascii characters, let's say - Chinese, arabish, Russian alphabet etc.
    When user registers (my code), the fields are properly written to db. I have checked that, trust me.
    But the Sun app server realm seems to have some problems with the password field.
    (realm uses jdbc connection to mysql, the url contains all extra parameters to be sure about utf8. there is nothing more what can be configured...)
    If I try other alphabet codes in login and ascii in password, it works. But soon, as I use other alphabet code also in password, it doesn't work anymore.
    My only idea is, that I could try MD5 to create ascii only characters (I hope it works that way) on the client with javascript and then set Digest to MD5 in realm configuration. But still, it seems very strange. The clear way storage should also function? (now set Digest to 'none')
    Is it a bug of Sun App Server?
    thanks

  • When I try to send an email I get a message - Non ASCII characters in the local part of the recipient address.

    I am trying to send an emails to Italy. When I click send I get a message ( Non-ASCII characters in the local part of the recipient address). [email protected]  is one of the email address I am trying to send to. My other email address' work OK. I have sent emails to these Italian address before with no problem.

    Restart the operating system in '''[http://en.wikipedia.org/wiki/Safe_mode safe mode with Networking]'''. This loads only the very basics needed to start your computer while enabling an Internet connection. Click on your operating system for instructions on how to start in safe mode: [http://windows.microsoft.com/en-us/windows-8/windows-startup-settings-including-safe-mode Windows 8], [http://windows.microsoft.com/en-us/windows/start-computer-safe-mode#start-computer-safe-mode=windows-7 Windows 7], [http://windows.microsoft.com/en-us/windows/start-computer-safe-mode#start-computer-safe-mode=windows-vista Windows Vista], [http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/boot_failsafe.mspx?mfr=true" Windows XP], [http://support.apple.com/kb/ht1564 OSX]
    ; If safe mode for the operating system fixes the issue, there's other software in your computer that's causing problems. Possibilities include but not limited to: AV scanning, virus/malware, background downloads such as program updates.

Maybe you are looking for