URLDecoder / URLEncoder don't like non-ASCII, even with UTF-8 explicit

Here's my code:
import java.net.*;
import java.io.*;

public class Decoder {
    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: Decoder UTF-8-URL-encoded-string");
            System.exit(1);
        }
        // "außerhalb", written with a Unicode escape so the source file encoding doesn't matter
        String sampleString = "au\u00DFerhalb";
        try {
            // Code points of the raw command-line argument
            for (int k = 0; k < args[0].length(); k++) {
                System.out.println("Char is " + Character.codePointAt(args[0], k));
            }
            System.out.println();
            // Decode the argument as UTF-8 and dump the result
            String queryText = URLDecoder.decode(args[0], "UTF-8");
            for (int k = 0; k < queryText.length(); k++) {
                System.out.println("Char is " + Character.codePointAt(queryText, k));
            }
            // Re-encode what was just decoded
            String newText = URLEncoder.encode(queryText, "UTF-8");
            System.out.println(newText);
            System.out.println();
            // Same dump and encode for the in-memory sample string
            for (int k = 0; k < sampleString.length(); k++) {
                System.out.println("Char is " + Character.codePointAt(sampleString, k));
            }
            System.out.println(URLEncoder.encode(sampleString, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            System.err.println("Couldn't decode querytext as UTF-8");
        }
        System.exit(0);
    }
}

The problem I'm encountering is that characters such as the German sharp S (ß) aren't being decoded correctly, or encoded correctly. When I run this file on Linux, Mac OS X or Solaris 10, I get this:
% java -version
java version "1.5.0_07"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_07-b03)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_07-b03, mixed mode)
% java -cp . Decoder "au%DFerhalb"
Char is 97
Char is 117
Char is 37
Char is 68
Char is 70
Char is 101
Char is 114
Char is 104
Char is 97
Char is 108
Char is 98
Char is 97
Char is 117
Char is 65533
Char is 101
Char is 114
Char is 104
Char is 97
Char is 108
Char is 98
au%EF%BF%BDerhalb
Char is 97
Char is 117
Char is 223
Char is 101
Char is 114
Char is 104
Char is 97
Char is 108
Char is 98
au%C3%9Ferhalb
The decimal code point of the German S is 223; the hex code point is DF. Yet the decoder completely mangles the string containing %DF, and the internal string, which clearly has the right decimal code point, is mangled by the encoder; C3 9F has absolutely nothing to do with 223.
I can't figure out for the life of me what I'm doing wrong. I've taken a look at the source code for URLDecoder.java, and it really appears that it's broken; it decodes hex sequences into a byte array, but Java bytes are signed, between -128 and 127. Could this be what's going on? Hex values up to 7F (127) work fine; anything above that doesn't work at all. Try my program with %7E instead of %DF, and then with %81.
Can this really be broken?
Thanks in advance -
Sam Bayer
The MITRE Corporation
[email protected]

The decimal code point of the German S is 223; the hex code point is DF.
Yet the decoder completely mangles the string containing %DF, and the
internal string, which clearly has the right decimal code point, is mangled
by the encoder; C3 9F has absolutely nothing to do with 223.

I don't think you understand what's going on. Hex DF (decimal 223) is indeed the Unicode code point for that character, U+00DF. That puts it in the range U+0080 to U+07FF, which UTF-8 encodes in two bytes: the bits of DF (11011111) are split into 110 00011 and 10 011111, which is C3 9F, exactly what you got. So the encoder's result is two bytes and not one, contrary to what you expect. Going the other way, %DF stands for the single byte DF, which is not a complete UTF-8 sequence on its own, so the decoder substitutes U+FFFD (65533). "au%DFerhalb" is what you get when the string is encoded as ISO-8859-1, not UTF-8.
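
To make the byte arithmetic concrete, here is a minimal sketch. The class name EszettDemo and its helper methods are made up for illustration; only URLEncoder, URLDecoder and String.getBytes come from the JDK. It prints the bytes of U+00DF under UTF-8 and ISO-8859-1, then runs the strings from the question through both charsets. It suggests the classes behave as designed and that the signed-byte theory is a red herring: the only place signedness matters is when printing a byte, which is why the helper masks with 0xFF.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class EszettDemo {
    // Hex dump of a string's bytes in the named charset. Java bytes are signed,
    // so each one is masked with 0xFF before it is formatted.
    static String hexBytes(String s, String charset) {
        try {
            StringBuilder sb = new StringBuilder();
            for (byte b : s.getBytes(charset)) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(String.format("%02X", b & 0xFF));
            }
            return sb.toString();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Thin wrappers that turn the checked exception into an unchecked one.
    static String encode(String s, String charset) {
        try {
            return URLEncoder.encode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static String decode(String s, String charset) {
        try {
            return URLDecoder.decode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String word = "au\u00DFerhalb";
        System.out.println(hexBytes("\u00DF", "UTF-8"));       // two bytes
        System.out.println(hexBytes("\u00DF", "ISO-8859-1"));  // one byte
        System.out.println(encode(word, "UTF-8"));
        System.out.println(encode(word, "ISO-8859-1"));
        // %DF only round-trips when it is decoded with the charset that produced it.
        System.out.println(decode("au%DFerhalb", "ISO-8859-1").equals(word));
        System.out.println((int) decode("au%DFerhalb", "UTF-8").charAt(2));
    }
}
```

If the incoming query strings really are ISO-8859-1 encoded, passing "ISO-8859-1" to URLDecoder.decode is the fix; if the sender can be changed, having it emit %C3%9F keeps everything in UTF-8.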

Similar Messages

  • My iPod nano 6th generation plays, stops, fast-forwards and rewinds songs by itself. It also speaks song titles nonstop even with the setting off. Please help me.

    My iPod nano 6th generation plays, stops, fast-forwards and rewinds songs by itself. It also speaks song titles nonstop even with the setting off. Please help me.

    Found this solution in another thread, what you want to do is go into the Nike Fitness thing, go to run, go to Spoken Feedback and set it to off. This should fix your problems.

  • Replacing non-ASCII characters with HTML charcter references

    Hi All,
    In Oracle 10g or greater is there a built-in function that will convert a string with non-ASCII characters like this
    a b č 뮼
    into an ASCII string with HTML character references like this?
    a b & # x 0 1 0 D ; & # x B B B C ;
    (note I had to include spaces between each character in the sample code for message to prevent the forum software from converting my text)
    I tried using
    utl_i18n.escape_reference( val, 'us7ascii' )
    but for some reason it returns
    a b c & # x B B B C ;
    Note how it converted the Western European character "č" to its unaccented counterpart "c", not "& # x 0 1 0 D ;" (is this a bug?).
    I also tried a custom solution using regexp_replace and asciistr (which I can't include here because the forum software chokes on it) but it only returns the correct result for values <=4000 characters long. Unfortunately asciistr doesn't appear to accept CLOB values larger than 4000 characters. It returns an error message like
    (ORA-22835: Buffer too small for CLOB to CHAR or BLOB to RAW conversion (actual: 30251, maximum: 4000) ).
    I'm looking for a solution that works on CLOB data of any size.
    Thanks in advance for any insight you can provide.
    Joe Fuda

    So with that (UTF8) in mind, let's take another look.....
    As shown below, I used a AL32UTF8 database.
    Note: I did not use a unicode capable tool for querying. So I set console mode code page to 1250 just to have č displayed properly (instead of posing as an è).
    Also, as a result of using windows-1250 for client character set, in the val column and in the second select's ncr column (iso8859-1), è (00e8) has been replaced with e through character set conversion going from server back to client.
    Running the same code on a database with a db character set such as we8mswin1252, that doesn't define the č (latin small c with caron) character, would yield results with a c in the ncr column.
    C:\>chcp 1250
    Aktuell teckentabell: 1250
    C:\>set nls_lang=.ee8mswin1250
    C:\>sqlplus test/test
    SQL*Plus: Release 11.1.0.6.0 - Production on Fri May 23 21:25:29 2008
    Copyright (c) 1982, 2007, Oracle.  All rights reserved.
    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
    With the OLAP option
    SQL> select * from nls_database_parameters where parameter like '%CHARACTERSET';
    PARAMETER              VALUE
    NLS_CHARACTERSET       AL32UTF8
    NLS_NCHAR_CHARACTERSET AL16UTF16
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'us7ascii') NCR from dual;
    VAL  NCR
    č e  c e
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'we8iso8859p1') NCR from dual;
    VAL  NCR
    č e  &# x10d; e     <- "è"
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'ee8iso8859p2') NCR from dual;
    VAL  NCR
    č e  č &# xe8;
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'cl8iso8859p5') NCR from dual;
    VAL  NCR
    č e  &# x10d; &# xe8;
    In the US7ASCII case, where it should be possible for all non-ASCII characters to be escaped, it seems as if the actual escape step is skipped over.
    Hope this helps you decide whether utl_i18n is usable in your case.
    Message was edited by:
    orafad
    Fixed replaced character references :)
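
For comparison, the transformation asked for in this thread is easy to sketch outside the database. The following is only an illustration in Java, not an Oracle built-in and not a replacement for utl_i18n.escape_reference; the class and method names (NcrEscaper, escapeNonAscii) are hypothetical. It walks the text by code point, so it has no 4000-character limit and also handles characters outside the Basic Multilingual Plane.

```java
public class NcrEscaper {
    // Replaces every code point above U+007F with a hexadecimal numeric
    // character reference, padded to at least four hex digits.
    static String escapeNonAscii(CharSequence text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = Character.codePointAt(text, i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append(String.format("&#x%04X;", cp));
            }
            // Advance by one or two chars depending on whether cp is supplementary.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The sample from the question: "a b", c with caron, and a Hangul syllable.
        System.out.println(escapeNonAscii("a b \u010D \uBBBC"));
    }
}
```

A CLOB would have to be read through a character stream and fed to this method in chunks; that plumbing is left out of the sketch.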

  • Every time I want to delete something the iMac asks for my password. It was not like this before, even with pictures. How can I solve the problem so I can delete my pics without entering my password all the time?

    With everything I want to delete, the iMac asks for my password. It was not like that before, and it's driving me crazy. Even with pics it asks for my password.
    How can I solve the problem and delete stuff without filling in my password every time?
    Thanks in advance for your support and help.

    Normally, that's caused because of a permissions problem. Have a look at this site > http://www.thexlab.com/faqs/trash.html#Anchor-Files-46919

  • Encoding non english characters with utf 8 on jsp (Critical!!)

    I am inserting hebrew characters from JSP into oracle db and everything is fine until this point. But when I try to retrieve the information from the database, the characters are not displayed properly (I get some garbage characters). I am sure that the data stored in the database is correct, but not sure why there is a problem in displaying the data in the JSP.
    I came across a thread on TSS
    http://www.theserverside.com/discussions/thread.tss?thread_id=28944
    and followed the suggestions given there like having
    <%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
    <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
    and also this
    <%
    //Some JDBC and sql statement query UTF-8 data and then ...
    String str = rs.getString("utf8_data");
    str = new String(str.getBytes("ISO-8859-1"),"UTF-8");
    %>
    <%= str %>
    Now the data getting displayed is partly correct; that is, some characters are still coming out as squares.
    Any ideas will be of great help.

    I also suspect the database character set is behind this issue. But what I don't understand is how only certain Hebrew characters are being stored properly while others are corrupted.
    Also, can anyone let me know how I can view the non-English characters present in the database directly, as TOAD is not able to display them?
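
One possible explanation for "only some characters survive" can be sketched as follows. This is an assumption, not something the thread confirms: the getBytes("ISO-8859-1") repair is lossless only if the bytes were mis-decoded as ISO-8859-1 in the first place. If some layer decoded them as windows-1252 instead, every byte between 0x80 and 0x9F is lost, and the UTF-8 encodings of the Hebrew letters U+05D0 to U+05DF all contain such a byte. The class and method names below are hypothetical, and no database or JDBC driver is involved; the wrong decoding step is simulated.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeRoundTrip {
    // Simulates a layer that wrongly decodes UTF-8 bytes with `wrong`, followed by
    // the JSP-style repair new String(str.getBytes("ISO-8859-1"), "UTF-8").
    static String misdecodeThenRepair(String original, Charset wrong) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        String garbled = new String(utf8, wrong);
        byte[] recovered = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(recovered, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String shin = "\u05E9";  // UTF-8 bytes D7 A9
        String alef = "\u05D0";  // UTF-8 bytes D7 90

        // Mis-decoding as ISO-8859-1 is fully reversible.
        System.out.println(misdecodeThenRepair(alef, StandardCharsets.ISO_8859_1).equals(alef));
        // Mis-decoding as windows-1252 is reversible for some letters only.
        System.out.println(misdecodeThenRepair(shin, cp1252).equals(shin));
        System.out.println(misdecodeThenRepair(alef, cp1252).equals(alef));
    }
}
```

If this is what is happening, the cleaner fix is to configure the driver and database character sets so that no repair step is needed at all.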

  • Problems with password including non-ASCII characters

    I am a German language user with a German keyboard but an English OS as main language. Therefore my passwords (simple user and admin) includes non-ASCII characters used in German, French and Spanish language, which increases security. This works fine in the majority of login scenarios. There are, however, 3 scenarios where neither my non-ASCII simple user nor my non-ASCII admin PW are accepted:
    1) running "sudo" in Terminal;
    2) When I try to shut down and another user account is still open. Doing this brings up a login window asking for the PW of the other user that does not accept non-ASCII;
    3) Using Leopard/SnowLeopard CacheCleaner. Upon opening, this app asks for an admin PW, but does not recognize non-ASCII.
    Am I right in assuming that this has to do with non-ASCII PWs? I thought ASCII times were gone given the remarkable language flexibility of Mac OS over the years. I know this stupid problem only from Win XP. There it is even worse.
    Is there a way to overcome this problem without always temporarily changing my PW? Thanks.

    I think the problem is with the applications themselves and should be reported to the developer. Although some non-ASCII characters are acceptable for an admin password, in my experience most Unix systems don't like non-ASCII characters in passwords. It may be easier to avoid them if you can.
    OS X should simply request your admin password to shut down when another user account is open. An alert dialog usually appears warning that the other user is still logged in and giving you the option to log the other account out then shut down. But in my experience the only authorization needed is for your admin account.

  • Issue with Download and Loss of Non-ASCII Characters

    I have a need to allow my user to download the contents of an HTML Region as a file. This region contains some Greek letters, i.e. non-ASCII, used with some common finance formulas.
    I am able to copy the contents off this region using JavaScript without any issue.
    Moreover, I can copy the contents from JavaScript into a Page Item and then render the region with PL/SQL. Again, this works without an issue.
    However, when I try to download the region, the Greek letters are lost in the downloaded document. Instead they are replaced with this weird series of characters: (Δ
    I've created a sample app to demonstrate this problem at apex.oracle.com:
    URL: http://apex.oracle.com/pls/apex/f?p=34765:1
    UID: GUEST_DEV
    PWD: greeksgone
    Click the button labeled "Copy HTML Via JS" and you will see the statically populated region copied into the second region.
    Click the button labeled "Copy HTML Via APEX" and you will see the statically populated region copied into the third region. This is achieved by copying the HTML into a Page Item and then submitting the page. When the page returns, the value of this Page Item is then used to populate the third region. As you can see, the Greek letters are there as normal.
    However, if you click the "Download HTML" button you will see the the Greek letters are not present in the resulting file.
    -Joe

    Joe Upshaw wrote:
    I am just totally stuck here.
    This is what the document looks like without the required meta tag:
    <HTML>
    <BODY>
    <STYLE>
    <DIV>
    div.riskScenarioMatrixDiv
    overflow:auto;
    ....
    This version does not display the Greek letters.
    If I could simply add this one meta tag in, everything would work beautifully:
    <HTML>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <BODY>
    <STYLE>
    <DIV>
    div.riskScenarioMatrixDiv
    overflow:auto;
    ....
    However, I have tried every combination I can think of in the code block but, any time that I add that meta tag, I get a *404 Page Not Found* error.
    The only thing standing between what we have and what we need is getting that meta tag in the output but, I just can't seem to find a way to do this. Actually, we'd really like to have, within the head tags; the meta tag, the style and the title but, not being able to get that meta tag in is the difference between acceptable and broken. It works with the others in the body.
    DECLARE
    ls_RiskMatrixTitle  VARCHAR2(32767);
    ls_RiskMatrixHTML   VARCHAR2(32767);
    ls_DefaultFileName  VARCHAR2(512);
    BEGIN
    ls_RiskMatrixHTML   := :P1_HTML;
    ls_DefaultFileName  := 'TestMe.html';         
    ls_RiskMatrixTitle  := 'Test of Download';        
    OWA_UTIL.MIME_HEADER( 'text/html',  False, 'UTF8' );
    HTP.P( 'Content-Disposition: attachment; filename=' || ls_DefaultFileName );
    --HTP.META( 'Content-Type',  null, 'text/html; charset=utf-8' );          
    --HTP.TITLE( ls_RiskMatrixTitle ); 
    OWA_UTIL.HTTP_HEADER_CLOSE;
    HTP.HTMLOPEN;   
    HTP.BODYOPEN;
    HTP.STYLE('<DIV>' || :P1_MATRIX_STYLE || '</DIV>');
    HTP.P(ls_RiskMatrixTitle);
    HTP.P(ls_RiskMatrixHTML);
    APEX_APPLICATION.G_UNRECOVERABLE_ERROR := True;
    END;

    You appear to be confusing HTTP and HTML.
    The HTTP header != HTML head element.
    HTP.META( 'Content-Type',  null, 'text/html; charset=utf-8' );
    HTP.TITLE( ls_RiskMatrixTitle );
    This generates HTML content. It does not go in the HTTP header. You should be generating an HTML head element containing this (and the style element) between HTP.HTMLOPEN and HTP.BODYOPEN.
    Also note that these web toolkit methods generate really obsolete HTML, therefore I never use them (and nor does APEX these days).
    Don't have time to get more into this now...

  • Replacing non-ascii characters in String

    I have a site where the user enters data in a rich text
    editor (ktml4) that gets stored into a database (mysql). There are
    non ascii characters getting into the data, I'm assuming that they
    are copying and pasting from Word. Unfortunately in this situation,
    changing that process isn't an option.
    Currently, this is the only character that is causing me
    problems:
    http://www.zvon.org/other/charSearch/PHP/search.php?request=ffa0&searchType=3
    I would just like to replace the non-ascii characters with a
    space when I read them from the database. Something like:
    #Replace(result.column, '\xffa0', ' ')#
    However, I believe that code looks for the string "\xffa0",
    not the character \xffa0.
    Is there anyway to do this?

    Originally posted by BuckLemke:
        Originally posted by Dan Bracuk:
            rereplace might work.
        Can you give an example of how to pass a non-ascii character
        to REReplace?
    Regular expressions are not my strength, but the approach I
    was considering was, "if it's not an ascii character, make it a
    space". Then you pass the entire string at once.
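
The "if it's not an ASCII character, make it a space" approach can be sketched with a negated character class. The example below is plain Java, shown only to illustrate the pattern; the class and method names are made up, and whether the same pattern string works unchanged in REReplace has not been checked here.

```java
public class AsciiOnly {
    // Replaces every character outside the ASCII range U+0000..U+007F with a space.
    static String replaceNonAscii(String input) {
        return input.replaceAll("[^\\x00-\\x7F]", " ");
    }

    public static void main(String[] args) {
        // U+FFA0 is the character the question is about.
        System.out.println(replaceNonAscii("price\uFFA0list"));
    }
}
```

The whole string is passed once; there is no need to name the offending character, which also covers any other non-ASCII characters pasted in later.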

  • CMSDK Non-ASCII Characters and WebFolders

    Hi,
    I have the following problems with CMSDK and Microsoft clients.
    Windows Explorer:
    It is impossible to enter a folder whose name contains non-ASCII characters (the client request is never sent).
    After double-clicking the folder name, I get an error message, and the URL in the edit box is then ISO-8859-1 encoded, but this URL is never sent to the server.
    Other operations such as create and rename have no problems with non-ASCII characters.
    With IE, I can enter the folder without a problem.
    MS Word:
    I can't save any file with non-ASCII characters in the name.
    Only these methods are sent:
    PROPFIND
    PROPFIND
    GET
    GET
    But a "PUT" is never sent. Without a non-ASCII character in the name, the traffic looks like this:
    PROPFIND
    PROPFIND
    GET
    GET
    PROPFIND
    LOCK
    PUT
    It is also impossible to enter a folder with non-ASCII characters in its name from the Word file-save dialog.
    URL UTF-8 encoding is enabled in the IE options, and other operations (MOVE, COPY) are sent correctly UTF-8 encoded.
    Is there any solution?
    thanks
    Maik

    Have you set the following DAV Server configuration property:
    IFS.SERVER.PROTOCOL.DAV.Webfolders.DefaultCharset
    for your domain?
    You have to configure it for the character set that you want your clients to use when connecting to iFS via WebDAV.
    (You can use the web admin tool to change this property.)
    The reason for this is that the Microsoft WebFolders client software does not transmit the client's character set to the server, so the server has no way of knowing what to expect.

  • How can I build this cluster? I don't like the one I have created (too big)

    Hi everybody, I'm working with a Vision VI and the trouble is that I need to create a cluster to connect it to a terminal called Settings of the Vision VI (IMAQ Count objects). These settings includes booleans, long and unsigned word type values. I've seen this one in a pdf document:
    I like how it looks because it's in a reduced space.
    This is the one I've created:
    (Please ignore the red numbers 1 and 2)
    And the block diagram looks like this:
    If someone knows how make a VI as the first image, please post it.
    Thank you so much for your answers and comments.
    P.S. I use LabView 2009, maybe some features are not availables in newers versions.
    Impossible is nothing

    You did not create a cluster, just some individual controls that you are bundling in the code. that's not the same!
    I would recommend a combination approach from what we heard so far.
    First, create a control on the subVI input. Now we know it has the correct type.
    Right-click each control you don't like and replace it with one from a different palette (modern, classic, system), making sure you keep the same datatype. Keep the wire connected to the subVI. If the wire breaks, you picked a wrong datatype. simply undo and try again. (This way, the cluster order should not change).
    Now rearrange the controls the way you want.
    Here are two quick examples I just made from scratch.
    LabVIEW Champion . Do more with less code and in less time .
    Attachments:
    Settings.PNG ‏17 KB

  • I'm not able to create my apple id without my credit card number in my iphone 5c.there is no way to skip this step.there is no option like "NONE" to skip that step.kindly help me out.im unable to download even free apps

    i'm not able to create my apple id without my credit card number in my iphone 5c.there is no way to skip this step.there is no option like "NONE" to skip that step.kindly help me out.im unable to download even free apps

    See
    Why can’t I select None when I edit my Apple ID payment information?
    and
    Creating an iTunes Store, App Store, iBooks Store, and Mac App Store account without a credit card
    Step 3 is important, no matter whether you do this on a Mac or an iPad / iPhone:
    Important: Before proceeding to the next step, you must download and install a free application. ...
    Important: Before proceeding to the next step, you must download and install the free application by tapping Free followed by tapping Install App. …
    First you must download a free app from the App Store. When you are asked to sign in with your Apple ID, select "Create new account". Accept the terms and conditions checkbox, then click Continue. After you enter all the requested personal data, click Continue.
    When you are asked to select a payment method, select "None". 
    That's all there is to it.

  • I READ A LOT OF BAD ABOUT THE NEW O.S. IF I DON'T LIKE IT HOW DO I GO BACK TO 10.8.5  I AM COMPETENT BUT NOT A SUPERGEEK OR EVEN A PRO SO K.I.S.S. WOULD BE GOOD

    I READ A LOT OF BAD ABOUT THE NEW O.S. IF I DON'T LIKE IT HOW DO I GO BACK TO 10.8.5  I AM COMPETENT BUT NOT A SUPERGEEK OR EVEN A PRO SO K.I.S.S. WOULD BE GOOD

    Make a bootable backup of your current system before upgrading or make a new partition on your hard drive and install the new OS on the new volume. Now two more things for someone who believes they are competent:
    1. Don't believe everything you read on these forums. Only users with problems post here. Millions of other users don't post here because they manage not to have problems.
    2. Don't use all CAPS. That is considered shouting. Shouting is bad netiquette. Please use mixed case.

  • How do I go back to the previous version. I can't even find the back button. I don't like the new style. It may be more secure but it's not easier to use.

    I don't like the new toolbars or task bars or address bars or whatever you call them. In fact I refused to update my other computer because of the confusion the new version has caused me. I don't like the new location of the home button, I don't like having to click on the tab then go to a drop down menu to reload the tab (more clicks). And I really don't like the dark color of the top portion of the screen, it's really hard on my eyes. I don't like the blurry portion of my desktop showing through, it's confusing.

    See this article:
    http://www.computertechtips.net/64/make-firefox-4-look-like-ff-3-6/

  • Problem with non-ASCII characters on TTY

    Although I'm not a native speaker, I want my system language to be English (US), since that's what I'm used to. However I have a lot of files which have German language in their file names.
    My /etc/locale.gen has en_US.UTF-8 and de_DE.UTF-8 enabled. My /etc/locale.conf contains only the line
    LANG=en_US.UTF-8
    The German file names show up fine within Dolphin and Konsole (ls -a). But they look weird on either of the TTYs (the "console" you get to by pressing e.g. ctrl+alt+F1). They have other characters like '>>' or the paragraph symbol where non-ASCII characters should be. Is it possible to fix this?

    I don't think the console font is the problem. I use Lat2-Terminus16 because I read the Beginner's Guide on the wiki while installing the system.
    My /etc/vconsole.conf:
    KEYMAP=de
    FONT=Lat2-Terminus16
    showconsolefont even shows me the characters missing in the file names; e.g.: Ö, Ä, Ü

  • Filling clob with non ascii characters

    Hello,
    I have had some problems with CLOBs and the use of German
    umlauts (such as ä, ö, ü). I wasn't able to insert or update
    strings containing umlauts in combination with string
    binding. After inserting or updating, the umlaut
    characters were replaced by strange (Spanish) '?'
    characters that were upside down.
    However, it worked when I did not use string binding.
    I tried various things, and after some time I tracked
    the problem down to oracle.toplink.queryframework.SQLCall.java. In
    prepareStatement(...) you find something
    like
    ByteArrayInputStream inputStream = new ByteArrayInputStream(((String) parameter).getBytes());
    // Binding starts with a 1 not 0.
    statement.setAsciiStream(index + 1, inputStream,((String) parameter).getBytes().length);
    I replaced the usage of ByteArrayInputStream with CharArrayReader:
    // TH changed, 26.11.2003, Umlaut will not work with this.
    CharArrayReader reader = new CharArrayReader(((String) parameter).toCharArray());     
    statement.setCharacterStream(index + 1, reader, ((String) parameter).length() );
    and this worked.
    Is there any other way achieving this? Did anyone
    get clobs with non ascii characters to work?
    Regards -- Tobias
    (Toplink 9.0.3, Clob was mapped to String, Driver was Oracle OCI)
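
Why the byte-based binding loses umlauts can be sketched without a database. The example below is a simulation under stated assumptions: no JDBC call is made, UTF-8 stands in for the platform default charset that getBytes() with no argument would use, and the "ASCII consumer" is modelled by decoding the bytes as US-ASCII. The class and method names are hypothetical.

```java
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamBindingDemo {
    // Models the byte-based path: the string is turned into bytes, and the
    // receiver interprets those bytes as 7-bit ASCII.
    static String viaAsciiBytes(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8); // stand-in for the default charset
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Models the character-based path: chars are handed over as chars,
    // so no charset is involved at all.
    static String viaCharacterStream(String s) {
        try (CharArrayReader reader = new CharArrayReader(s.toCharArray())) {
            char[] buffer = new char[s.length()];
            int n = reader.read(buffer);
            return new String(buffer, 0, Math.max(n, 0));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String door = "T\u00FCr"; // "Tür"
        System.out.println(viaAsciiBytes(door).equals(door));       // umlaut is lost
        System.out.println(viaCharacterStream(door).equals(door));  // umlaut survives
        // The byte count also differs from the char count once umlauts are present.
        System.out.println(door.length() + " chars, "
                + door.getBytes(StandardCharsets.UTF_8).length + " bytes");
    }
}
```

This is consistent with the fix described above: switching from setAsciiStream to setCharacterStream removes the charset-dependent step.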

