Translation of UTF8 stream to sequence of ASCII characters

Hello,
I need advice on how to translate a UTF-8 binary stream of characters into ASCII characters. The translation will depend on the locale (language) used.
For example, if the UTF-8 character Á (C3 81 in hex) is used in Czech, I need to translate it into the two ASCII characters "Ae"; if the same Á character is used in French, I need to translate it into the character "A". The binary stream will also contain some ASCII characters that need no translation.
Please, advise.
Thank you.
A Mickelson
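A minimal Java sketch of the locale-dependent translation described in the question. It assumes the whole input fits in memory (for a real stream, wrap it in an InputStreamReader with UTF-8). The rule table RULES and its single Czech entry are hypothetical, taken only from the example above. Any character without a rule falls back to Unicode decomposition (java.text.Normalizer, form NFD) with the accents removed. That fallback is a general technique swapped in for illustration, not a per-language standard.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.Locale;
import java.util.Map;

public class Utf8ToAscii {
    // Hypothetical per-language rules keyed by ISO 639 language code.
    // The only entry mirrors the question's example: U+00C1 (A with acute) -> "Ae" in Czech.
    private static final Map<String, Map<Integer, String>> RULES =
            Map.of("cs", Map.of(0x00C1, "Ae"));

    static String toAscii(byte[] utf8, Locale locale) {
        String text = new String(utf8, StandardCharsets.UTF_8);
        Map<Integer, String> rules = RULES.getOrDefault(locale.getLanguage(), Map.of());
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);              // ASCII passes through unchanged
            } else if (rules.containsKey(cp)) {
                out.append(rules.get(cp));            // language-specific replacement
            } else {
                // Fallback: decompose (NFD) and drop combining marks, e.g. A-acute -> A
                String base = Normalizer.normalize(new String(Character.toChars(cp)),
                        Normalizer.Form.NFD).replaceAll("\\p{M}", "");
                for (char c : base.toCharArray()) {
                    out.append(c < 0x80 ? c : '?');   // anything still non-ASCII becomes '?'
                }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] input = {0x78, (byte) 0xC3, (byte) 0x81, 0x79};  // "x", UTF-8 C3 81, "y"
        System.out.println(toAscii(input, Locale.forLanguageTag("cs")));  // xAey
        System.out.println(toAscii(input, Locale.FRENCH));                 // xAy
    }
}
```

Keeping the per-language table separate from the generic fallback means each language only has to list its exceptions.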

The Java compiler and other Java tools can only process files that contain Latin-1 and/or Unicode-escaped (\udddd notation) characters. native2ascii converts files that contain other character encodings into files containing Latin-1 and/or Unicode-escaped characters. Note that native2ascii ships only with JDK 8 and earlier; it was removed in JDK 9.
String[] command = {"native2ascii", "-encoding", "UTF-8", "sourceFileName", "targetFileName"};
Process child = Runtime.getRuntime().exec(command);
int exitCode = child.waitFor(); // block until the conversion has finished

Similar Messages

  • UTF8 Byte Size for ASCII Characters

    In a UTF8 database, are ASCII characters stored as a single byte, therefore using the minimum amount of db space possible? Or are they stored at a fixed amount of 2 or 3 bytes, wasting db space unnecessarily?

    Hi,
    In UTF-8, ASCII characters are stored as a single byte each; only non-ASCII characters take two or more bytes. So don't worry about space.
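    To see how UTF-8 sizes characters, here is a small Java sketch that encodes one sample character from each length class with Java's standard UTF-8 encoder. That encoder corresponds to Oracle's AL32UTF8; Oracle's older UTF8 character set stores characters outside the Basic Multilingual Plane as 6 bytes instead of 4.

    ```java
    import java.nio.charset.StandardCharsets;

    public class Utf8Sizes {
        public static void main(String[] args) {
            // One sample per UTF-8 length class: A, e-acute, euro sign, U+1F600 (a surrogate pair in Java)
            String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};
            for (String s : samples) {
                byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
                StringBuilder hex = new StringBuilder();
                for (byte b : bytes) {
                    hex.append(String.format("%02X ", b & 0xFF));
                }
                System.out.printf("U+%04X -> %d byte(s): %s%n",
                        s.codePointAt(0), bytes.length, hex.toString().trim());
            }
        }
    }
    ```

    It prints 1 byte (41) for A, 2 bytes (C3 A9) for e-acute, 3 bytes (E2 82 AC) for the euro sign and 4 bytes (F0 9F 98 80) for U+1F600.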

  • Non US-ASCII characters in download file names

    I am trying to implement a simple file download in a JSP, and trying to get IE, Firefox and Opera to all display and handle non US-ASCII characters in the suggested download file name. Only concerned with Windows platform for now. Here's the code I am currently using:
    String agent = request.getHeader("USER-AGENT");
    if (null != agent && -1 != agent.indexOf("MSIE")) {
        String codedfilename = URLEncoder.encode(cfrfilename, "UTF-8");
        response.setContentType("application/x-download");
        response.setHeader("Content-Disposition", "attachment;filename=" + codedfilename);
    } else if (null != agent && -1 != agent.indexOf("Mozilla")) {
        String codedfilename = MimeUtility.encodeText(cfrfilename, "UTF-8", "B");
        response.setContentType("application/x-download");
        response.setHeader("Content-Disposition", "attachment;filename=" + codedfilename);
    } else {
        response.setContentType("application/x-download");
        response.setHeader("Content-Disposition", "attachment;filename=" + cfrfilename);
    }
    This code URL-encodes the file name if the browser is IE, MIME-encodes it if the browser is Mozilla, and sends plain UTF-8 (the encoding of the JSP) for all other browsers. I get "cfrfilename" from translated properties files, and the string can contain characters from any character set: Chinese, Thai, Korean, etc.
    This code works correctly for IE - the file name is displayed correctly in the file Save as dialog, and it is saved correctly on disk, no matter which character set is used.
    For Firefox, the file name is displayed correctly in the file Save as dialog, but it is only saved correctly to disk if the file name is in a character set supported by the system locale. This seems to be a known Firefox bug (not fully using the Windows Unicode APIs), so nothing I can do about that.
    Nothing seems to work for Opera, however - I cannot get the file name to display correctly in the file Save as dialog, no matter which method I use (I have tried URL encoding and MIME encoding in addition to the plain UTF-8).
    Has anybody implemented something similar that works for at least these 3 browsers?

    I tested your code today:
                       dialog   save   open
    Firefox 1.5        OK       OK     OK
    IE 6.0             OK       OK     NG
    dialog: file name shown in the download pop-up dialog
    save: saved to disk from the dialog
    open: opened directly from the dialog
    NG: did not work
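    An option not tried in this thread: RFC 6266 lets the server send the file name as UTF-8 in a filename* parameter, percent-encoded as described in RFC 5987, next to a plain filename fallback. Current browsers understand it. Older ones such as those discussed here may not, which is why the ASCII fallback is kept. A minimal Java sketch; the class, method names and the fallback argument are hypothetical.

    ```java
    import java.nio.charset.StandardCharsets;

    public class ContentDisposition {
        // attr-char from RFC 5987: these bytes may appear unescaped in filename*
        private static boolean isAttrChar(int b) {
            return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') || (b >= '0' && b <= '9')
                    || "!#$&+-.^_`|~".indexOf(b) >= 0;
        }

        // Percent-encodes the UTF-8 bytes of a file name for the filename* parameter
        static String encodeRfc5987(String name) {
            StringBuilder sb = new StringBuilder();
            for (byte raw : name.getBytes(StandardCharsets.UTF_8)) {
                int b = raw & 0xFF;
                if (isAttrChar(b)) {
                    sb.append((char) b);
                } else {
                    sb.append('%').append(String.format("%02X", b));
                }
            }
            return sb.toString();
        }

        // asciiFallback: a plain-ASCII name (no quotes or backslashes) for clients that ignore filename*
        static String header(String name, String asciiFallback) {
            return "attachment; filename=\"" + asciiFallback + "\"; filename*=UTF-8''" + encodeRfc5987(name);
        }

        public static void main(String[] args) {
            System.out.println(header("\u00E9t\u00E9 report.pdf", "ete report.pdf"));
            // attachment; filename="ete report.pdf"; filename*=UTF-8''%C3%A9t%C3%A9%20report.pdf
        }
    }
    ```

    In the JSP this would replace the browser sniffing with a single call, e.g. response.setHeader("Content-Disposition", ContentDisposition.header(cfrfilename, "download")), where "download" is a placeholder fallback name.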

  • ASCII Characters Display As ????

    Hi,
    I seem to be having an issue displaying ASCII characters: all I get is black diamonds with white triangles inside on gnome-terminal, and on xterm I get nothing at all. Also, some fonts do not appear to render in my web browser (Chromium or Firefox); I get hollow squares instead of text.
    $ locale -a
    C
    en_GB
    en_GB.iso88591
    en_GB.utf8
    POSIX
    $ locale
    LANG=en_GB.UTF-8
    LC_CTYPE="en_GB.UTF-8"
    LC_NUMERIC=en_GB-UTF-8
    LC_TIME=en_GB-UTF-8
    LC_COLLATE="en_GB.UTF-8"
    LC_MONETARY=en_GB-UTF-8
    LC_MESSAGES="en_GB.UTF-8"
    LC_PAPER=en_GB-UTF-8
    LC_NAME="en_GB.UTF-8"
    LC_ADDRESS="en_GB.UTF-8"
    LC_TELEPHONE="en_GB.UTF-8"
    LC_MEASUREMENT=en_GB-UTF-8
    LC_IDENTIFICATION="en_GB.UTF-8"
    LC_ALL=
    # print ascii chars (program prints number followed by ascii conversion)
    180 ?
    181 ?
    182 ?
    183 ?
    184 ?
    185 ?
    186 ?
    Thank you for looking; I would appreciate any help possible with this.
    If you need more info, please let me know.

    wiggly wrote: if you want to see the C source code I can show you, although it was not really relevant except for the fact that it demonstrates my issue.
    I know, it's just so that people could 1. test it themselves easily, 2. be sure that everyone is talking about the same thing. (In fact I have written the exact same program after reading your first post.)
    wiggly wrote: unless you include the extended characters that run from 128-255.
    But those are not "real" ASCII: ASCII stops at 127, and the meaning of bytes 128-255 depends on which 8-bit encoding (e.g. ISO-8859-1) you use to read them. On their own they are not valid UTF-8.
    On my machine (running with i from 32 to 254):
    $ ./a.out
    180 ´
    181 µ
    182 ¶
    183 ·
    184 ¸
    185 ¹
    186 º
    But:
    $ ./a.out | less
    180 <B4>
    181 <B5>
    182 <B6>
    183 <B7>
    184 <B8>
    185 <B9>
    186 <BA>
    $ ./a.out | iconv -t utf-8
    128 iconv: illegal input sequence at position 662
    $ ./a.out >a; file a
    a: data
    The question: why do you care about / want to use non-UTF-8 encodings?
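    The outputs above can be explained at the byte level: printing the value 180 as a single char writes the lone byte B4, which is not valid UTF-8, so a UTF-8 terminal shows a replacement glyph. The character ´ (U+00B4) needs the two bytes C2 B4 in UTF-8. A small Java sketch, assuming the values 128-255 are meant as ISO-8859-1 (where byte value equals code point), shows the bytes a UTF-8 terminal expects. In the C program itself, one option on glibc is to call setlocale(LC_ALL, "") and print each value with printf("%lc", (wint_t) i), letting the C library produce the multibyte sequence.

    ```java
    import java.nio.charset.StandardCharsets;

    public class HighBytes {
        public static void main(String[] args) {
            for (int i = 180; i <= 186; i++) {
                // Interpret the single byte i as ISO-8859-1, where byte value == code point
                String ch = new String(new byte[] {(byte) i}, StandardCharsets.ISO_8859_1);
                byte[] utf8 = ch.getBytes(StandardCharsets.UTF_8);  // two bytes for U+0080..U+07FF
                System.out.printf("%d %s -> UTF-8 bytes %02X %02X%n",
                        i, ch, utf8[0] & 0xFF, utf8[1] & 0xFF);
            }
        }
    }
    ```

    Every value in this range comes out as C2 followed by the original byte, e.g. 180 becomes C2 B4.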

  • Replacing non-ASCII characters with HTML charcter references

    Hi All,
    In Oracle 10g or greater, is there a built-in function that will convert a string with non-ASCII characters like this
    a b č 뮼
    into an ASCII string with HTML character references like this?
    a b &#x010D; &#xBBBC;
    I tried using
    utl_i18n.escape_reference( val, 'us7ascii' )
    but for some reason it returns
    a b c &#xBBBC;
    Note how it converted the Central European character "č" to its unaccented counterpart "c", not "&#x010D;" (is this a bug?).
    I also tried a custom solution using regexp_replace and asciistr (which I can't include here because the forum software chokes on it) but it only returns the correct result for values <=4000 characters long. Unfortunately asciistr doesn't appear to accept CLOB values larger than 4000 characters. It returns an error message like
    (ORA-22835: Buffer too small for CLOB to CHAR or BLOB to RAW conversion (actual: 30251, maximum: 4000) ).
    I'm looking for a solution that works on CLOB data of any size.
    Thanks in advance for any insight you can provide.
    Joe Fuda

    So with that (UTF8) in mind, let's take another look...
    As shown below, I used an AL32UTF8 database.
    Note: I did not use a Unicode-capable tool for querying, so I set the console code page to 1250 just to have č displayed properly (instead of posing as an è).
    Also, as a result of using windows-1250 for client character set, in the val column and in the second select's ncr column (iso8859-1), è (00e8) has been replaced with e through character set conversion going from server back to client.
    Running the same code on a database with a db character set such as we8mswin1252, that doesn't define the č (latin small c with caron) character, would yield results with a c in the ncr column.
    C:\>chcp 1250
    Active code page: 1250
    C:\>set nls_lang=.ee8mswin1250
    C:\>sqlplus test/test
    SQL*Plus: Release 11.1.0.6.0 - Production on Fri May 23 21:25:29 2008
    Copyright (c) 1982, 2007, Oracle.  All rights reserved.
    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
    With the OLAP option
    SQL> select * from nls_database_parameters where parameter like '%CHARACTERSET';
    PARAMETER              VALUE
    NLS_CHARACTERSET       AL32UTF8
    NLS_NCHAR_CHARACTERSET AL16UTF16
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'us7ascii') NCR from dual;
    VAL  NCR
    č e  c e
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'we8iso8859p1') NCR from dual;
    VAL  NCR
    č e  &#x10d; e     <- "è"
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'ee8iso8859p2') NCR from dual;
    VAL  NCR
    č &#xe8;
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'cl8iso8859p5') NCR from dual;
    VAL  NCR
    č e  &#x10d; &#xe8;
    In the US7ASCII case, where it should be possible to escape every non-ASCII character, the actual escape step seems to be skipped.
    Hope this helps you decide whether utl_i18n is usable in your case.
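    If a client-side step is acceptable, the conversion Joe describes has no 4000-character limit outside SQL. A minimal Java sketch, not a utl_i18n replacement: it walks the text by code point and emits a hexadecimal character reference for everything outside ASCII, so a character outside the BMP becomes one reference rather than two. For a CLOB you would read it through java.sql.Clob.getCharacterStream() in chunks, taking care not to split a surrogate pair between chunks; that part is left out here.

    ```java
    public class NcrEscape {
        // Replaces every non-ASCII code point with a hexadecimal character reference.
        static String escape(CharSequence text) {
            StringBuilder out = new StringBuilder(text.length());
            text.codePoints().forEach(cp -> {
                if (cp < 0x80) {
                    out.appendCodePoint(cp);
                } else {
                    out.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
                }
            });
            return out.toString();
        }

        public static void main(String[] args) {
            System.out.println(escape("a b \u010D \uBBBC"));  // a b &#x10D; &#xBBBC;
        }
    }
    ```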

  • Cannot login with password containing non-ascii characters

    Hello,
    I have a web application with form-based login. UTF-8 is specified "everywhere".
    And it works, except for passwords.
    If a user registers with a password containing non-ASCII characters, it is correctly written to the database, but both programmatic login and normal form-based login fail.
    If the password is ASCII-only, login works.
    The username can be ASCII or non-ASCII; it doesn't matter, both work.
    I'm using Sun Java Application Server 9.1 with a JDBC realm.
    I'm not hashing passwords; they are stored in clear text (for now).
    As a last resort I tried configuring the realm property Charset: UTF8, but that doesn't work either.
    The problem occurs only with non-ASCII characters in the password.
    Any help is much appreciated.
    Thanks a lot

    hi,
    I know all that, but that's not the case. My app uses PreparedStatements, everything is properly configured on all pages, and UTF-8 goes from the user to the DB and back without any problems.
    The only problem is with the password field. As I am using form-based login with a JDBC realm (which, again, works fine when only ASCII characters are used), I have very little chance of doing something wrong in the login phase.
    I'm not talking about special characters; I'm talking about non-ASCII characters, say Chinese, Arabic, Russian, etc.
    When a user registers (my code), the fields are properly written to the DB. I have checked that, trust me.
    But the Sun app server realm seems to have some problem with the password field.
    (The realm uses a JDBC connection to MySQL, and the URL contains all the extra parameters needed to be sure about UTF-8. There is nothing more that can be configured...)
    If I use other alphabets in the username and ASCII in the password, it works. But as soon as I also use another alphabet in the password, it stops working.
    My only idea is to compute an MD5 digest on the client with JavaScript (which, I hope, yields ASCII-only characters) and then set Digest to MD5 in the realm configuration. But it still seems very strange. Shouldn't clear-text storage work too? (Digest is currently set to 'none'.)
    Is it a bug in Sun App Server?
    thanks
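    A hedged note on the MD5 idea: a hex-encoded digest is always plain ASCII, but it is computed from bytes, so both sides must turn the password into bytes with the same encoding. A plausible but unconfirmed cause of the original failure is that the container decodes the j_security_check form post as ISO-8859-1 before the application can ask for UTF-8; on Sun/GlassFish servers, the <parameter-encoding default-charset="UTF-8"/> element in sun-web.xml sets the default request-parameter encoding and may be worth trying. The Java sketch below only illustrates the digest step and how a mis-decoded password changes it; MD5 is also too weak for storing passwords today.

    ```java
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public class PasswordDigest {
        // Lowercase hex MD5 of the password's UTF-8 bytes: always 32 ASCII characters.
        static String md5Hex(String password) {
            try {
                byte[] digest = MessageDigest.getInstance("MD5")
                        .digest(password.getBytes(StandardCharsets.UTF_8));
                StringBuilder hex = new StringBuilder();
                for (byte b : digest) {
                    hex.append(String.format("%02x", b & 0xFF));
                }
                return hex.toString();
            } catch (NoSuchAlgorithmException e) {
                // Every Java SE platform is required to provide MD5, so this should not happen
                throw new IllegalStateException(e);
            }
        }

        public static void main(String[] args) {
            String password = "\u043F\u0430\u0440\u043E\u043B\u044C";  // a Cyrillic password
            // The same password after its UTF-8 bytes were misread as ISO-8859-1
            String misread = new String(password.getBytes(StandardCharsets.UTF_8),
                    StandardCharsets.ISO_8859_1);
            System.out.println(md5Hex(password));
            System.out.println(md5Hex(misread));  // a completely different digest
        }
    }
    ```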

  • Search for users and non-ASCII characters

    I am having a little issue with the "Accounts - Find Users" functionality. The search breaks on what I assume are non-ASCII characters (we use the following three up here in Denmark: æ, ø, å). To be precise, I have a user with the first name "Jørgen". Searching for first names starting with "J" works just fine, but "Jø" returns zero matches.
    My setup is with two machines, one (A) holding the MySQL database and one (B) serving Identity Manager on top of tomcat.
    Both A and B are RHEL boxes, and both have da_DK.UTF-8 as default locale.
    MySQL's /etc/my.cnf file has the following entry (as recommended in create_waveset_tables.mysql):
    [mysqld]
    default-character-set=utf8
    default-collation=bin
    For clarity, some functionality works just fine in Identity Manager with these non-ASCII characters, such as adding a user whose name contains non-ASCII characters (not only æ, ø and å but also other non-ASCII letters). At the moment, it appears to be only the search functionality that is not working as I would expect. I'm still on the fence about whether I've missed something in the configuration or whether this is a limitation.
    Does anyone know whether this problem is on my side or the software's side?
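    One possible explanation, not confirmed in this thread: if the search form is submitted with GET, "Jø" arrives percent-encoded from its UTF-8 bytes (J%C3%B8), and Tomcat decodes the query string with its Connector's URIEncoding setting, which defaulted to ISO-8859-1 before Tomcat 8. The search then looks for a different, two-character string and finds nothing. This Java sketch shows the two decodings side by side; setting URIEncoding="UTF-8" on the Connector in server.xml would be the first thing to try under that assumption.

    ```java
    import java.net.URLDecoder;
    import java.nio.charset.StandardCharsets;

    public class QueryDecoding {
        public static void main(String[] args) {
            // "J" followed by o-slash (U+00F8), percent-encoded from its UTF-8 bytes C3 B8
            String query = "J%C3%B8";
            String right = URLDecoder.decode(query, StandardCharsets.UTF_8);
            String wrong = URLDecoder.decode(query, StandardCharsets.ISO_8859_1);
            System.out.println(right);  // J followed by o-slash
            System.out.println(wrong);  // J followed by A-tilde and a cedilla: two characters, not one
        }
    }
    ```

    URLDecoder.decode(String, Charset) needs Java 10 or later.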


  • ASCII characters at the beginning of each indexed data part. How can I fix it?

    My data are indexed (coming out of a For Loop structure), and the saved file has some ASCII characters at the beginning of each indexed part. For example,
    #@@% 1,1 2,3
    3,4 5,4
    5,6 7,6 (end of first acquisition)
    #$% 2,3 7,9 (beginning of second)
    3,1 4,5
    5,4 6,4
    The file is a sequence of arrays that together form a single, bigger array. If I acquire data, say, 10 times and save to file, the problem is repeated 10 times, once per acquisition. I attached the program (LabVIEW 6.0). You can put in some random numbers just to make it work. Thank you.
    Attachments:
    CaryPlot22.vi 71 KB

    Hello
    I tested your VI under LV 7 and it runs fine. I think you are reading your file as a string, and you recorded it as a "spreadsheet string" (the format is different). That means you have extra characters for spreadsheet info like tabs, etc. If you read your data using "read from spreadsheet.vi", your arrays of data should be OK.
    Hope it helps you.
    Alipio

  • NSString with Unicode + ASCII characters

    Hi All,
    I am developing a Mac application on 10.5.
    I need to deal with both plain ASCII characters and Unicode characters.
    I want to calculate the number of bytes needed to store a string.
    I am using [[unicodeStr dataUsingEncoding:NSUTF8StringEncoding]bytes].
    This works fine if "unicodeStr" contains only Unicode characters.
    If "unicodeStr" is a combination of Unicode and ASCII characters, "NSUTF8StringEncoding" does not work.
    Can anybody tell me how to proceed??
    Thanks in advance.

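    There is no such thing as a combination of Unicode and ASCII. ASCII is part of Unicode, and any ASCII string is a valid UTF-8 string. Note also that -bytes returns a pointer to the encoded data, not its length; the byte count is [[unicodeStr dataUsingEncoding:NSUTF8StringEncoding] length], or directly [unicodeStr lengthOfBytesUsingEncoding:NSUTF8StringEncoding].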
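    The arithmetic behind a UTF-8 byte count is simple, and a Java sketch makes the answer's point concrete: ASCII characters cost one byte each and other characters two to four, so a mixed string needs no special handling. The sketch assumes well-formed text (no unpaired surrogates).

    ```java
    public class Utf8Length {
        // Number of bytes a string occupies in UTF-8, computed from its code points.
        static int utf8Length(String s) {
            return s.codePoints().map(cp ->
                    cp < 0x80 ? 1 :        // ASCII
                    cp < 0x800 ? 2 :       // e.g. Latin accents, Greek, Cyrillic, Arabic
                    cp < 0x10000 ? 3 :     // rest of the BMP, e.g. CJK
                    4).sum();              // supplementary planes
        }

        public static void main(String[] args) {
            String mixed = "abc \u00E9\u4E2D";     // ASCII letters, a space, e-acute, a CJK ideograph
            System.out.println(utf8Length(mixed));  // 4 + 2 + 3 = 9
        }
    }
    ```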

  • Convert smart quotes and other high ascii characters to HTML

    I'd like to set up Dreamweaver CS4 Mac to automatically convert smart quotes and other high ASCII characters (m-dashes, accent marks, etc.) pasted from MS Word into HTML code. Dreamweaver 8 used to do this by default, but I can't find a way to set up a similar auto-conversion in CS 4.  Is this possible?  If not, it really should be a preference option. I code a lot of HTML emails and it is very time consuming to convert every curly quote and dash.
    Thanks,
    Robert
    Digital Arts

    I too am having a related problem with Dreamweaver CS5 (running under Windows XP), having just upgraded from CS4 (which works fine for me) this week.
    In my case, I like to convert to typographic quotes etc. in my text editor, where I can use macros I've written to speed the conversion process. So my preferred method is to key in typographic letters & symbols by hand (using ALT + ASCII key codes typed in on the numeric keypad) in my text editor, and then I copy and paste my *plain* ASCII text (no formatting other than line feeds & carriage returns) into DW's DESIGN view. DW displays my high-ASCII characters just fine in DESIGN view, and writes the proper HTML code for the character into the source code (which is where I mostly work in DW).
    I've been doing it this way for years (first with GoLive, and then with DW CS4) and never encountered any problems until this week, when I upgraded to DW CS5.
    But the problem I'm having may be somewhat different than what others have complained of here.
    In my case, some high-ASCII characters (128 and above) convert to HTML just fine, while others do not.
    E.g., en and em dashes in my cut-and-paste text show as such in DESIGN mode, and the right entries
        &ndash;
        &mdash;
    turn up in the source code. Same is true for the ampersand
        &amp;
    and the copyright symbol
        &copy;
    and for such foreign letters as the e with acute accent (ALT+0233)
        &eacute;
    What does NOT display or code correctly are the typographic quotes. E.g., when I paste in (or special paste; it doesn't seem to make any difference which I use for this) text with typographic double quotes (ALT+0147 for open quote mark and ALT+0148 for close quote mark), which should appear in source code as
        &ldquo;[...]&rdquo;
    DW strips out the ASCII encoding, displaying the inch marks in DESIGN mode, and putting this
        &quot;[...]&quot;
    in my source code.
    The typographic apostrophe (ALT+0146) is treated differently still. The text I copy & paste into DW should appear as
        [...]&rsquo;[...]
    in the source code, but instead I get the foot mark (') in both DESIGN and CODE views.
    I've tried adjusting the various DW settings for "encoding"
        MODIFY > PAGE PROPERTIES > TITLE/ENCODING > Encoding:
    and for fonts
        EDIT > PREFERENCES > FONTS
    but switching from "Unicode (UTF-8)" to "Western European" hasn't solved the problem (probably because in my case many of the higher ASCII characters convert just fine). So I don't think it's the encoding scheme I use that's the problem.
    Whatever the problem is, it's caused me enough headaches and time lost troubleshooting that I'm planning to revert to CS4 as soon as I post this.
    Deborah
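    Dreamweaver's own settings can't be reproduced here, but the conversion itself is easy to script as a pre-processing step. A minimal Java sketch, assuming the pasted text has already been decoded to Unicode (so Word's curly quotes arrive as U+201C/U+201D rather than as Windows-1252 bytes 147/148): it maps the characters discussed in this thread to named entities and everything else outside ASCII to decimal references. The entity table is illustrative, not complete.

    ```java
    import java.util.Map;

    public class HtmlEntities {
        // Named entities for the characters discussed above; others become numeric references.
        private static final Map<Integer, String> NAMED = Map.ofEntries(
                Map.entry((int) '&', "&amp;"), Map.entry((int) '<', "&lt;"), Map.entry((int) '>', "&gt;"),
                Map.entry(0x2018, "&lsquo;"), Map.entry(0x2019, "&rsquo;"),
                Map.entry(0x201C, "&ldquo;"), Map.entry(0x201D, "&rdquo;"),
                Map.entry(0x2013, "&ndash;"), Map.entry(0x2014, "&mdash;"),
                Map.entry(0x00A9, "&copy;"), Map.entry(0x00E9, "&eacute;"));

        static String toHtml(String text) {
            StringBuilder out = new StringBuilder();
            text.codePoints().forEach(cp -> {
                String named = NAMED.get(cp);
                if (named != null) {
                    out.append(named);
                } else if (cp < 0x80) {
                    out.appendCodePoint(cp);                   // plain ASCII stays as-is
                } else {
                    out.append("&#").append(cp).append(';');  // decimal character reference
                }
            });
            return out.toString();
        }

        public static void main(String[] args) {
            System.out.println(toHtml("\u201CCaf\u00E9\u201D \u2013 it\u2019s \u00A9 2010"));
            // &ldquo;Caf&eacute;&rdquo; &ndash; it&rsquo;s &copy; 2010
        }
    }
    ```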

  • Username with ascii characters

    Hello, I have an HTML form, and I would like users to type ONLY ASCII characters in the username field. In other fields of the form I would like users to be able to type in their mother tongue, but the username and password fields must be ASCII.
    How am I supposed to check that the username is acceptable (*consists of ASCII characters*)?
    And which characters is it desirable for a username to have (e.g. is *?* a desirable character in a username? What about *:*)?
    Thanks in advance!

    g_p_java wrote:
    How am i supposed to check when the username is accepted/correct (*consists of ascii characters*)?
    ASCII characters are the Unicode characters whose code points are between 0 and 127.
    and which are the desirable characters a username must have (e.g. *?* is a desirable character in a username , *:* this one?)
    I don't understand this. You have already said they must be ASCII. You have other requirements? Fine, go ahead and program them and ask questions if you have problems with that. Personally I don't think that requiring somebody to have a question mark in their user name is a good idea -- but probably you didn't mean it when you suggested that.
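    The ASCII check described in the reply is a one-liner in Java. A minimal sketch follows; the stricter pattern (letters, digits, '.', '_' and '-', 3 to 20 characters) is a hypothetical policy for illustration, not a standard. Whatever the browser checks, the server should repeat the check, since client-side validation can be bypassed.

    ```java
    import java.util.regex.Pattern;

    public class UsernameCheck {
        // ASCII only: every UTF-16 char must be in the range 0-127.
        static boolean isAscii(String s) {
            return s.chars().allMatch(c -> c < 0x80);
        }

        // Hypothetical stricter policy: 3-20 characters from letters, digits, '.', '_' and '-'.
        private static final Pattern USERNAME = Pattern.compile("[A-Za-z0-9._-]{3,20}");

        static boolean isValidUsername(String s) {
            return USERNAME.matcher(s).matches();
        }

        public static void main(String[] args) {
            System.out.println(isAscii("john_doe"));          // true
            System.out.println(isAscii("J\u00F8rgen"));        // false
            System.out.println(isValidUsername("john:doe"));  // false, ':' is not allowed
        }
    }
    ```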

  • Non ascii characters being sent from a parameter in a form

    Hi!
    I have seen many topics posted on passing non-ASCII characters through parameters from one servlet to another and converting them into whatever format is necessary.
    However, I have not seen anyone answer the following question. I have a JSP page (HTML) with the character encoding set to UTF-8. The user inputs some data into a text field inside a form. The data could be in non-ASCII characters such as Hebrew or Arabic. This form is then sent to another JSP, where I try to retrieve the data from the text field. No matter what I do, I cannot get the data presented correctly. It is either question marks or other weird symbols.
    I have tried every permutation of the encoding of the actual HTML page, the encoding of the string from request.getParameter, etc., but it is still not presented correctly on the new HTML page.
    Can anyone help??
    Spencer

    Ok, I solved the problem.
    I had to put request.setCharacterEncoding("utf-8"); at the top of the receiving page, before any request.getParameter() call.
    Spencer

  • How can I convert ASCII characters to ISO8859?

    Hi All,
    I have written a little application that renames a TV episode by scraping a TV listing site for the episode name. It is written in SWT and works great apart from one small problem. When getting the HTML back from the site, it sometimes contains special ASCII characters that are not in the ISO8859 (Windows filesystem) character set.
    For example, this is the line that I have to parse:
    <td style='padding-left: 6px;' class='b2'><a href='/Prison_Break/episodes/569183/03x01'>OrientaciÃ³n</a></td>
    When viewing it in a browser, it is:
    <td style="padding-left: 6px;" class="b2"><a href="/Prison_Break/episodes/569183/03x01">Orientación</a></td>
    Notice that the o in the title has an accent on it. While researching this problem I stumbled across 'HTML Entities to ISO 8859-1 Converter' at http://www.inweb.de/chetan/English/Resources/Java/HTML%202%20ISO.html. This open-source project takes in an HTML entity like &amp; and returns '&'.
    So that is not quite what I want, as my BufferedReader is already converting the HTML entity into the ASCII representation. I need a way of detecting a non-ISO8859 character within an ASCII string and, hopefully, replacing it with its natural equivalent (o in this case).
    Does anyone know how I could do it without having to check for every special char and replacing (not really an option unless someone has done it before!!)
    If not that then, perhaps another way to attack the problem?
    Any help greatly appreciated ;)
    Dave

    Hi,
    NZ_Dave wrote:
    For example, this is the line that I have to parse:
    <td style='padding-left: 6px;' class='b2'><a href='/Prison_Break/episodes/569183/03x01'>OrientaciÃ³n</a></td>
    This is coded in UTF-8. If you convert the bytes to a String using the UTF-8 encoding, then you will have the correct characters "Orientación" in the string.
    Check your parser where it converts the bytes (coming from e.g. an InputStream) to characters. Use UTF-8 as the charset when doing that conversion.
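    Following that advice, a minimal Java sketch: the only thing that matters is passing the charset explicitly to InputStreamReader instead of relying on the platform default (before Java 18, typically windows-1252 on Western Windows systems, which produces exactly the garbled title shown above). The byte array stands in for the HTTP response stream; in real code the charset should ideally come from the response's Content-Type header. Once decoded correctly, the title can usually be used as-is in a Windows file name, since NTFS stores names in Unicode.

    ```java
    import java.io.BufferedReader;
    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.UncheckedIOException;
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    public class DecodeHtml {
        // Reads all lines from the stream using an explicit charset, not the platform default.
        static String readAll(InputStream in, Charset charset) {
            StringBuilder sb = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    sb.append(line).append('\n');
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return sb.toString();
        }

        public static void main(String[] args) {
            // Stands in for the HTTP response body: the UTF-8 bytes of the title with o-acute
            byte[] page = "Orientaci\u00F3n".getBytes(StandardCharsets.UTF_8);
            System.out.print(readAll(new ByteArrayInputStream(page), StandardCharsets.ISO_8859_1)); // garbled
            System.out.print(readAll(new ByteArrayInputStream(page), StandardCharsets.UTF_8));      // correct
        }
    }
    ```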

  • How can I use ASCII Characters on the iPad?

    I have an iPad (1) and use iOS 4.3.5. Is there an easier way of using ASCII characters than the copy-and-paste procedure? If not, can we expect to see a feature for entering ASCII codes straight from the standard keyboard soon?

    Norisouro wrote:
    . Is there an easier Way of using ASCII Characters than having to use the "copy-Paste" procedure?
    You need to give some details about what it is you want to do, because ASCII Characters are what are already on the keyboard.  It is non-ASCII that you might need to copy/paste.
    http://en.wikipedia.org/wiki/ASCII
    I think you can be sure that Apple is never going to include a feature in iOS that has you input special characters by typing in numbers like Windows does it. Macs have always used a different approach.

  • "Given filename or path contains Unicode or double-byte characters. Retry using ASCII characters for filename and path." What does this mean? It happens when I publish an OAM

    Given file name or path contains Unicode or double-byte characters. Retry using ASCII characters for filename and path
    What does this mean? It is happening when I try to publish an OAM for Dreamweaver.
    Also: How can I specify the browser in Edge Animate? It is just going wherever. Are there no Preferences for Edge Animate?
    BTW. Just call it Edge. Seriously. Do you call it Illustrator Draw? Photoshop Retouching?

    No, my file name is mainContent.oam
    My project name is mainContent.an
    This error happens when I try to import into Dreamweaver. Sorry, I wasn't clear on that earlier.
    I thought maybe it was because I had saved my image as a PNG, so I re-saved it as an SVG, but I still get the error.
    Do I have a wrong setting in Dreamweaver CC? Should I try this in Dreamweaver CS6? I might try that next.
    Why is this program so difficult? I know Flash. I know After Effects. I can work the timeline part just great. It's always in the export that I have problems.
    On a Mac Pro, 10.7.
    Are you an Adobe person or just a nice helper?

Maybe you are looking for

  • Creating a PDF document

    I have been trying, without success, to transform a JPEG document from my scanner into a PDF document. Can anyone please help me walk through this? Thank you. JanRealtor

  • Oracle 9i 9.2.0.6 on HPUX 11, now want to migrate HPUX 11i migration

    Hi, Query 1:Currently our database is running on HPUX 11, we would like to migrate the OS to HPUX 11i. Can anybody help us whether the 9.2.0.6 is compatible with HPUX 11i? If yes, is there any list of patches that needs to be applied on Oracle 9.2.0.

  • ITunes is sending a error message! Please Help

    OK, so I click on my iTunes and nothing happens, then all of a sudden an error message pops up and it says "iTunes can not run because some of its required files are missing. Please reinstall iTunes." I have reinstalled it 10 times, seriously, and even ha

  • Is there a way to unregister a computer?

    When I log in online, I get the "we don't recognize this computer" message and I clicked through before unchecking the "register this computer" box.  Is there a way to unregister a computer? I don't like the idea of "registered computers" that have n

  • Interconnect MQSeries adapter exception MQJMS2013: invalid security authentication

    Hi, I get the MQJMS2013: invalid security authentication supplied for MQQueueManager in my MQ Series adapter for Oracle interconnect. Is there more specific information about this error. Maybe in an IBM user manual. Has anybody experience with the IB