Convert to ASCII characters

We are extracting information, such as article descriptions, to send to some legacy systems. These systems can only accept ASCII characters (space, A-Z, 0-9, a-z, and the standard characters on the US keyboard). These are decimal values 32 through 126, or hex values 20 through 7E.
SAP apparently allows entries of decimal values 127 and greater (such as ½).
We need to either remove or convert such characters to spaces.
Is there a function to do this?  Or some simple command?
Thanks in advance for your help.

Thank you both for your suggestions but I couldn't get any of these functions to do what I want.
We have a description for an article that includes "Size 7 ½".  The ½ symbol causes a problem with interfaces to some of our legacy systems.
I know I can replace the ½ symbol with a space, but eventually some user will use yet another symbol (such as ¼) and I don't want to try to create a replace for each of these symbols.
I would like to investigate the decimal or hex equivalent of each character and replace those out of range.
Is there any function that would allow this?
Thanks,
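
As a point of reference, the range check described above can be written as a simple loop over the character codes. Below is a minimal sketch in Java (illustrative only, not an SAP function; the equivalent logic would have to be written in ABAP inside the SAP system, and the class and method names here are just placeholders):

public class AsciiFilter {
    // Replace every character outside the printable ASCII range
    // (decimal 32-126 / hex 20-7E) with a space.
    public static String toPrintableAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Keep space (0x20) through tilde (0x7E); blank out everything else.
            out.append(c >= 0x20 && c <= 0x7E ? c : ' ');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "Size 7 ½" becomes "Size 7  " (the ½ is replaced by a space).
        System.out.println(toPrintableAscii("Size 7 \u00BD"));
    }
}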

Similar Messages

  • Converting non-ASCII characters generated by MS Word

    Hello,
    I've encountered some files that were originally exported from MS Word as HTML. The problem is they contain some characters that fall into the 128 to 255 range. Some appear to be fancy quotes and apostrophes, but others I just can't figure out. On a Mac, or in Firefox on Windows, they appear as:
    Ö ë í ì î ñ ô † © Æ ∑ ∆ “ ÷ › · Î Ï Ì Ó Ô Ò Ù
    The decimal values of the above chars are:
    133 145 146 147 148 150 153 160 169 174 183 198 210 214 221 225 235 236 237 238 239 241 244
    As character entities they appear as:
    … ‘ ’ “ ” – ™ © ® · Æ Ò Ö Ý á ë ì í î ï ñ ô
    Before I try to reinvent a square wheel, I thought I'd ask here if anyone knows of an existing command line tool that might help with this.
    Cole
    15 PB   Mac OS X (10.3.9)  

    Thanks for all the replies. I think I've solved the problem. It indeed was a problem with high bit WinLatin1 (cp 1252) characters. Here's a technote that discusses the problem. So I wrote a short perl script based on this table:
    <pre style="overflow: auto;font-size:small; font-family: Monaco, 'Courier New', Courier, monospace; color: #222; background: #ddd; padding: .3em .8em .3em .8em; font-size: 10px;">#!/usr/bin/perl -wpi
    # Define an array for the double byte unicode characters.
    # Undefined characters are marked as 0.
    my @uni = (
    8364, 0, 8218, 402, 8222, 8230, 8224, 8225,
    710, 8240, 352, 8249, 338, 0, 381, 0, 0,
    8216, 8217, 8220, 8221, 8226, 8211, 8212,
    732, 8482, 353, 8250, 339, 0, 382, 376
    );
    # Characters 128 through 159 are a mixed set of double byte unicode characters,
    # so look these up in the @uni array. Undefined characters in this range are deleted.
    s/([\x80-\x9f])/ $uni[ord($1)-128] ? sprintf("&#%d;", $uni[ord($1)-128]) : ""/eg;
    # Characters 160 through 255 can be used as is.
    s/([\xa0-\xff])/sprintf("&#%d;", ord($1))/eg;
    </pre>
    I only hope that perl is clever enough to not create the @uni array for each line. Anyone happen to know?
    Thanks for any tips.
    Cole

  • Convert smart quotes and other high-ASCII characters to HTML

    I'd like to set up Dreamweaver CS4 Mac to automatically convert smart quotes and other high-ASCII characters (em dashes, accent marks, etc.) pasted from MS Word into HTML code. Dreamweaver 8 used to do this by default, but I can't find a way to set up a similar auto-conversion in CS4. Is this possible? If not, it really should be a preference option. I code a lot of HTML emails and it is very time-consuming to convert every curly quote and dash.
    Thanks,
    Robert
    Digital Arts

    I too am having a related problem with Dreamweaver CS5 (running under Windows XP), having just upgraded from CS4 (which works fine for me) this week.
    In my case, I like to convert to typographic quotes etc. in my text editor, where I can use macros I've written to speed the conversion process. So my preferred method is to key in typographic letters & symbols by hand (using ALT + ASCII key codes typed in on the numeric keypad) in my text editor, and then I copy and paste my *plain* ASCII text (no formatting other than line feeds & carriage returns) into DW's DESIGN view. DW displays my high-ASCII characters just fine in DESIGN view, and writes the proper HTML code for the character into the source code (which is where I mostly work in DW).
    I've been doing it this way for years (first with GoLive, and then with DW CS4) and never encountered any problems until this week, when I upgraded to DW CS5.
    But the problem I'm having may be somewhat different than what others have complained of here.
    In my case, some high-ASCII (above 128) characters convert to HTML just fine, while others do not.
    E.g., en and em dashes in my cut-and-paste text show as such in DESIGN mode, and the right entries
        &ndash;
        &mdash;
    turn up in the source code. Same is true for the ampersand
        &amp;
    and the copyright symbol
        &copy;
    and for such foreign letters as the e with acute accent (ALT+0233)
        &eacute;
    What does NOT display or code correctly are the typographic quotes. E.g., when I paste in (or special paste; it doesn't seem to make any difference which I use for this) text with typographic double quotes (ALT+0147 for open quote mark and ALT+0148 for close quote mark), which should appear in source code as
        &ldquo;[...]&rdquo;
    DW strips out the ASCII encoding, displaying the inch marks in DESIGN mode, and putting this
        &quot;[...]&quot;
    in my source code.
    The typographic apostrophe (ALT+0146) is treated differently still. The text I copy & paste into DW should appear as
        [...]&rsquo;[...]
    in the source code, but instead I get the foot mark (both in DESIGN and CODE views):
    I've tried adjusting the various DW settings for "encoding"
        MODIFY > PAGE PROPERTIES > TITLE/ENCODING > Encoding:
    and for fonts
        EDIT > PREFERENCES > FONTS
    but switching from "Unicode (UTF-8)" to "Western European" hasn't solved the problem (probably because in my case many of the higher ASCII characters convert just fine). So I don't think it's the encoding scheme I use that's the problem.
    Whatever the problem is, it's caused me enough headaches and time lost troubleshooting that I'm planning to revert to CS4 as soon as I post this.
    Deborah

  • How can I convert ASCII characters to ISO8859?

    Hi All,
    I have written a little application that renames a TV episode by scraping a TV listing site for the episode name. It is written in SWT and works great apart from one small problem. When getting the HTML back from the site, it sometimes contains special characters that are not in the ISO8859 (Windows filesystem) character set.
    For example, this is the line that I have to parse:
    <td style='padding-left: 6px;' class='b2'><a href='/Prison_Break/episodes/569183/03x01'>OrientaciÃ³n</a></td>
    When viewing it in a browser, it is:
    <td style="padding-left: 6px;" class="b2"><a href="/Prison_Break/episodes/569183/03x01">Orientación</a></td>
    Notice that the o in the title has an accent on it. While researching this problem I stumbled across 'HTML Entities to ISO 8859-1 Converter' at http://www.inweb.de/chetan/English/Resources/Java/HTML%202%20ISO.html. This open source project takes in an HTML entity like &amp; and returns '&'.
    So that is not quite what I want, as my BufferedReader is already converting the HTML entity into the ASCII representation. I need a way of detecting a non-ISO8859 character within an ASCII string, and hopefully replacing it with its natural 'equivalent' (which would be o in this case).
    Does anyone know how I could do it without having to check for every special char and replacing (not really an option unless someone has done it before!!)
    If not that then, perhaps another way to attack the problem?
    Any help greatly appreciated ;)
    Dave

    Hi,
    NZ_Dave wrote:
    For example, this is the line that I have to parse:
    <td style='padding-left: 6px;' class='b2'><a href='/Prison_Break/episodes/569183/03x01'>OrientaciÃ³n</a></td>
    This is coded in UTF-8. If you convert the bytes to a String using the UTF-8 encoding, then you will have the correct characters "Orientación" in the string.
    Check your parser where it converts the bytes (coming from e.g. an InputStream) to characters. Use UTF-8 as the charset when doing that conversion.
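    For readers of this thread, a minimal Java sketch of both steps: decode the site's bytes as UTF-8 (as suggested above), and then, if an unaccented filesystem-safe name is still wanted, fold the accents with java.text.Normalizer. The method names and the byte[] input are illustrative assumptions, not the poster's actual code.
    import java.nio.charset.StandardCharsets;
    import java.text.Normalizer;
    public class EpisodeName {
        // 1. Decode the raw bytes from the site as UTF-8, not the platform default.
        static String decode(byte[] raw) {
            return new String(raw, StandardCharsets.UTF_8);
        }
        // 2. Optionally fold accented letters to their unaccented "equivalent":
        //    NFD decomposition, then drop the combining marks.
        static String stripAccents(String s) {
            String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
            return decomposed.replaceAll("\\p{M}+", "");
        }
        public static void main(String[] args) {
            byte[] raw = "Orientaci\u00f3n".getBytes(StandardCharsets.UTF_8);
            System.out.println(stripAccents(decode(raw))); // Orientacion
        }
    }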

  • Converting a hexadecimal string to ASCII characters

    I have long hexadecimal strings that I wish to convert to their corresponding ASCII characters. I know there is a series of functions for doing things like this - hex to number, number to string, etc.
    At the moment, however, I am stuck at entering the hexadecimal string. I connect it to "Hexadecimal String to Number". What I get out is the decimal value of the last two digits of the hexadecimal number. No other wires are connected to the function. This means data is lost. How do I get around this? Is this particular function at all suitable for what I am trying to do?

    Hi Tzench,
    "Hexadecimal string" isn't very precise, and conversion questions have been discussed many times before...
    See the attached example on conversion of two different "hexadecimal" strings. There are other conversion methods, but these are the easiest to understand.
    Message Edited by GerdW on 12-08-2008 11:27 AM
    Best regards,
    GerdW
    CLAD, using 2009SP1 + LV2011SP1 + LV2014SP1 on WinXP+Win7+cRIO
    Kudos are welcome
    Attachments:
    HexString_LV71.vi ‏35 KB
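    The thread is about LabVIEW VIs, but the underlying conversion is just reading the string two hex digits at a time. A small sketch of that idea in Java (illustrative only, not a LabVIEW solution):
    public class HexToAscii {
        // Convert a long hex string such as "48656C6C6F" into its ASCII text ("Hello").
        static String hexToAscii(String hex) {
            StringBuilder out = new StringBuilder(hex.length() / 2);
            for (int i = 0; i + 1 < hex.length(); i += 2) {
                // Take two hex digits at a time and turn them into one character.
                int value = Integer.parseInt(hex.substring(i, i + 2), 16);
                out.append((char) value);
            }
            return out.toString();
        }
        public static void main(String[] args) {
            System.out.println(hexToAscii("48656C6C6F")); // Hello
        }
    }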

  • Problem converting certain extended ASCII characters

    I'm having problems with the extended ASCII characters in the range 128-159. I'm working in a SQL Server environment using Java. I originally had problems with characters in the range 128-159 when I did a 'select char_col from my_table': I always get junk when I try to retrieve it from the ResultSet using the code 'String str = rs.getString(1)'. For example, char_col would have the character 0x83 (in hex), but when I retrieved it from the database, my str equaled 0x192. I'm aware there is a gap in the range 128-159 in the ISO-8859-1 charset. I've tracked the problem down to a charset issue converting the extended ASCII characters in ISO-8859-1 into Java's Unicode charset.
    I looked on the forum and it said to try specifying the charset when I retrieve it from the ResultSet, so I did 'String str = new String(rs.getBytes(1), "ISO-8859-1")' and it was able to read the characters 128-159 correctly except for five characters (129, 141, 143, 144, 157). These characters always returned the character 63, or 0x3f. Does anyone know what's happening here? How come these characters didn't work? Is there a workaround for this? I need to use only Java and its default charsets, and I don't want to switch to the Windows Cp1252 charset because I'm using the Java code in a Unix environment as well.
    thanks.
    -B

    Normally your JDBC driver should understand the charset used in the database, and it should use that charset to produce a correct value for the result of getString(). However it does sometimes happen that the database is created by programs in some other language that ignore the database's charset and do their own encoding, bypassing the database's facilities. It is often difficult to deal with that problem, because the custodians of those other programs don't have a problem, everything is consistent for them, and they will not allow you to "repair" the database.
    I don't mean to say that really is your problem, it is a possibility though. You are using an SQL Server JDBC driver, aren't you? Does its connection URL allow you to specify the charset? If so, try specifying that SQL-Latin1 thing and see if it works.
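    As a sketch of the "decode with an explicit charset" idea from these replies (the column index, method name, and the windows-1252 guess are illustrative assumptions; whether that charset is right depends on what the database actually stores):
    import java.nio.charset.Charset;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    public class CharsetDecode {
        // Decode the raw column bytes with a named charset instead of relying on
        // the driver's or platform's default conversion.
        static String readWithCharset(ResultSet rs, int column, String charsetName)
                throws SQLException {
            byte[] raw = rs.getBytes(column);
            return raw == null ? null : new String(raw, Charset.forName(charsetName));
        }
        // Example (hypothetical): readWithCharset(rs, 1, "windows-1252")
        // Note: bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in windows-1252,
        // so no decoding can recover meaningful characters for them.
    }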

  • Non-ASCII characters are converted to '?' or ASCII characters

    Non-ASCII symbols like æ and ø in the XML file have been converted to '?' or other ASCII characters.
    What could be the reason behind this?

    Mayil wrote:
    We are loading this file through the Flex application in the front end.
    Through a Java class file we are making changes to this city.xml file, adding and deleting information in city.xml.
    Now suddenly, I don't know what happened, the 'ø' in the city name has been replaced with '?'.
    If we try to change it back to 'ø', it again changes to '?'.
    I don't know how to rectify this error.
    I would suggest you start by finding out when it happens. Does it happen as soon as you change the XML through this mysterious "java class file"? Or does it happen when Flex reads it? And is the underlying file actually changing, or are you just seeing those question marks after Flex handles the file?
    In short a much better problem description is necessary.
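    One frequent cause of this symptom is writing the file through a Writer that silently uses the platform default encoding. A hedged Java sketch of writing the XML with an explicit UTF-8 encoding (the file name and content are placeholders taken from the thread, not the actual application code):
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.nio.charset.StandardCharsets;
    public class WriteXmlUtf8 {
        public static void main(String[] args) throws IOException {
            String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                       + "<cities><city>Troms\u00f8</city></cities>\n";
            // Name the encoding explicitly; new FileWriter(...) would use the
            // platform default, which can mangle ø and æ if it cannot represent them.
            try (Writer out = new OutputStreamWriter(
                    new FileOutputStream("city.xml"), StandardCharsets.UTF_8)) {
                out.write(xml);
            }
        }
    }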

  • Non-ASCII characters being sent from a parameter in a form

    Hi!
    I have seen many topics posted on passing non-ASCII characters through parameters from one servlet to another and converting them into whatever format is necessary.
    However, I have not seen anyone answer the following question. I have a JSP page (HTML) with the character encoding set to UTF-8. The user inputs some data into a text field which is inside a form. The data could be in non-ASCII characters such as Hebrew or Arabic. This form is then sent to another JSP where I try to retrieve the data from the text field. No matter what I do, I cannot get the data presented correctly. It is either question marks or other weird symbols.
    I have tried every permutation of encoding for the actual HTML page and for the string from request.getParameter, etc., but it is still not presented correctly on the new HTML page.
    Can anyone help??
    Spencer

    Ok, I solved the problem.
    I had to put request.setCharacterEncoding("utf-8"); at the top.
    Spencer
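    For anyone hitting the same issue, a minimal sketch of where that call has to go: it must run before the first getParameter(), otherwise the request body has already been decoded with the container's default (often ISO-8859-1). The servlet class and parameter names here are illustrative.
    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    public class EchoServlet extends HttpServlet {
        @Override
        protected void doPost(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            // Set the request encoding before the first call to getParameter().
            request.setCharacterEncoding("UTF-8");
            String text = request.getParameter("text");
            // Tell the browser how the response is encoded, too.
            response.setContentType("text/html; charset=UTF-8");
            response.getWriter().println("<p>" + text + "</p>");
        }
    }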

  • Replacing non-ASCII characters with HTML character references

    Hi All,
    In Oracle 10g or greater is there a built-in function that will convert a string with non-ASCII characters like this
    a b č 뮼
    into an ASCII string with HTML character references like this?
    a b & # x 0 1 0 D ; & # x B B B C ;
    (note I had to include spaces between each character in the sample code in this message to prevent the forum software from converting my text)
    I tried using
    utl_i18n.escape_reference( val, 'us7ascii' )
    but for some reason it returns
    a b c & # x B B B C ;
    Note how it converted the Western European character "č" to its unaccented counterpart "c", not "& # x 0 1 0 D ;" (is this a bug?).
    I also tried a custom solution using regexp_replace and asciistr (which I can't include here because the forum software chokes on it) but it only returns the correct result for values <=4000 characters long. Unfortunately asciistr doesn't appear to accept CLOB values larger than 4000 characters. It returns an error message like
    (ORA-22835: Buffer too small for CLOB to CHAR or BLOB to RAW conversion (actual: 30251, maximum: 4000) ).
    I'm looking for a solution that works on CLOB data of any size.
    Thanks in advance for any insight you can provide.
    Joe Fuda

    So with that (UTF8) in mind, let's take another look.....
    As shown below, I used a AL32UTF8 database.
    Note: I did not use a unicode capable tool for querying. So I set console mode code page to 1250 just to have č displayed properly (instead of posing as an è).
    Also, as a result of using windows-1250 for client character set, in the val column and in the second select's ncr column (iso8859-1), è (00e8) has been replaced with e through character set conversion going from server back to client.
    Running the same code on a database with a db character set such as we8mswin1252, that doesn't define the č (latin small c with caron) character, would yield results with a c in the ncr column.
    C:\>chcp 1250
    Aktuell teckentabell: 1250
    C:\>set nls_lang=.ee8mswin1250
    C:\>sqlplus test/test
    SQL*Plus: Release 11.1.0.6.0 - Production on Fri May 23 21:25:29 2008
    Copyright (c) 1982, 2007, Oracle.  All rights reserved.
    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
    With the OLAP option
    SQL> select * from nls_database_parameters where parameter like '%CHARACTERSET';
    PARAMETER              VALUE
    NLS_CHARACTERSET       AL32UTF8
    NLS_NCHAR_CHARACTERSET AL16UTF16
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'us7ascii') NCR from dual;
    VAL  NCR
    č e  c e
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'we8iso8859p1') NCR from dual;
    VAL  NCR
    č e  &# x10d; e     <- "è"
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'ee8iso8859p2') NCR from dual;
    VAL  NCR
    č e  č &# xe8;
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'cl8iso8859p5') NCR from dual;
    VAL  NCR
    č e  &# x10d; &# xe8;
    In the US7ASCII case, where it should be possible for all non-ASCII characters to be escaped, it seems as if the actual escape step is skipped over.
    Hope this helps to understand whether utl_i8n is usable or not in your case.
    Message was edited by:
    orafad
    Fixed replaced character references :)
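    For comparison outside the database, the same escaping (every non-ASCII code point to a hexadecimal character reference) is only a few lines in Java. This is a sketch of the general technique, not a replacement for utl_i18n, and the zero-padding of the original example is omitted:
    public class NcrEscape {
        // Keep ASCII as is; replace every code point above 0x7F with &#x...;.
        static String escapeNonAscii(String s) {
            StringBuilder out = new StringBuilder(s.length());
            s.codePoints().forEach(cp -> {
                if (cp <= 0x7F) {
                    out.appendCodePoint(cp);
                } else {
                    out.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
                }
            });
            return out.toString();
        }
        public static void main(String[] args) {
            // "a b č 뮼" -> "a b &#x10D; &#xBBBC;"
            System.out.println(escapeNonAscii("a b \u010D \uBBBC"));
        }
    }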

  • Non-ASCII Characters in AppleWorks

    New to the forums and hope I'm not duplicating a prior post ... eons back, I used ClarisWorks on an old Mac SE/30, then moved over to AppleWorks, which I got to use on my PC because I wanted to go back and work on something I had from years ago. The document in question contained non-ASCII characters. Anyway, I imported it into AppleWorks and what was once a Cyrillic font was converted to ASCII (now a bunch of gibberish). I fully recognize I need to find the original font I used for the Cyrillic, but before I go that route, it looks like AppleWorks doesn't support non-ASCII characters ... ergo, I'm wasting my time. Am I wrong?
    Thanks.
    Windows 2.8 GHz   Windows XP Pro  

    You may need to embed the font:
    http://livedocs.adobe.com/flex/3/html/help.html?content=fonts_04.html
    If this post answers your question or helps, please mark it as such.

  • [SOLVED] NetworkManager and special ASCII characters KDE issue

    I cannot connect to an SSID with French characters like 'é' and other such characters in NetworkManager in KDE.
    It works fine in GNOME, but in KDE NetworkManager doesn't recognize non-English letters.
    Does anyone have a way to use a hex value instead of ASCII in the SSID?
    Last edited by jambi (2014-07-14 00:03:43)

    You need to configure the profile manually in
    /etc/NetworkManager/system-connections/
    then create a file named
    (null) 1
    [connection]
    id=(null) 1
    type=802-11-wireless
    [802-11-wireless]
    ssid="SSID IN HEX"
    mode=infrastructure
    mac-address=xx:xx:xx:xx:xx
    security=802-11-wireless-security
    [802-11-wireless-security]
    key-mgmt='wireless security'
    auth-alg=open
    psk=password
    You can convert from ASCII to hex using the terminal (the -n keeps echo from appending a newline to the SSID):
    echo -n 'ssid' | xxd -u -p
    The connection editor in KDE needs more handling functions added. I hope they build a more robust connection editor in the next release.
    Last edited by kortez (2014-07-07 21:15:45)

  • Need to find out extended ASCII characters in database

    Hi All,
    I am looking for a query that can fetch a list of all tables and columns where there is an extended ASCII character (from 128 to 255). Can anyone help me?
    Regards
    Yadala

    yadala wrote:
    Hi All,
    I am looking for a query that can fetch a list of all tables and columns where there is an extended ASCII character (from 128 to 255). Can anyone help me?
    Regards
    Yadala
    This should match your requirement:
    select t.TABLE_NAME, t.COLUMN_NAME from ALL_TAB_COLUMNS t
    where length(asciistr(t.TABLE_NAME)) != length(t.TABLE_NAME)
    or length(asciistr(t.COLUMN_NAME)) != length(t.COLUMN_NAME);
    The ASCIISTR function returns an ASCII version of the string in the database character set.
    Non-ASCII characters are converted to the form \xxxx, where xxxx represents a UTF-16 code unit.
    The CHR function is the opposite of the ASCII function. It returns the character based on the NUMBER code.
    ASCII code 174
    SQL> select CHR(174) from dual;
    CHR(174)
    Ž
    SQL> select ASCII(CHR(174)) from dual;
    ASCII(CHR(174))
                174
    SQL> select ASCIISTR(CHR(174)) from dual;
    ASCIISTR(CHR(174))
    \017D
    ASCII code 74
    SQL> select CHR(74) from dual;
    CHR(74)
    J
    SQL> select ASCII(CHR(74)) from dual;
    ASCII(CHR(74))
                74
    SQL> select ASCIISTR(CHR(74)) from dual;
    ASCIISTR(CHR(74))
    J

  • Translation of UTF8 stream to sequence of ASCII characters

    Hello,
    I need advice on how to translate a UTF-8 binary stream of characters to ASCII characters. The translation will depend on the Locale (language) used.
    For example, if the UTF-8 character Á (C381 in hex) is used in the Czech language, I will need to translate it to the two ASCII characters Ae; if the same Á character is used in the French language, I will need to translate it to the character A. The binary stream will also have some ASCII characters which will not need any translation.
    Please, advise.
    Thank you.
    A Mickelson

    The Java compiler and other Java tools can only process files that contain Latin-1 and/or Unicode-encoded (\udddd notation) characters. native2ascii converts files that contain other character encodings into files containing Latin-1 and/or Unicode-encoded characters.
    String command = "native2ascii -encoding UTF-8 sourceFileName targetFileName";
    Process child = Runtime.getRuntime().exec(command);
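    Note that native2ascii produces \udddd escapes rather than the locale-dependent letters asked about. A sketch of a pure-Java alternative is below; it only shows the generic accent folding (Á becomes A), and a language-specific rule such as the poster's Czech "Á to Ae" would need an additional per-locale lookup table. The class and method names are assumptions for illustration.
    import java.nio.charset.StandardCharsets;
    import java.text.Normalizer;
    public class Utf8ToAscii {
        // Decode UTF-8 bytes, then fold accented letters to unaccented ASCII by
        // decomposing them (NFD) and dropping the combining marks.
        static String fold(byte[] utf8) {
            String text = new String(utf8, StandardCharsets.UTF_8);
            return Normalizer.normalize(text, Normalizer.Form.NFD)
                             .replaceAll("\\p{M}+", "");
        }
        public static void main(String[] args) {
            byte[] aAcute = {(byte) 0xC3, (byte) 0x81}; // U+00C1 "Á" in UTF-8
            System.out.println(fold(aAcute));           // prints A
        }
    }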

  • Contains query fails for extended ASCII characters

    I have an Oracle 9.2 instance whose character set is WE8MSWIN1252. I'm using the same character set on my client. If I have a LONG column that contains extended-ASCII characters (the example I'm using has the Euro character '€', but I've seen the same problem with other characters), and I'm using the Intermedia service to index that column, then this select statement returns no records even though it should find several:
    select id from table1 where (contains(long_col,'€',1) > 0);
    However, the same select statement looking for something else, like 'e', works just fine.
    What am I doing wrong? I can do a "like" query against a VARCHAR2 column with a Euro character, and it works correctly. I can do a "dbms_lob.instr" query against a CLOB column with a Euro character, and it also works. It's just the "contains" query against a LONG column that fails.

    There are a number of limitations in using Long datatypes. If you check the SQL Reference you will see: "Oracle Corporation strongly recommends that you convert LONG columns to LOB columns as soon as possible. Creation of new LONG columns is scheduled for desupport.
    LOB columns are subject to far fewer restrictions than LONG columns. Further, LOB functionality is enhanced in every release, whereas LONG functionality has been static for several releases."

  • How do I convert an ASCII character to an array of co-ordinates?

    I need to convert an ASCII character to an array of X, Y co-ordinates. I also need to be able to vary the size of the text (the scale of the graph, I suppose) and its position on the graph, so I can display multiple characters on a graph. However, it needs to be stored in an array (or set of arrays) so I can issue these co-ordinates to an instrument.

    Maybe the attached VI can help. Using picture control functions, it gets the 1-bit bitmap of the character/text on its input as a 2D array of booleans.
    Jean-Pierre Drolet
    "m0mbaj0mba" wrote in the news message:
    [email protected]..
    > I am trying to find a simple way to convert a letter (ASCII character)
    > into an array of X,Y co-ordinates. I am involved in two projects that
    > involve spelling letters with lasers. At the moment we are plotting
    > the points on a graph in excel, transferring the co-ordinates into a
    > text file and then converting the content of these text files into a
    > set on 1D arrays. As I am sure you can appreciate this is a very long
    > winded process. Is there any way of plotting points on an X,Y graph
    > and outputting those points to an array or set of arrays?
    >
    > Excel spreadsheet is attached.
    [Attachment GetTextBitmap.vi, see below]
    LabVIEW, C'est LabVIEW
    Attachments:
    GetTextBitmap.vi ‏45 KB
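    The attached VI does this with picture-control functions; the same idea in a hedged Java sketch (render the character into a small bitmap, then collect the coordinates of the dark pixels; scale and offset are just multipliers applied to x and y). The method name and parameters are illustrative.
    import java.awt.Color;
    import java.awt.Font;
    import java.awt.Graphics2D;
    import java.awt.image.BufferedImage;
    import java.util.ArrayList;
    import java.util.List;
    public class CharToPoints {
        // Render one character into a size-by-size bitmap and return the (x, y)
        // pairs of the dark pixels, scaled and offset onto the target graph.
        static List<int[]> charToPoints(char c, int size, double scale, int xOff, int yOff) {
            BufferedImage img = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
            Graphics2D g = img.createGraphics();
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, size, size);
            g.setColor(Color.BLACK);
            g.setFont(new Font(Font.MONOSPACED, Font.PLAIN, size));
            g.drawString(String.valueOf(c), 0, size - size / 5); // rough baseline
            g.dispose();
            List<int[]> points = new ArrayList<>();
            for (int y = 0; y < size; y++) {
                for (int x = 0; x < size; x++) {
                    int blue = img.getRGB(x, y) & 0xFF;
                    if (blue < 128) { // dark pixel belongs to the glyph
                        // Flip y so the character is upright on an XY graph.
                        points.add(new int[]{xOff + (int) (x * scale),
                                             yOff + (int) ((size - y) * scale)});
                    }
                }
            }
            return points;
        }
        public static void main(String[] args) {
            for (int[] p : charToPoints('A', 16, 1.0, 0, 0)) {
                System.out.println(p[0] + "," + p[1]);
            }
        }
    }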
