Char set conversion to and from UTF-7

Hi,
I am working on character set conversion to and from UTF-7. However, I cannot find any CharToByte classes for UTF-7 in i18n.jar. Any ideas on how to proceed with this issue? I feel there must be a way, since UTF-7 is a pretty popular charset, especially in email.
Thanx in advance.
Khurram

Hi!
I had the same problem a couple of months ago. I didn't find any classes in the jdk distribution to do this, nor did I find any classes or package on the web that did this.
The solution? I got a bit of C code from one of my workmates and converted it to Java. I have the code, but my company owns it. I have to check whether it's OK to share this code with you.
BTW, if you find any package on the web that does this, please let me know.
Regards
Johan
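
For readers in the same position, here is a minimal sketch of the encoding direction only, written against RFC 2152 and not taken from Johan's code. The class and method names (`Utf7Sketch`, `encode`) are made up for illustration. It assumes a JDK with `java.util.Base64` (Java 8 or later), always closes a base64 run with `-`, and treats only the RFC's "Set D" characters plus space, tab, CR and LF as direct. Decoding is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Minimal UTF-7 (RFC 2152) encoder sketch; names are hypothetical. */
class Utf7Sketch {
    // RFC 2152 "Set D" direct characters plus space, tab, CR and LF.
    private static final String DIRECT =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'(),-./:? \t\r\n";

    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        StringBuilder run = new StringBuilder(); // pending characters that need base64
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (DIRECT.indexOf(c) >= 0) {
                flush(run, out);
                out.append(c);          // direct characters are copied unchanged
            } else if (c == '+') {
                flush(run, out);
                out.append("+-");       // a literal plus sign is written as "+-"
            } else {
                run.append(c);          // everything else is collected into a base64 run
            }
        }
        flush(run, out);
        return out.toString();
    }

    // Writes the pending run as '+' + unpadded base64 of its UTF-16BE bytes + '-'.
    private static void flush(StringBuilder run, StringBuilder out) {
        if (run.length() == 0) {
            return;
        }
        byte[] utf16 = run.toString().getBytes(StandardCharsets.UTF_16BE);
        out.append('+')
           .append(Base64.getEncoder().withoutPadding().encodeToString(utf16))
           .append('-');
        run.setLength(0);
    }
}
```

Plain ASCII letters pass through untouched, so `Utf7Sketch.encode("Hi Mom")` returns the input unchanged, while any character outside the direct set ends up inside a `+...-` run.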

Similar Messages

Char set conversion for UTF-7

Hi,
I am working on character set conversion to and from UTF-7. However, I cannot find any CharToByte classes for UTF-7 in i18n.jar. Any ideas on how to proceed with this issue? I feel there must be a way, since UTF-7 is a pretty popular charset, especially in email.
Thanx in advance.
Khurram

Hi Khurram,
I don't know if this will help but see this thread:
http://forums.java.sun.com/thread.jsp?forum=16&thread=25613
regards,
Joe

Char set conversion to ASMO 708, KOI8-U, x-IA5-German

hi,
I am doing some charset conversions to support various char sets widely used on the internet. The Character Sets I am supposed to support also include
"iso-8859-8-i", "ASMO-708", "koi8-u", "x-IA5-German".
However, I can't find canonical names for these encodings in Java's list of supported encodings.
I checked i18n.jar, but the ByteToChar classes for these encodings were not there. Is it possible to develop your own ByteToChar classes? If yes, how?
Can anybody advise how to proceed forward with this?
Thanks,
Khurram

Hi Khurram,
You could catch the exceptions thrown by the String(byte[] bytes, String encoding) constructor and by the getBytes(String encoding) method, and then dispatch to your own encoding/decoding code. Something like the following.
Post back if you need more elaboration.
Regards,
Joe
import java.io.UnsupportedEncodingException;

public class MyConverter {

    // Try the JDK's built-in decoders first; fall back to our own code.
    public static String makeString(byte[] bytes, String encoding)
            throws UnsupportedEncodingException {
        try {
            return new String(bytes, encoding);
        } catch (UnsupportedEncodingException e1) {
            // convertBytes throws UnsupportedEncodingException itself
            // if there is no custom decoder for this encoding either.
            return convertBytes(bytes, encoding);
        }
    }

    // Dispatch to a hand-written decoder for encodings the JDK does not know.
    public static String convertBytes(byte[] bytes, String encoding)
            throws UnsupportedEncodingException {
        if (encoding.equalsIgnoreCase("x-IA5-German")) {
            return bytesToStringx_IA5_German(bytes);
        } else {
            throw new UnsupportedEncodingException(encoding);
        }
    }

    public static String bytesToStringx_IA5_German(byte[] bytes) {
        //...here's where you put the actual decoding code
        throw new UnsupportedOperationException("decoder not written yet");
    }
}
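
As an illustration of what could go into the bytesToStringx_IA5_German placeholder, here is a rough sketch. It assumes that "x-IA5-German" is the 7-bit German national variant of ISO 646 (DIN 66003), where eight ASCII positions are reassigned to German letters and the section sign; please check that against the data you actually receive. The class name `Ia5GermanSketch` is invented for the example, and mapping bytes above 0x7F to U+FFFD is my own choice.

```java
/** Table-driven decoder sketch for 7-bit German (assumed DIN 66003); names are hypothetical. */
class Ia5GermanSketch {

    static String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            int v = b & 0xFF;
            if (v > 0x7F) {
                sb.append('\uFFFD');    // not a 7-bit value: emit the replacement character
                continue;
            }
            switch (v) {
                case 0x40: sb.append('\u00A7'); break; // '@' position: section sign
                case 0x5B: sb.append('\u00C4'); break; // '[' position: A with diaeresis
                case 0x5C: sb.append('\u00D6'); break; // '\' position: O with diaeresis
                case 0x5D: sb.append('\u00DC'); break; // ']' position: U with diaeresis
                case 0x7B: sb.append('\u00E4'); break; // '{' position: a with diaeresis
                case 0x7C: sb.append('\u00F6'); break; // '|' position: o with diaeresis
                case 0x7D: sb.append('\u00FC'); break; // '}' position: u with diaeresis
                case 0x7E: sb.append('\u00DF'); break; // '~' position: sharp s
                default:   sb.append((char) v);        // all other positions match ASCII
            }
        }
        return sb.toString();
    }
}
```

The other missing encodings would each need their own table, but the structure stays the same: one lookup per byte, with everything unmapped falling through to ASCII.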

Char Set Conversion

hi,
I am doing some charset conversions to support various char sets widely used on the internet. The Character Sets I am supposed to support also include
"iso-8859-8-i", "ASMO-708", "koi8-r", "Johab".
However, I can't find canonical names for these encodings in Java's list of supported encodings.
Can anybody advise how to proceed forward with this?
Thanks,
Khurram

First, look for a file called "i18n.jar" in your JVM or JRE installation. If you don't have that, you will have to get it from Sun's downloads. Once you have it, look inside it (using any zip utility). You will find a large number of files that have names like "ByteToCharBig5.class"; each of them defines a supported encoding, in the case of that example it would be "Big5". It's probably possible to create your own encoding if you don't find it in that list, but I don't know how. You might find out by searching the Internationalization forum.
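
If you are on JDK 1.4 or later, you may not need to open the jar at all: the java.nio.charset.Charset class can be asked directly whether a name is supported. A small sketch, with a made-up class name (`CharsetCheckSketch`); the result depends on the JVM you run it on, so no output is shown here.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

/** Asks the running JVM which charset names it knows; class name is hypothetical. */
class CharsetCheckSketch {

    // Returns true if this JVM has a charset registered under the given name or alias.
    static boolean supported(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false; // the name contains characters that are not legal in a charset name
        }
    }

    public static void main(String[] args) {
        String[] wanted = { "iso-8859-8-i", "ASMO-708", "koi8-r", "Johab" };
        for (String name : wanted) {
            System.out.println(name + " supported: " + supported(name));
        }
        // Charset.availableCharsets() returns the complete map of canonical names.
        System.out.println(Charset.availableCharsets().size() + " charsets available in total");
    }
}
```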

Translate or conversion to and from AppleWorks

From what I have been reading, there are a lot of questions about converting different formats to AppleWorks or to other formats. I have been using AppleWorks/ClarisWorks since the beginning; heck, I still have copies of MacPaint and MacWrite from 1985. There is a program that I have used for years (I think I first bought it in 1996 or 97), and it has been a lifesaver for cross-platform converting. It is MacLinkPlus Deluxe, made by DataViz; the current version is 16. There have been very few formats that I have not been able to convert, and this program has been my first option for cross-application or cross-platform conversion. My two cents from an old-timer.

Setting "reply-to" and "from" headers?

Using mail.app in Snow Leopard - can someone advise how I can customize the mail headers?
I use an email redirection service and publicize my "[email protected]" email address to all my friends. In reality, emails to this address get redirected to my gmail (IMAP) account which I then access via mail.app.
How do I customize the header fields such that outgoing mails that I compose on my Mac are sent with fields like:
Reply-To [email protected]
From Steve Mallard <[email protected]>
Thanks
SM

OK - found the answer to my own question - the process I followed on my Mac was perfectly correct, the issue was that Gmail was over-writing the Mac settings with its own.
You need to go into Gmail and use the "send mail from another address" option under "settings".
Hope this helps someone else avoid wasting four hours like I did!

ORA-24812: character set conversion to or from UCS2 failed

Oracle 9.2.0.4
PHP 4.3.3
data in db is in unicode UTF8
NLS_LANG=AMERICAN_AMERICA.AL32UTF8
When reading more than N KB from a CLOB, I get ORA-24812.
If the CLOB is smaller than N KB, it works fine.
On 9.2.0.1, SELECT convert(myclob, 'UTF16','AL32UTF8') causes an instance crash.
Can you help me?

> What PHP DB interface are you using (oci8 calls, PEAR DB etc)?
OCI8
> How big is "N kb"? Is it always this size?
I don't know... In the CLOB I store a serialized PHP array (a report); the problem occurs only with one really big report.
> What does "instance crash" mean? The whole DB instance, the Oracle shadow process, or the PHP session?
The whole instance, but it is fixed in 9.2.0.4.

How to handle all UTF-8 char set in BizTalk?

Can anyone let me know how to handle the UTF-8 char set in BizTalk?
My receive file can contain any characters, like ÿÑÜÜŒöäåüÖÄÅÜ. I have to support all character sets under the umbrella of UTF-8.
But when I try to convert flat file data to XML, the special characters are converted to ??????????.
Thanks,

That won't work because the content has been modified simply by posting it.
Let's start from the beginning:
No component will ever replace any character with '?', that just doesn't happen.
Some programs will display '?' if the byte value does not fall within the current character set, UTF-x, ANSI, ANSI+Code Page, etc.
You need to open the file with an advanced text editor such as Notepad++.
Please tell us exactly where you are seeing the '?'.
The Code Page is not an encoding itself; it is a special way of interpreting ANSI single-byte char values (0-255) that supports characters beyond the traditional Extended Character Set.
You need to be absolutely sure what encoding, and possibly Code Page, the source app is sending. Notepad++ is pretty good at sniffing this out, or just ask the sender. If you determine that it's really UTF-8, you must leave the Code Page property blank.

CSSCAN for database character set conversion failing with ORA-01578

Hi ,
CSSCAN for database character set conversion failing with ORA-01578: ORACLE data block corrupted (file # 84, block # 23930). please help me out in this regard.
Thanks,
Sravan.

Hi Anand,
Thanks for your update. In my case the segment is a table, not an index. I got this error while running CSSCAN on an Apps database for character set conversion from WE8ISO8859P1 to UTF8. Please find the output below for your reference.
SQL> select segment_name, segment_type, owner from dba_extents where file_id = 84 and 23930 between block_id and block_id + blocks - 1;
SEGMENT_NAME    SEGMENT_TYPE    OWNER
EDW_LOOKUP_M    TABLE           POA
SQL> ANALYZE TABLE POA.EDW_LOOKUP_M VALIDATE STRUCTURE CASCADE;
ANALYZE TABLE POA.EDW_LOOKUP_M VALIDATE STRUCTURE CASCADE
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 84, block # 23930)
ORA-01110: data file 84: '/d911/oracle/dbcondata/poad01.dbf'
Thanks,
Sravan.

Character Set conversion fails

Hi,
We have 2 databases, both are UTF8. One database fetches data from the other via a DB link. Say there are 2 databases, A and B. A fetches data from B. A is in the WE8ISO8859P1 character set and B is in EEC8EUROPA3. While fetching the data, European characters come through as junk in database A. I tried using convert, but that fails too.
Can you please suggest a way to have the characters preserved?
Thanks.

Hi Deng,
I notice you had a similar problem to james chen, regarding
character set conversion errors and that you did indeed fix
this problem?
Re: DB Link not working
I'd really appreciate it if you could post
on James' query and let him and other members know your
solution.
Given that members of the Designer community use the forum
to work together and help other members, I think they would
also appreciate this information!
Thanks for your help.
Regards,
Dominic
Designer Prod Mgt
Oracle Corp

Character set conversion UTF-8 -- ISO-8859-1 generates question mark (?)

I'm trying to convert an XML-file in UTF-8 format to another file with character set ISO-8859-1.
My problem is that the ISO-8859-1 file generates a question mark (?) and puts it as a prefix in the file.
?<?xml version="1.0" encoding="UTF-8"?>
<ns0:messagetype xmlns:ns0="urn:olof">
<underkat>testv��rde</underkat>
</ns0:messagetype>
Is there a way to do the conversion without getting the question mark?
My code looks as follows:
import java.io.*;

public class ConvertEncoding {
    public static void main(String[] args) {
        String from = "UTF-8", to = "ISO-8859-1";
        String infile = "C:\\temp\\infile.xml", outfile = "C:\\temp\\outfile.xml";
        try {
            convert(infile, outfile, from, to);
        } catch (Exception e) {
            System.out.println(e.getMessage());
            System.exit(1);
        }
    }

    private static void convert(String infile, String outfile,
                                String from, String to)
            throws IOException, UnsupportedEncodingException {
        // Set up byte streams
        InputStream in = null;
        OutputStream out = null;
        if (infile != null) {
            in = new FileInputStream(infile);
        }
        if (outfile != null) {
            out = new FileOutputStream(outfile);
        }
        // Set up character streams
        Reader r = new BufferedReader(new InputStreamReader(in, from));
        Writer w = new BufferedWriter(new OutputStreamWriter(out, to));
        /* Copy characters from input to output.
         * The InputStreamReader converts from the input encoding to Unicode,
         * and the OutputStreamWriter converts from Unicode to the output encoding.
         * Characters that cannot be represented in
         * the output encoding are output as '?'.
         */
        char[] buffer = new char[4096];
        int len;
        while ((len = r.read(buffer)) != -1) { // Read a block of input
            w.write(buffer, 0, len);
        }
        r.close();
        w.flush();
        w.close();
    }
}

Yes the next character is the '<'
The file that I read from is generated by an integration platform. I send a plain file to it (supposedly in UTF-8 encoding) and it returns another file (in between I call my java class that converts the characterset from UTF-8 to ISO-8859-1). The file that I get back contains the '��' if the conversion doesn't work and '?' if the conversion worked.
My solution so far is to skip the first "junk-characters" when reading from the inputstream. Something like:
private static final char UTF_BOM = '\uFEFF'; // Unicode byte order mark; shows up as '?' after conversion
String from = "UTF-8", to = "ISO-8859-1";
if (from != null && from.toLowerCase().startsWith("utf-")) { // Are we reading a UTF-encoded file?
    /* Read the first character of the UTF-encoded file.
     * If it is the BOM, it stays consumed and is skipped;
     * otherwise we rewind so that it is read normally.
     */
    try {
        r.mark(1); // Only one char may be read for reset() to keep working
        int i = r.read();
        if (i != UTF_BOM) {
            r.reset(); // Not a BOM: go back to the start position
        }
    } catch (IOException e) {
        System.err.println(e.getMessage());
    }
}
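
For anyone who wants to try the BOM-skipping idea in isolation, here is a small self-contained variant. It is a sketch and not the poster's exact code: it uses a PushbackReader in place of mark/reset, and the names `BomSkipSketch`, `skipBom` and `decodeUtf8` are invented for the example. It relies on the fact that Java's UTF-8 decoder hands a leading BOM through as the character U+FEFF, which ISO-8859-1 cannot represent and therefore writes as '?'.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Drops a leading byte order mark from a character stream; names are hypothetical. */
class BomSkipSketch {
    static final char BOM = '\uFEFF';

    // Wraps the reader and consumes the first character only if it is a BOM.
    static Reader skipBom(Reader in) throws IOException {
        PushbackReader r = new PushbackReader(in, 1);
        int first = r.read();
        if (first != -1 && first != BOM) {
            r.unread(first); // not a BOM: put the character back
        }
        return r;
    }

    // Convenience wrapper: decodes UTF-8 bytes and returns the text without a leading BOM.
    static String decodeUtf8(byte[] data) {
        try (Reader r = skipBom(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In the conversion program, the same `skipBom` call would wrap the input reader before the copy loop, so the BOM never reaches the ISO-8859-1 writer.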

Conversion from UTF to EBCDIC format- Urgent

I want to convert data from UTF* format to EBCDIC format and create a flat file from it. Are there any utility tools available in Oracle? If not, what are the other possibilities? Please help me.
Thanks
Krishnaraj

Hi ,
I am trying to convert the normal text data to EBCDIC. As we all know, there is a corresponding value for each normal character in ASCII/HEX/BINARY/EBCDIC etc.
Using CONVERT, I can see some data converted correctly, but the rest of the accented characters are not.
select convert('^', 'US7ASCII','EBCDIC' ) from dual;
select convert(';' ,'WE8EBCDIC500','US7ASCII') from dual;
^ = ascii normal txt
; = corresponding ebcdic of ^
Internally, the CONVERT function seems to be doing the correct conversion, but there is a problem with the characters that are actually displayed. SQLPLUS is not able to display all the characters correctly. For all the accented forms of "a", it shows a plain English a; the same goes for e, u, etc.
I would like to know whether anyone knows what client-side settings are needed so that the CONVERT function output is displayed correctly in SQLPLUS.
I am using Oracle 9i Rel 2
NLS_LANG on my client (win XP) is set to AMERICAN_AMERICA.WE8MSWIN1252
As seen in the CONVERT function, the correct charset is - 'WE8EBCDIC500'
And the db params are as follows --
===========================================
SQL> select * from NLS_DATABASE_PARAMETERS;
PARAMETER                 VALUE
NLS_LANGUAGE              AMERICAN
NLS_TERRITORY             AMERICA
NLS_CURRENCY              $
NLS_ISO_CURRENCY          AMERICA
NLS_NUMERIC_CHARACTERS    .,
NLS_CHARACTERSET          WE8ISO8859P1
NLS_CALENDAR              GREGORIAN
NLS_DATE_FORMAT           DD-MON-RR
NLS_DATE_LANGUAGE         AMERICAN
NLS_SORT                  BINARY
NLS_TIME_FORMAT           HH.MI.SSXFF AM
NLS_TIMESTAMP_FORMAT      DD-MON-RR HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT        HH.MI.SSXFF AM TZR
NLS_TIMESTAMP_TZ_FORMAT   DD-MON-RR HH.MI.SSXFF AM TZR
NLS_DUAL_CURRENCY         $
NLS_COMP                  BINARY
NLS_LENGTH_SEMANTICS      BYTE
NLS_NCHAR_CONV_EXCP       FALSE
NLS_NCHAR_CHARACTERSET    AL16UTF16
NLS_RDBMS_VERSION         9.2.0.1.0
===================================================
Can anyone help me on this??
Thanks in advance
regards
Abhivyakti,
Pune, India

XML data from BLOB to CLOB - character set conversion

Hi All,
I'm trying to solve a problem with a character set conversion in PL/SQL in the following scenario:
1. source is an XML as a BLOB variable.
2. target is an XML as a CLOB variable.
3. the problem I have is the following:
- database character set is set to UTF-8
- XML character set could be anything (UTF-8, ISO 8859-1, ISO 8859-2, ASCII, ...)
- I need to write a procedure which converts the source BLOB content into the target CLOB taking into account the XML encoding and converts it into the DB default character set (UTF8).
I've been able to implement a simple conversion function. However, this function expects static XML encoding ISO-8859-1. The main part of the function looks as follows:
buffer := UTL_RAW.cast_to_varchar2(
    UTL_RAW.convert(
        DBMS_LOB.SUBSTR(source_blob_variable, 16000, pos)
        , 'American_America.UTF8'
        , 'American_America.we8iso8859p1'));
Does anyone have an idea how to rewrite the code to handle "any" XML encoding in the source BLOB file? In other words, is there a function in Oracle which converts XML character set names into Oracle character set values (ISO-8859-1 to we8iso8859p1, UTF-8 to UTF8, ...)?
Thanks a lot for any help.
Julius

I want to pass a BLOB to some "createXML" procedure and get a proper XMLType in UTF8 character set, properly converted from whatever character set the input is in.
As per the documentation, the generated XML always has its encoding set on the client side depending on NLS_LANG (default UTF-8), regardless of the input encoding, so I don't see a need to parse the PI of the XML:
C:\>echo %NLS_LANG%
%NLS_LANG%
C:\>sqlplus
SQL*Plus: Release 11.1.0.6.0 - Production on Wed Apr 30 08:54:12 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> var cur refcursor
SQL>
SQL> declare
2     b   blob := utl_raw.cast_to_raw ('<a>myxml</a>');
3 begin
4     open :cur for select xmlroot (xmltype (utl_raw.cast_to_varchar2 (b))) xml from dual;
5 end;
6 /
PL/SQL procedure successfully completed.
SQL>
SQL> print cur
XML
<?xml version="1.0" encoding="UTF-8"?><a>myxml</a>
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
C:\>set NLS_LANG=GERMAN_GERMANY.WE8ISO8859P1
C:\>sqlplus
SQL*Plus: Release 11.1.0.6.0 - Production on Mi Apr 30 08:55:02 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.
SQL> var cur refcursor
SQL>
SQL> declare
2     b   blob := utl_raw.cast_to_raw ('<a>myxml</a>');
3 begin
4     open :cur for select xmlroot (xmltype (utl_raw.cast_to_varchar2 (b))) xml from dual;
5 end;
6 /
PL/SQL-Prozedur erfolgreich abgeschlossen.
SQL>
SQL> print cur
XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<a>myxml</a>

HTTP-Receiver: Code page conversion error from UTF-8 to ISO-8859-1

Hello experts,
In one of our interfaces we are using the payload manipulation of the HTTP receiver channel to change the payload code page from UTF-8 to ISO-8859-1. And from time to time we are facing the following error:
"Code page conversion error UTF-8 from system code page to code page ISO-8859-1"
I'm quite sure that this error occurs because of non-ISO-8859-1 characters in the processed message. And here comes my question:
Is it possible to change the error behaviour of the code page converter, so that the error will be ignored?
Perhaps the converter could replace the disruptive character with e.g. "#"?
Thank you in advance.
Best regards,
Thomas

Hello.
I'm not 100% sure if this will help, but it's good reading material on the subject (:
[How to Work with Character Encodings in Process Integration (NW7.0)|http://www.sdn.sap.com/irj/scn/index?rid=/library/uuid/502991a2-45d9-2910-d99f-8aba5d79fb42]
The part about the XSLT / Java mapping might come in handy in your situation.
You can check for problematic chars in the code.
Good luck,
Imanuel Rahamim.
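
If you go the Java mapping route, plain Java can do the replacement Thomas asks about. The sketch below is not the HTTP adapter's converter and says nothing about its settings; it only shows how a CharsetEncoder from java.nio.charset can be told to substitute '#' for characters that ISO-8859-1 cannot represent. The class and method names (`ReplaceUnmappableSketch`, `toLatin1`) are made up.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Encodes text to ISO-8859-1, writing '#' for anything unmappable; names are hypothetical. */
class ReplaceUnmappableSketch {

    static byte[] toLatin1(String text) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE)
            .replaceWith(new byte[] { '#' });   // bytes written in place of a bad character
        try {
            ByteBuffer encoded = encoder.encode(CharBuffer.wrap(text));
            byte[] result = new byte[encoded.remaining()];
            encoded.get(result);
            return result;
        } catch (CharacterCodingException e) {
            // Should not happen, because both error actions are set to REPLACE.
            throw new IllegalStateException(e);
        }
    }
}
```

Switching both actions to CodingErrorAction.REPORT would instead make the encoder throw on the first bad character, which is handy for finding out which characters cause the failures.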

Conversion ISO-8859-7- UTF-8 and UTF-8 - ISO-8859-7

Hi, I have written this function to do a charset conversion
from ISO-8859-7 to UTF-8 and vice versa:
void ChangeChersetEncoding(String EncodingType) {
    String GrammarText;
    try {
        GrammarText = Editor.getText();
        b = GrammarText.getBytes(LastEncoding);
        String strTemp = new String(b, EncodingType);
        Editor.setText(strTemp);
        LastEncoding = EncodingType;
    } catch (UnsupportedEncodingException e) {
        JOptionPane.showMessageDialog(this, "Error: " + e.getMessage(), "Error", JOptionPane.ERROR_MESSAGE);
    }
}
The steps followed are:
1) I initialize Editor (which is a JEditorPane) with an InputStreamReader that uses the "CP1252" (Windows Latin-1) charset encoding by default.
2) When I call the function the first time with EncodingType = "ISO-8859-7" and LastEncoding = "CP1252" (Windows Latin-1), Editor shows Greek characters, as I expected.
3) When I call the function the second time with EncodingType = "UTF-8" and LastEncoding = "ISO-8859-7", Editor shows unknown characters ('�'), as I expected.
4) The problem is that when I call the function the third time with EncodingType = "ISO-8859-7" and LastEncoding = "UTF-8", Editor doesn't show the original Greek text, which I did not expect.
Thank you for all.

b = GrammarText.getBytes(LastEncoding);
String strTemp = new String(b,EncodingType);
Here you take a String (which is in Unicode) and convert it to bytes, using "LastEncoding". Next you take those bytes and convert them back to a String, assuming that they were encoded using "EncodingType". But they weren't, so at best this will do nothing and at worst it will produce garbage. It certainly won't do anything useful.
As I said all Java strings are in Unicode. If you want to convert something from one encoding to another encoding, you can only convert an array of bytes to a String using the first encoding, then convert that back to bytes using the second encoding. Converting a String to a String just makes no sense.
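
To make that concrete, here is a minimal sketch of the bytes-to-String-to-bytes direction. The class and method names (`TranscodeSketch`, `transcode`) are made up, and it assumes the JVM supports both charset names passed in; Charset.forName throws an unchecked exception otherwise.

```java
import java.nio.charset.Charset;

/** Re-encodes bytes from one charset to another via a String; names are hypothetical. */
class TranscodeSketch {

    static byte[] transcode(byte[] input, String fromName, String toName) {
        // Step 1: decode the bytes with the encoding they were really written in.
        String text = new String(input, Charset.forName(fromName));
        // Step 2: encode the resulting Unicode text with the target encoding.
        return text.getBytes(Charset.forName(toName));
    }

    public static void main(String[] args) {
        byte[] greek = { (byte) 0xE1 };   // small alpha in ISO-8859-7
        byte[] utf8 = transcode(greek, "ISO-8859-7", "UTF-8");
        System.out.println(utf8.length + " bytes in UTF-8");
    }
}
```

For the editor case this means keeping the original file bytes around and decoding them again with the newly selected encoding, not round-tripping the text that is already on screen.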
