Convert a UTF-8 string to ISO-8859-1 string

Hello. As you can see from my other post, I am working on internationalization. I could not find an appropriate entry in the forum already.
I want to convert form data (submitted from an HTML UTF-8 charset page) from the UTF-8 format to ISO-8859-1 format. How do I do that?
For example:
String utfFormat="視聴者";
String isoFormat="";
// Do magic here
System.out.println(isoFormat); // out: "しての" (or whatever it is)
Can you help?
Dailysun

As I said in the other thread (did you read that, BTW?), you shouldn't have to bother with actual character-set conversions. You just tell the InputStreamReader what the charset is when you read the data in, and the OutputStreamWriter what charset to use when you write it out.
What you're doing is escaping characters by replacing them with numeric entity references, the opposite of what you asked about in the other thread. The process is just as simple: cast the char to an int, convert that to a string with String.valueOf(int), and add the "&#" and ";". You can use a regex-based approach like I did over there, but going in this direction it is just as easy without regexes.
Hiwa, check out that other thread; I think you'll find it amusing (in light of that second link you posted).
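
The escaping described above could be sketched roughly as follows. This is a minimal illustration, not the code from the other thread: the class name NumericEntityEscaper and the decision to leave everything below 128 untouched are assumptions. It walks the string by code point rather than by char so that a character outside the Basic Multilingual Plane becomes one reference instead of two.

```java
// Sketch: replace every non-ASCII character with a decimal numeric character reference.
final class NumericEntityEscaper {

    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint < 128) {
                // Plain ASCII is copied through unchanged (assumption: no HTML escaping of & or <).
                out.append((char) codePoint);
            } else {
                // "&#" + decimal value + ";" as described in the post above.
                out.append("&#").append(String.valueOf(codePoint)).append(';');
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a\u00F6b")); // prints a&#246;b
    }
}
```

The result is pure ASCII, so it survives being written out as ISO-8859-1 (or almost any other charset) and a browser will still display the original characters.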

Similar Messages

  • Converting String to ISO-8859-1 html charset

    I want to convert a string to the ISO-8859-1 HTML charset, or vice versa.
    For example, I need to replace "ö" with "&ouml;".
    How can I do that?
    http://www.unicodetools.com/unicode/utf8-to-latin-converter.php

    This seems to return #246; rather than &ouml; for ö, unless the & character is not being displayed for some reason.
    HttpUtility.HtmlEncode Method (String)
    HttpUtility.HtmlDecode Method (String, TextWriter)
    Option Strict On
    Imports System.Web
    Imports System.IO
    Public Class Form1
        Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            Me.Text = "Form1"
        End Sub
        Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
            Dim myString As String = "ö"
            Dim myEncodedString As String = HttpUtility.HtmlEncode(myString)
            Label1.Text = " " & myEncodedString & " "
            Dim myWriter As New StringWriter()
            HttpUtility.HtmlDecode(myEncodedString, myWriter)
            Label1.Text &= myWriter.ToString
        End Sub
    End Class
    La vida loca

  • Encoding ISO-8859-1 String

    I have some code in which I want to decode an ISO-8859-1 string, but I get $=�?=f?se;�rng;�?;�
    Are there any suggestions on how to convert ISO-8859-1 to windows-1257? Thanks
    public static void main(String[] args) {
         String isoString = "$=�Ū=fżse;�rng;�Ż;�";
         byte[] stringBytesIso;
         try {
              // Encode the string as ISO-8859-1 bytes, then reinterpret those bytes as windows-1257
              stringBytesIso = isoString.getBytes("ISO-8859-1");
              String utf8String = new String(stringBytesIso, "windows-1257");
              System.out.print(stringBytesIso + " " + utf8String);
         } catch (UnsupportedEncodingException e) { e.printStackTrace(); }
    }

    Were you expecting Java to magically convert "Ū" into some character?
    It doesn't work that way.

    I agree. What you need is a diacritic mapping. Unfortunately, there's no easy way of doing the mapping.
    One solution is to have a HashMap where
    key = a character above 256
    value = a similar character in Windows extended ASCII
    Anything less than 256 you don't need to convert.
    Also, look for patterns to reduce the size of the diacritic-mapping HashMap.
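
The HashMap idea from the reply above might look something like this. It is only a sketch: the class name DiacriticFolder, the four table entries, and the '?' fallback for unmapped characters are all assumptions made for illustration, and a real table would need an entry for every character you expect to meet.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a hand-written diacritic mapping: characters at or above 256 are looked up, the rest pass through.
final class DiacriticFolder {

    // Hypothetical, deliberately tiny mapping table.
    private static final Map<Character, Character> FOLD = new HashMap<>();
    static {
        FOLD.put('\u016A', 'U'); // Ū
        FOLD.put('\u016B', 'u'); // ū
        FOLD.put('\u017B', 'Z'); // Ż
        FOLD.put('\u017C', 'z'); // ż
    }

    static String fold(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            if (c < 256) {
                out.append(c); // below 256: no conversion needed
            } else {
                char mapped = FOLD.getOrDefault(c, '?'); // assumption: unmapped characters become '?'
                out.append(mapped);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("\u016Aa\u017C")); // prints Uaz
    }
}
```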

  • How to switch DB string storage format from/to UTF-8 to/from ISO 8859-1 ?

    As far as I understand, strings in tables are stored by default in an Oracle DB in ISO 8859-1 format.
    How can I switch the storage to UTF-8 format?
    Do I have to change just a parameter (which one?) or do I have to set up/install the whole DB again?
    If only a parameter change is necessary:
    How can I convert already existing strings from one format to the other?
    Does it take place automatically or do I have to issue an explicit convert command (which one?)?
    Peter

    Please refer to
    http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch2charset.htm#sthref157
    And
    http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch11charsetmig.htm#sthref1476
    Sybrand Bakker
    Senior Oracle DBA

  • UTF-8 encoding vs ISO 8859-1 encoding

    The iTunes tech specs call for UTF-8 encoding of the XML feed file; a friend of mine uses feed generator software through his blog that uses ISO 8859 encoding. Is there a way to convert the latter to UTF-8 so that iTunes tags may be successfully added?
    When I tried editing his XML file, I got error messages when I submitted the file to RSS feed validator sites (such as http://feedvalidator.org/). Any help or knowledge is appreciated because I am not the least bit expert in this coding arena.

    You don't need to convert ISO 8859-1 to UTF-8 unless the feed contains non-ASCII characters. ASCII is a subset of both ISO 8859-1 and UTF-8, so for plain English text it will serve you just fine. You can have iTunes tags in the XML file even if the file itself is encoded in ISO 8859-1.
    The error you see at feedvalidator.org is most likely a warning.
    Hope this helps!
    - Andy Kim
    Potion Factory
    http://www.potionfactory.com

  • Default UTF-8 encode to Iso-8859-1 How to?

    Excuse my English. I have a problem with JCreator 2 early access: when I change the encoding of a JSP page from UTF-8 to ISO-8859-1, JCreator restores it to UTF-8 by default. What can I do? I cannot use Italian accented characters.
    Thanks to anyone who is willing to help.

    Hi,
    Please post the queries related to creator 2EA to the creator 2 EA discussion at https://feedbackprograms.sun.com/login.html
    regards

  • XML file containing an ISO-8859-1encoded string not able to see charcater

    We have an XML file with the following encoding:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    The Oracle 10gR2 database is set with the following encoding.
    PARAMETER VALUE
    NLS_NCHAR_CHARACTERSET AL16UTF16
    NLS_CHARACTERSET WE8ISO8859P1
    There are some characters that are not displaying within SQL Developer, how do I diagnose this as a database or a client issue?

    Guys, why does nobody bother to at least tell us which characters you cannot see? You describe two random elements of the whole application and expect people to give reasonable answers. Please read through this forum a bit and you will see how many configuration elements may matter. Then take some time to describe your problem, including the software versions, the exact (!) symptoms of the problem, etc. Otherwise, you will just get guesses instead of a solution!
    -- Sergiusz

  • Reading a website in ISO-8859-1

    Hello
    I am trying to read a website using the ISO-8859-1 charset.
    I have searched a bit and found some different ways suggested for this. This is the one I think I want because it seems to be the simpler one.
    byte[] iso88591Data = theString.getBytes("ISO-8859-1");
    But I don't understand the "flow" of the charsets:
    1. When I read an HTML page that contains a numeric character reference (&#<code>;), is my string in UTF-8 or ISO-8859-1?
    2. When the getBytes command is used, is the specified charset the one I want to convert to, or the one the string is currently in?
    To understand this problem I wrote a separate class where I tried the following code.
    import java.io.UnsupportedEncodingException;
    public class charsetConversion {
         public static void main(String[] args) {
              String in = args[0];
              byte bytes[] = in.getBytes();
              try {
                   byte bytesISO[] = in.getBytes("ISO-8859-1");
                   String out1 = new String(bytes, "ISO-8859-1");
                   String out2 = new String(bytesISO, "ISO-8859-1");
                   String out3 = new String(bytes);
                   String out4 = new String(bytesISO);
                   System.out.println(out1);
                   System.out.println(out2);
                   System.out.println(out3);
                   System.out.println(out4);
              } catch (UnsupportedEncodingException e) {
                   e.printStackTrace();
              }
         }
    }
    I run it with the input Poole&#x27 ;s and always get Poole&#x27 ;s. The real input has no space between the 7 and the ; but if I don't write it like that here, the forum always shows ' instead.
    I don't really know what else to do.

    So here is the example code:
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.UnsupportedEncodingException;
    public class charsetConversion {
         public static void main(String[] args) {
              FileReader fr = null;
              BufferedReader br = null;
              String in = null;
              try {
                   // Read the first line of the file named by the first argument
                   fr = new FileReader(args[0]);
                   br = new BufferedReader(fr);
                   in = br.readLine();
              } catch (Exception e) {
                   System.out.println(e.getMessage());
                   System.exit(0);
              }
              byte bytes[] = in.getBytes();
              try {
                   byte bytesISO[] = in.getBytes("ISO-8859-1");
                   String out1 = new String(bytes, "ISO-8859-1");
                   String out2 = new String(bytesISO, "ISO-8859-1");
                   String out3 = new String(bytes);
                   String out4 = new String(bytesISO);
                   System.out.println(out1);
                   System.out.println(out2);
                   System.out.println(out3);
                   System.out.println(out4);
              } catch (UnsupportedEncodingException e) {
                   e.printStackTrace();
              }
         }
    }
    As an argument I pass the file; I use this version instead. Inside the file I have
    Poole&#x27 ;s   (without the space)
    Without attaching a file it's hard to show the HTML.
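
For what it's worth, both questions in this thread can be checked with a small experiment. The charset passed to getBytes names the encoding of the bytes that come out (a Java String itself has no charset), and a numeric character reference such as &#x27; is plain ASCII text that no charset conversion will change; it has to be decoded as HTML. The sketch below is only an illustration: the class and method names are made up, and it handles hexadecimal references only.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: getBytes picks the TARGET encoding; character references need separate decoding.
final class CharsetFlowDemo {

    private static final Pattern HEX_REF = Pattern.compile("&#x([0-9A-Fa-f]+);");

    // Replaces hexadecimal numeric character references with the characters they name.
    // Assumption: every reference holds a valid code point; otherwise an exception is thrown.
    static String decodeHexReferences(String html) {
        Matcher m = HEX_REF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1), 16);
            String decoded = new String(Character.toChars(codePoint));
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9"; // é
        // Same String, different target encodings, different byte counts.
        System.out.println(eAcute.getBytes(StandardCharsets.ISO_8859_1).length); // prints 1
        System.out.println(eAcute.getBytes(StandardCharsets.UTF_8).length);      // prints 2
        System.out.println(decodeHexReferences("Poole&#x27;s"));                 // prints Poole's
    }
}
```

This also explains why all four outputs in the test class look identical: the input is pure ASCII, and ASCII bytes are the same in ISO-8859-1, UTF-8, and most platform default charsets.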

  • Convert utf-8 to iso-8859-1

    Hello,
    Sorry for my very bad English.
    I use XMLHttpRequest to query a database and show the result in a div.
    The string is UTF-8 encoded by my JavaScript function and, of course, no results are found in the database.
    How can I convert the string to ISO-8859-1 before querying the database?
    Thanks if you have an idea.
    JiBé (France)

    PaulH **AdobeCommunityExpert** wrote:
    > Jibé wrote:
    >> PaulH **AdobeCommunityExpert** wrote:
    >> I work with an MS SQL Server database encoded in iso-8859-1
    >
    > data stored in plain text, char, varchar datatypes (i.e. not "N")?
    The datatype of "titre" is varchar(250) and "contenu" is text.
    > using the ODBC or JDBC (it would be listed as ms sql server in the db drivers list) driver?
    I think it's the JDBC driver (on my test computer).
    >
    >> The code:
    >
    > you're not following good i18n practices. while my preference is for unicode ("just use unicode" has been my motto for years), if you're really only ever going to use french & never need the euro symbol then i guess iso-8859-1 (latin-1) is fine.
    Here is part of the content of my application.cfm:
    <cfprocessingdirective pageencoding="iso-8859-1">
    <cfcontent type="text/html; charset=iso-8859-1">
    <cfset setEncoding("URL", "iso-8859-1")>
    <cfset setEncoding("Form", "iso-8859-1")>
    >
    > if you think you might need other languages, including the euro symbol, then you should consider unicode. change your text columns to "N" type (nText, nChar, nVarChar) & swap the latin-1 encodings in the tags above to utf-8.
    I'm going to test that...
    JiBé

  • Convert UTF-8 to ISO-8859-1 in JMS receiver

    Hi Friends,
    We are sending an XML message to MQ via a JMS receiver channel, and I need to change the character set from UTF-8 to ISO-8859-1 while sending it to the MQ queues. Will this be possible?
    Please suggest how I can achieve this.
    Regards,
    Kumar.

    Hi Kumar,
    Try changing the encoding using XSLT mapping and you can call this mapping as shown in this blog:
    /people/michal.krawczyk2/blog/2005/11/01/xi-xml-node-into-a-string-with-graphical-mapping
    or try this wiki page:
    http://wiki.sdn.sap.com/wiki/display/XI/SOAPMessagesin+XI
    Regards
    Suraj

  • Character set conversion UTF-8 -- ISO-8859-1 generates question mark (?)

    I'm trying to convert an XML-file in UTF-8 format to another file with character set ISO-8859-1.
    My problem is that the ISO-8859-1 file generates a question mark (?) and puts it as a prefix in the file.
    ?<?xml version="1.0" encoding="UTF-8"?>
    <ns0:messagetype xmlns:ns0="urn:olof">
    <underkat>testv���rde</underkat>
    </ns0:messagetype>
    Is there a way to do the conversion without getting the question mark?
    My code looks as follows:
    import java.io.*;
    public class ConvertEncoding {
         public static void main(String[] args) {
              String from = "UTF-8", to = "ISO-8859-1";
              String infile = "C:\\temp\\infile.xml", outfile = "C:\\temp\\outfile.xml";
              try {
                   convert(infile, outfile, from, to);
              } catch (Exception e) {
                   System.out.println(e.getMessage());
                   System.exit(1);
              }
         }
         private static void convert(String infile, String outfile,
                                            String from, String to)
                             throws IOException, UnsupportedEncodingException {
              //Set up byte streams
              InputStream in = null;
              OutputStream out = null;
              if(infile != null) {
                   in = new FileInputStream(infile);
              }
              if(outfile != null) {
                   out = new FileOutputStream(outfile);
              }
              //Set up character streams
              Reader r = new BufferedReader(new InputStreamReader(in, from));
              Writer w = new BufferedWriter(new OutputStreamWriter(out, to));
              /*Copy characters from input to output.
               * The InputStreamReader converts from the input encoding to Unicode,
               * and the OutputStreamWriter converts from Unicode to the output encoding.
               * Characters that cannot be represented in
               * the output encoding are output as '?'
               */
              char[] buffer = new char[4096];
              int len;
              while((len = r.read(buffer))!= -1) { //Read a block of input
                   w.write(buffer, 0, len);
              }
              r.close();
              w.flush();
              w.close();
         }
    }

    Yes the next character is the '<'
    The file that I read from is generated by an integration platform. I send a plain file to it (supposedly in UTF-8 encoding) and it returns another file (in between I call my java class that converts the characterset from UTF-8 to ISO-8859-1). The file that I get back contains the '���' if the conversion doesn't work and '?' if the conversion worked.
    My solution so far is to skip the first "junk characters" when reading from the input stream. Something like:
    private static final char UTF_BOM = '\uFEFF'; //UTF-BOM = ?
    String from = "UTF-8", to = "ISO-8859-1";
    if (from != null && from.toLowerCase().startsWith("utf-")) { //Are we reading a UTF-encoded file?
         /* Read the first character of the UTF-encoded file.
          * If it is the BOM (which ends up as '?' in the ISO-8859-1 output),
          * skip this character in the read. */
         try {
              r.mark(1); //Only allow one char to be read for the reset function to work
              int i = r.read();
              char c = (char) i;
              if (c == UTF_BOM) {
                   r.reset(); //Reset to the start position
                   r.skip(1); //Skip the first character when reading from the stream
              } else {
                   r.reset();
              }
         } catch (IOException e) {
              e.printStackTrace();
              //return null;
         }
    }

  • Conversion  ISO-8859-7- UTF-8  and UTF-8 - ISO-8859-7

    Hi, I have written this function to do a charset conversion
    from ISO-8859-7 to UTF-8 and vice versa:
    void ChangeChersetEncoding(String EncodingType) {
         String GrammarText;
         try {
              GrammarText = Editor.getText();
              byte[] b = GrammarText.getBytes(LastEncoding);
              String strTemp = new String(b, EncodingType);
              Editor.setText(strTemp);
              LastEncoding = EncodingType;
         } catch (UnsupportedEncodingException e) {
              JOptionPane.showMessageDialog(this, "Error: " + e.getMessage(), "Error", JOptionPane.ERROR_MESSAGE);
         }
    }
    The steps followed are:
    1) I initialize Editor (a JEditorPane) with an InputStreamReader that uses the "CP1252" (Windows Latin-1) charset encoding by default.
    2) When I call the function the first time with EncodingType = "ISO-8859-7" and LastEncoding = "CP1252" (Windows Latin-1), Editor shows Greek characters, as I expected.
    3) When I call the function the second time with EncodingType = "UTF-8" and LastEncoding = "ISO-8859-7", Editor shows the unknown character ('&#65533;'), as I expected.
    4) The problem is that when I call the function the third time with EncodingType = "ISO-8859-7" and LastEncoding = "UTF-8", Editor doesn't show the original Greek text, which I did not expect.
    Thank you all.

    b = GrammarText.getBytes(LastEncoding);
    String strTemp = new String(b,EncodingType);
    Here you take a String (which is in Unicode) and convert it to bytes, using "LastEncoding". Next you take those bytes and convert them back to a String, assuming that they were encoded using "EncodingType". But they weren't, so at best this will do nothing and at worst it will produce garbage. It certainly won't do anything useful.
    As I said, all Java strings are in Unicode. If you want to convert something from one encoding to another, you can only convert an array of bytes to a String using the first encoding, then convert that String back to bytes using the second encoding. Converting a String to a String just makes no sense.
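
The bytes-to-String-to-bytes route described in the reply could be sketched as below. The class name Transcoder and its method are hypothetical, and the example uses ISO-8859-1 rather than ISO-8859-7 simply because the former is guaranteed to be present on every Java platform. The second half shows why the round trip in the question cannot be undone: decoding bytes with the wrong charset replaces anything invalid with U+FFFD, and the original byte values are then gone.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: transcoding operates on byte arrays; the String in the middle is always Unicode.
final class Transcoder {

    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String text = new String(input, from); // decode using the encoding the bytes really are in
        return text.getBytes(to);              // encode into the encoding you want
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xC3, (byte) 0xB6}; // "ö" encoded as UTF-8
        byte[] latin1 = transcode(utf8, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(latin1, new byte[] {(byte) 0xF6})); // prints true

        // Decoding with the WRONG charset is lossy: 0xF6 alone is not valid UTF-8.
        String garbled = new String(latin1, StandardCharsets.UTF_8);
        System.out.println(garbled.equals("\uFFFD")); // prints true
    }
}
```

In the editor scenario above, that means keeping the original bytes (or re-reading the file) and decoding them again with the newly chosen charset, rather than re-encoding whatever text is currently displayed.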

  • Codepage Conversionerror UTF-8 from System-Codepage to Codepage iso-8859-1

    Hello,
    we have on SAP PI 7.1 the problem that we can't process a IDOC to Plain HTTP.
    The channel throws "Codepage Conversionerror UTF-8 from System-Codepage to Codepage iso-8859-1".
    The IDOC is 25 MB. Does anybody have a idea how we can find out what is wrong with the IDOC?
    Thanks in advance.

    In Java, strings are always Unicode, i.e. UTF-16. It's the byte arrays that are encoded. So use the following code:
    String iso, utf, temp = "����� � �����";
    try {
      // getBytes(String) can throw UnsupportedEncodingException, so it belongs inside the try
      byte b8859[] = temp.getBytes("ISO-8859-1");
      byte butf8[] = temp.getBytes("utf8");
      iso = new String(b8859, "ISO-8859-1");
      utf = new String(butf8, "UTF-8");
      System.out.println("ISO-8859-1:" + iso);
      System.out.println("UTF-8:" + utf);
      System.out.println("UTF to ISO-8859-1:" + new String(utf.getBytes("iso8859_1"), "ISO-8859-1"));
      System.out.println(utf);
      System.out.println(iso);
    } catch (Exception e) { e.printStackTrace(); }
    Also keep in mind that the DOS window does not support international characters, so write the output to a file.

  • DB2 with iso-8859-15 text encoding

    Hi,
    I am developing an application on a WebLogic server configured with a DB2 data source. The application uses Hibernate 3 (JPA). I cannot write some characters correctly to the database: the text encoding is not converted from UTF-16 (Java strings) to ISO-8859-15. WebLogic is installed with the latest DB2 JDBC driver (9.7).
    Is there a data source property that tells the driver to convert UTF-16 to the DB2 encoding?

    Is the value (text) of the header outputText static text, or is it bound to some bindings attribute or backing bean attribute?
    If it is static text, then the JSF page encoding (not only the JSP tag specifying the encoding, but the whole file) may be broken. I once had similar issues, and the only thing that helped was Ctrl+A, Ctrl+X, Ctrl+V (cut all the content of the file and paste it back; this restores messed-up UTF-8 encoding).
    If it is not static text, then check the "source" of the text: it should be a Java class file that is also encoded in UTF-8 (also check the compiler options for all of your projects). Also check the default IDE encoding. Finally, the database should also use UTF-8 encoding (if you are retrieving data from the database to show in these header outputTexts).
    As I stated before, I had a lot of issues and finally I learned that EVERYTHING should be set correctly to utf-8 in order not to have these strange effects...
    Have in mind that Java is very complex in terms of string/text encodings, much more complex than e.g. MS.Net. Internally, Java uses utf-16 encoded strings while most of web applications use utf-8 (for efficiency reasons). So, the gap between Java and web is present, and legacy Java encodings support is quite impractical in modern applications where no one should ever have reason to use anything else than utf-8!

  • XMLReader throws "Invalid UTF8 encoding." - Need parser for ISO-8859-1 chrs

    Hi,
    We are facing an issue when we try to send data which is encoded in "ISO-8859-1" charset (german chars) via the EMDClient (agent), which tries to parse it using the oracle.xml.parser.v2.XMLParser . The parser, while trying to read it, is unable to determine the charset encoding of our data and assumes that the encoding is "UTF-8", and when it tries to read it, throws the :
    "java.io.UTFDataFormatException: Invalid UTF8 encoding." exception.
    I looked at the XMLReader's code and found that it tries to read the first 4 bytes (Byte Order Mark - BOM) to determine the encoding. It is probably expecting us to send the data where the first line is probably:
    <?xml version="1.0" encoding="iso88591" ?>
    But, the data that our application sends is typically as below:
    ========================================================
    # listener.ora Network Configuration File: /ade/vivsharm_emsa2/oracle/work/listener.ora
    # Generated by Oracle configuration tools.
    SID_LIST_LISTENER =
    (SID_LIST =
    (SID_DESC =
    (SID_NAME = semsa2)
    (ORACLE_HOME = /ade/vivsharm_emsa2/oracle)
    LISTENER =
    (DESCRIPTION_LIST =
    (DESCRIPTION =
    (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = tcp)(HOST = stadm18.us.oracle.com)(PORT = 15100))
    ========================================================
    the first 4 bytes in our case will be, int[] {35, 32, 108, 105} == chars {#, SPACE, l, i},
    which does not match any of the encodings predefined in oracle.xml.parser.v2.XMLReader.pushXMLReader() method.
    How do we ensure that the parser identifies the encoding properly and instantiates the correct parser for "ISO-8859-1"...
    Should we just add the line <?xml version="1.0" encoding="iso88591" ?> at the beginning of our data?
    We have tried constructing the inputstream (ByteArrayInputStream) by using String.getBytes("ISO-8859-1") and passing that to the parser, but that does not seem to work.
    Please suggest.
    Thanks & Regards,
    Vivek.
    PS: The exception we get is as below:
    java.io.UTFDataFormatException: Invalid UTF8 encoding.
    at oracle.xml.parser.v2.XMLUTF8Reader.checkUTF8Byte(XMLUTF8Reader.java:160)
    at oracle.xml.parser.v2.XMLUTF8Reader.readUTF8Char(XMLUTF8Reader.java:187)
    at oracle.xml.parser.v2.XMLUTF8Reader.fillBuffer(XMLUTF8Reader.java:120)
    at oracle.xml.parser.v2.XMLByteReader.saveBuffer(XMLByteReader.java:450)
    at oracle.xml.parser.v2.XMLReader.fillBuffer(XMLReader.java:2229)
    at oracle.xml.parser.v2.XMLReader.tryRead(XMLReader.java:994)
    at oracle.xml.parser.v2.XMLReader.scanXMLDecl(XMLReader.java:2788)
    at oracle.xml.parser.v2.XMLReader.pushXMLReader(XMLReader.java:502)
    at oracle.xml.parser.v2.XMLReader.pushXMLReader(XMLReader.java:205)
    at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:180)
    at org.xml.sax.helpers.ParserAdapter.parse(ParserAdapter.java:431)
    at oracle.sysman.emSDK.emd.comm.RemoteOperationInputStream.readXML(RemoteOperationInputStream.java:363)
    at oracle.sysman.emSDK.emd.comm.RemoteOperationInputStream.readHeader(RemoteOperationInputStream.java:195)
    at oracle.sysman.emSDK.emd.comm.RemoteOperationInputStream.read(RemoteOperationInputStream.java:151)
    at oracle.sysman.emSDK.emd.comm.EMDClient.remotePut(EMDClient.java:2075)
    at oracle.sysman.emo.net.util.agent.Operation.saveFile(Operation.java:758)
    at oracle.sysman.emo.net.common.WebIOHandler.saveFile(WebIOHandler.java:152)
    at oracle.sysman.emo.net.common.BaseWebConfigContext.saveConfig(BaseWebConfigContext.java:505)

    Vivek
    Your message is not XML. I believe that the XMLParser is going to have problems with that as well. Perhaps you could wrap the message in an XML tag set and begin the document, as you suggested, with <?xml version="1.0" encoding="iso88591"?>.
    You are correct that the parser uses only the first 4 bytes to detect the encoding of the document. It can only determine whether the document is ASCII or EBCDIC based. If it is ASCII based, it can only distinguish between UTF-8 and UTF-16. It will need the encoding attribute to recognize the ISO-8859-1 encoding.
    hope this helps
    tom
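
The wrapping suggested above might be sketched like this. Several things here are assumptions rather than anything from the thread: the <payload> element name is made up, the declaration uses the IANA name ISO-8859-1 instead of the iso88591 spelling quoted above, and the parsing is done with the JDK's built-in JAXP parser rather than oracle.xml.parser.v2, whose behavior may differ.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Sketch: wrap plain text in an XML document that declares its own encoding.
final class Latin1XmlWrapper {

    // Escapes markup characters and wraps the text in a hypothetical <payload> element.
    static byte[] wrap(String text) {
        String escaped = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
                + "<payload>" + escaped + "</payload>";
        // The bytes must really be in the encoding the declaration names.
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Parses the bytes; the parser reads the declaration to pick the decoder.
    static String unwrap(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse wrapped message", e);
        }
    }

    public static void main(String[] args) {
        String original = "# Gr\u00FC\u00DFe = (a < b)";
        System.out.println(original.equals(unwrap(wrap(original)))); // prints true
    }
}
```

Two caveats: characters that ISO-8859-1 cannot represent are silently replaced with '?' by getBytes, and an XML parser normalizes \r\n line endings to \n, so a multi-line file may not come back byte for byte.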
