Russian character encoding

While opening .txt files in Mac OS (the text in russian and usually I don't know whether it was encoded with ANSI or Windows-1251 etc) the text displays incorrectly.
Same happens with some programs (for instance, Guitar Pro 5 - the name of the composition (inside the program) is in russian - so in Mac OS I can see only unreadable symbols).
So the question is: can I add the support of Windows standart encodings for that situations not to happen?
Thank you in advance.

twofacedv wrote:
So the question is: can I add the support of Windows standart encodings for that situations not to happen?
It is already there. MacOS X fully supports Russian and most other languages too. You just happened to pick two worst case scenarios.
For Guitar Pro 5, there is nothing you can do but contact them and report the bug. That is a 3rd party product and obviously has trouble with Russian. It seems to be a cross-platform program and that is likely the reason.
Text files are always difficult. There is nothing about a text file that indicates what encoding it is. You would have to open it in something like TextWrangler that might be able to detect the encoding. If all else fails, you could try various, likely encodings until it was readable. Once you get there, save the file as UTF-8 and you won't have any more problems with it.

Similar Messages

  • Russian character encoding issue

    Hi All,
    I wrote a java program (using jdk 1.5 ) to execute wevtutil.exe output in windows 2008 russian os, i use the command "wevtutil qe Security /c:1 /rd:true /f:text"
    when i run the above command in the command prompt i got the proper output but when i execute the same command using the java program my output is garbled, i tried to run the same output using the encoding Cp866 but still no use. which encoding format i have to use for wevtutil output in russian language? Herewith i attach my java code and its output and the original command output.
    Please help me to resolve this.
    Thanks and Regards,
    Rama
    Java code:
    import java.util.*;
    import java.io.*;
    public class evtquery
         public static void main(String args[])
              String cmd ="wevtutil qe Security /c:1 /rd:true /f:text";
              Runtime run1 = Runtime.getRuntime();
              try
                   Process p = run1.exec(cmd);
                   BufferedReader rd[] = new BufferedReader[2];
                   rd[1] = new BufferedReader(new InputStreamReader(p.getErrorStream()));
                   rd[0] = new BufferedReader(new InputStreamReader(p.getInputStream()));
                   String line = null;
                   while((line =rd[0].readLine()) != null)
                        System.out.println(line);
              catch (Exception ex)
                   ex.printStackTrace();
    output:
    C:\rama>java evtquery
    Event[0]:
    Log Name: Security
    Source: Microsoft-Windows-Security-Auditing
    Date: 2012-01-08T22:57:38.703
    Event ID: 4648
    Task: ┬їюф т ёшёЄхьє
    Level: ╤тхфхэш 
    Opcode: ╤тхфхэш 
    Keyword: └єфшЄ єёяхїр
    User: N/A
    User Name: N/A
    Computer: WIN-KBTGH8I5IBE
    Description:
    ┬√яюыэхэр яюя√Єър тїюфр т ёшёЄхьє ё  тэ√ь єърчрэшхь єўхЄэ√ї фрээ√ї.
    ╤єс·хъЄ:
    ╚─ схчюярёэюёЄш: S-1-5-21-3933732538-1582820160-195548471
    1-500
    ╚ь  єўхЄэющ чряшёш: Administrator
    ─юьхэ єўхЄэющ чряшёш: WIN-KBTGH8I5IBE
    ╩юф тїюфр: 0xbc15a
    GUID тїюфр: {00000000-0000-0000-0000-000000000000}
    ┴√ыш шёяюы№чютрэ√ єўхЄэ√х фрээ√х ёыхфє■∙хщ єўхЄэющ чряшёш:
    ╚ь  єўхЄэющ чряшёш: eguser
    ─юьхэ єўхЄэющ чряшёш: WIN-KBTGH8I5IBE
    GUID тїюфр: {00000000-0000-0000-0000-000000000000}
    ╓хыхтющ ёхЁтхЁ:
    ╚ь  Ўхыхтюую ёхЁтхЁр: RAMA-PC
    ─юяюыэшЄхы№э√х ётхфхэш : RAMA-PC
    ╤тхфхэш  ю яЁюЎхёёх:
    ╚фхэЄшЇшърЄюЁ яЁюЎхёёр: 0x4
    ╚ь  яЁюЎхёёр:
    ╤тхфхэш  ю ёхЄш:
    ╤хЄхтющ рфЁхё: -
    ╧юЁЄ: -
    ─рээюх ёюс√Єшх тючэшърхЄ, ъюуфр яЁюЎхёё я√ЄрхЄё  т√яюыэшЄ№ тїюф ё єўхЄэющ чряшё№
    ■,  тэю єърчрт хх єўхЄэ√х фрээ√х. ▌Єю юс√ўэю яЁюшёїюфшЄ яЁш шёяюы№чютрэшш ъюэЇш
    уєЁрЎшщ яръхЄэюую Єшяр, эряЁшьхЁ, эрчэрўхээ√ї чрфрў, шыш т√яюыэхэшш ъюьрэф√ RUNA
    S.
    C:\rama> java -Dfile.encoding=Cp866 evtquery
    Event[0]:
    Log Name: Security
    Source: Microsoft-Windows-Security-Auditing
    Date: 2012-01-08T22:57:38.703
    Event ID: 4648
    Task: ┬їюф т ёшёЄхьє
    Level: ╤тхфхэш 
    Opcode: ╤тхфхэш 
    Keyword: └єфшЄ єёяхїр
    User: N/A
    User Name: N/A
    Computer: WIN-KBTGH8I5IBE
    Description:
    ┬√яюыэхэр яюя√Єър тїюфр т ёшёЄхьє ё  тэ√ь єърчрэшхь єўхЄэ√ї фрээ√ї.
    ╤єс·хъЄ:
    ╚─ схчюярёэюёЄш: S-1-5-21-3933732538-1582820160-195548471
    1-500
    ╚ь  єўхЄэющ чряшёш: Administrator
    ─юьхэ єўхЄэющ чряшёш: WIN-KBTGH8I5IBE
    ╩юф тїюфр: 0xbc15a
    GUID тїюфр: {00000000-0000-0000-0000-000000000000}
    ┴√ыш шёяюы№чютрэ√ єўхЄэ√х фрээ√х ёыхфє■∙хщ єўхЄэющ чряшёш:
    ╚ь  єўхЄэющ чряшёш: eguser
    ─юьхэ єўхЄэющ чряшёш: WIN-KBTGH8I5IBE
    GUID тїюфр: {00000000-0000-0000-0000-000000000000}
    ╓хыхтющ ёхЁтхЁ:
    ╚ь  Ўхыхтюую ёхЁтхЁр: RAMA-PC
    ─юяюыэшЄхы№э√х ётхфхэш : RAMA-PC
    ╤тхфхэш  ю яЁюЎхёёх:
    ╚фхэЄшЇшърЄюЁ яЁюЎхёёр: 0x4
    ╚ь  яЁюЎхёёр:
    ╤тхфхэш  ю ёхЄш:
    ╤хЄхтющ рфЁхё: -
    ╧юЁЄ: -
    ─рээюх ёюс√Єшх тючэшърхЄ, ъюуфр яЁюЎхёё я√ЄрхЄё  т√яюыэшЄ№ тїюф ё єўхЄэющ чряшё№
    ■,  тэю єърчрт хх єўхЄэ√х фрээ√х. ▌Єю юс√ўэю яЁюшёїюфшЄ яЁш шёяюы№чютрэшш ъюэЇш
    уєЁрЎшщ яръхЄэюую Єшяр, эряЁшьхЁ, эрчэрўхээ√ї чрфрў, шыш т√яюыэхэшш ъюьрэф√ RUNA
    S.
    command output:
    C:\rama>wevtutil qe Security /c:1 /rd:true /f:text
    Event[0]:
    Log Name: Security
    Source: Microsoft-Windows-Security-Auditing
    Date: 2012-01-08T22:57:38.703
    Event ID: 4648
    Task: Вход в систему
    Level: Сведения
    Opcode: Сведения
    Keyword: Аудит успеха
    User: N/A
    User Name: N/A
    Computer: WIN-KBTGH8I5IBE
    Description:
    Выполнена попытка входа в систему с явным указанием учетных данных.
    Субъект:
    ИД безопасности: S-1-5-21-3933732538-1582820160-195548471
    1-500
    Имя учетной записи: Administrator
    Домен учетной записи: WIN-KBTGH8I5IBE
    Код входа: 0xbc15a
    GUID входа: {00000000-0000-0000-0000-000000000000}
    Были использованы учетные данные следующей учетной записи:
    Имя учетной записи: eguser
    Домен учетной записи: WIN-KBTGH8I5IBE
    GUID входа: {00000000-0000-0000-0000-000000000000}
    Целевой сервер:
    Имя целевого сервера: RAMA-PC
    Дополнительные сведения: RAMA-PC
    Сведения о процессе:
    Идентификатор процесса: 0x4
    Имя процесса:
    Сведения о сети:
    Сетевой адрес: -
    Порт: -
    Данное событие возникает, когда процесс пытается выполнить вход с учетной запись
    ю, явно указав ее учетные данные. Это обычно происходит при использовании конфи
    гураций пакетного типа, например, назначенных задач, или выполнении команды RUNA
    S.
    C:\rama>

    Hi,
    Thanks for your reply. I tried the code with the changes you have specified but it doesn't work.
    The output is
    C:\rama>java evtquery
    Event[0]:
    Log Name: Security
    Source: Microsoft-Windows-Security-Auditing
    Date: 2012-01-09T04:05:16.718
    Event ID: 4672
    Task: ??????????? ????
    Level: ???????а
    Opcode: ???????а
    Keyword: ????? ??????
    User: N/A
    User Name: N/A
    Computer: WIN-KBTGH8I5IBE
    Description:
    ???╖???:
    ?? ????????????: S-1-5-18
    ??а ??????? ??????: ???????
    ????? ??????? ??????: NT AUTHORITY
    ??? ?????: 0x3e7
    ??????????: SeAssignPrimaryTokenPrivilege
    SeTcbPrivilege
    SeSecurityPrivilege
    SeTakeOwnershipPrivilege
    SeLoadDriverPrivilege
    SeBackupPrivilege
    SeRestorePrivilege
    SeDebugPrivilege
    SeAuditPrivilege
    SeSystemEnvironmentPrivilege
    SeImpersonatePrivilege
    Thanks & Regards,
    Rama

  • What every developer should know about character encoding

    This was originally posted (with better formatting) at Moderator edit: link removed/what-every-developer-should-know-about-character-encoding.html. I'm posting because lots of people trip over this.
    If you write code that touches a text file, you probably need this.
    Lets start off with two key items
    1.Unicode does not solve this issue for us (yet).
    2.Every text file is encoded. There is no such thing as an unencoded file or a "general" encoding.
    And lets add a codacil to this – most Americans can get by without having to take this in to account – most of the time. Because the characters for the first 127 bytes in the vast majority of encoding schemes map to the same set of characters (more accurately called glyphs). And because we only use A-Z without any other characters, accents, etc. – we're good to go. But the second you use those same assumptions in an HTML or XML file that has characters outside the first 127 – then the trouble starts.
    The computer industry started with diskspace and memory at a premium. Anyone who suggested using 2 bytes for each character instead of one would have been laughed at. In fact we're lucky that the byte worked best as 8 bits or we might have had fewer than 256 bits for each character. There of course were numerous charactersets (or codepages) developed early on. But we ended up with most everyone using a standard set of codepages where the first 127 bytes were identical on all and the second were unique to each set. There were sets for America/Western Europe, Central Europe, Russia, etc.
    And then for Asia, because 256 characters were not enough, some of the range 128 – 255 had what was called DBCS (double byte character sets). For each value of a first byte (in these higher ranges), the second byte then identified one of 256 characters. This gave a total of 128 * 256 additional characters. It was a hack, but it kept memory use to a minimum. Chinese, Japanese, and Korean each have their own DBCS codepage.
    And for awhile this worked well. Operating systems, applications, etc. mostly were set to use a specified code page. But then the internet came along. A website in America using an XML file from Greece to display data to a user browsing in Russia, where each is entering data based on their country – that broke the paradigm.
    Fast forward to today. The two file formats where we can explain this the best, and where everyone trips over it, is HTML and XML. Every HTML and XML file can optionally have the character encoding set in it's header metadata. If it's not set, then most programs assume it is UTF-8, but that is not a standard and not universally followed. If the encoding is not specified and the program reading the file guess wrong – the file will be misread.
    Point 1 – Never treat specifying the encoding as optional when writing a file. Always write it to the file. Always. Even if you are willing to swear that the file will never have characters out of the range 1 – 127.
    Now lets' look at UTF-8 because as the standard and the way it works, it gets people into a lot of trouble. UTF-8 was popular for two reasons. First it matched the standard codepages for the first 127 characters and so most existing HTML and XML would match it. Second, it was designed to use as few bytes as possible which mattered a lot back when it was designed and many people were still using dial-up modems.
    UTF-8 borrowed from the DBCS designs from the Asian codepages. The first 128 bytes are all single byte representations of characters. Then for the next most common set, it uses a block in the second 128 bytes to be a double byte sequence giving us more characters. But wait, there's more. For the less common there's a first byte which leads to a sersies of second bytes. Those then each lead to a third byte and those three bytes define the character. This goes up to 6 byte sequences. Using the MBCS (multi-byte character set) you can write the equivilent of every unicode character. And assuming what you are writing is not a list of seldom used Chinese characters, do it in fewer bytes.
    But here is what everyone trips over – they have an HTML or XML file, it works fine, and they open it up in a text editor. They then add a character that in their text editor, using the codepage for their region, insert a character like ß and save the file. Of course it must be correct – their text editor shows it correctly. But feed it to any program that reads according to the encoding and that is now the first character fo a 2 byte sequence. You either get a different character or if the second byte is not a legal value for that first byte – an error.
    Point 2 – Always create HTML and XML in a program that writes it out correctly using the encode. If you must create with a text editor, then view the final file in a browser.
    Now, what about when the code you are writing will read or write a file? We are not talking binary/data files where you write it out in your own format, but files that are considered text files. Java, .NET, etc all have character encoders. The purpose of these encoders is to translate between a sequence of bytes (the file) and the characters they represent. Lets take what is actually a very difficlut example – your source code, be it C#, Java, etc. These are still by and large "plain old text files" with no encoding hints. So how do programs handle them? Many assume they use the local code page. Many others assume that all characters will be in the range 0 – 127 and will choke on anything else.
    Here's a key point about these text files – every program is still using an encoding. It may not be setting it in code, but by definition an encoding is being used.
    Point 3 – Always set the encoding when you read and write text files. Not just for HTML & XML, but even for files like source code. It's fine if you set it to use the default codepage, but set the encoding.
    Point 4 – Use the most complete encoder possible. You can write your own XML as a text file encoded for UTF-8. But if you write it using an XML encoder, then it will include the encoding in the meta data and you can't get it wrong. (it also adds the endian preamble to the file.)
    Ok, you're reading & writing files correctly but what about inside your code. What there? This is where it's easy – unicode. That's what those encoders created in the Java & .NET runtime are designed to do. You read in and get unicode. You write unicode and get an encoded file. That's why the char type is 16 bits and is a unique core type that is for characters. This you probably have right because languages today don't give you much choice in the matter.
    Point 5 – (For developers on languages that have been around awhile) – Always use unicode internally. In C++ this is called wide chars (or something similar). Don't get clever to save a couple of bytes, memory is cheap and you have more important things to do.
    Wrapping it up
    I think there are two key items to keep in mind here. First, make sure you are taking the encoding in to account on text files. Second, this is actually all very easy and straightforward. People rarely screw up how to use an encoding, it's when they ignore the issue that they get in to trouble.
    Edited by: Darryl Burke -- link removed

    DavidThi808 wrote:
    This was originally posted (with better formatting) at Moderator edit: link removed/what-every-developer-should-know-about-character-encoding.html. I'm posting because lots of people trip over this.
    If you write code that touches a text file, you probably need this.
    Lets start off with two key items
    1.Unicode does not solve this issue for us (yet).
    2.Every text file is encoded. There is no such thing as an unencoded file or a "general" encoding.
    And lets add a codacil to this – most Americans can get by without having to take this in to account – most of the time. Because the characters for the first 127 bytes in the vast majority of encoding schemes map to the same set of characters (more accurately called glyphs). And because we only use A-Z without any other characters, accents, etc. – we're good to go. But the second you use those same assumptions in an HTML or XML file that has characters outside the first 127 – then the trouble starts. Pretty sure most Americans do not use character sets that only have a range of 0-127. I don't think I have every used a desktop OS that did. I might have used some big iron boxes before that but at that time I wasn't even aware that character sets existed.
    They might only use that range but that is a different issue, especially since that range is exactly the same as the UTF8 character set anyways.
    >
    The computer industry started with diskspace and memory at a premium. Anyone who suggested using 2 bytes for each character instead of one would have been laughed at. In fact we're lucky that the byte worked best as 8 bits or we might have had fewer than 256 bits for each character. There of course were numerous charactersets (or codepages) developed early on. But we ended up with most everyone using a standard set of codepages where the first 127 bytes were identical on all and the second were unique to each set. There were sets for America/Western Europe, Central Europe, Russia, etc.
    And then for Asia, because 256 characters were not enough, some of the range 128 – 255 had what was called DBCS (double byte character sets). For each value of a first byte (in these higher ranges), the second byte then identified one of 256 characters. This gave a total of 128 * 256 additional characters. It was a hack, but it kept memory use to a minimum. Chinese, Japanese, and Korean each have their own DBCS codepage.
    And for awhile this worked well. Operating systems, applications, etc. mostly were set to use a specified code page. But then the internet came along. A website in America using an XML file from Greece to display data to a user browsing in Russia, where each is entering data based on their country – that broke the paradigm.
    The above is only true for small volume sets. If I am targeting a processing rate of 2000 txns/sec with a requirement to hold data active for seven years then a column with a size of 8 bytes is significantly different than one with 16 bytes.
    Fast forward to today. The two file formats where we can explain this the best, and where everyone trips over it, is HTML and XML. Every HTML and XML file can optionally have the character encoding set in it's header metadata. If it's not set, then most programs assume it is UTF-8, but that is not a standard and not universally followed. If the encoding is not specified and the program reading the file guess wrong – the file will be misread.
    The above is out of place. It would be best to address this as part of Point 1.
    Point 1 – Never treat specifying the encoding as optional when writing a file. Always write it to the file. Always. Even if you are willing to swear that the file will never have characters out of the range 1 – 127.
    Now lets' look at UTF-8 because as the standard and the way it works, it gets people into a lot of trouble. UTF-8 was popular for two reasons. First it matched the standard codepages for the first 127 characters and so most existing HTML and XML would match it. Second, it was designed to use as few bytes as possible which mattered a lot back when it was designed and many people were still using dial-up modems.
    UTF-8 borrowed from the DBCS designs from the Asian codepages. The first 128 bytes are all single byte representations of characters. Then for the next most common set, it uses a block in the second 128 bytes to be a double byte sequence giving us more characters. But wait, there's more. For the less common there's a first byte which leads to a sersies of second bytes. Those then each lead to a third byte and those three bytes define the character. This goes up to 6 byte sequences. Using the MBCS (multi-byte character set) you can write the equivilent of every unicode character. And assuming what you are writing is not a list of seldom used Chinese characters, do it in fewer bytes.
    The first part of that paragraph is odd. The first 128 characters of unicode, all unicode, is based on ASCII. The representational format of UTF8 is required to implement unicode, thus it must represent those characters. It uses the idiom supported by variable width encodings to do that.
    But here is what everyone trips over – they have an HTML or XML file, it works fine, and they open it up in a text editor. They then add a character that in their text editor, using the codepage for their region, insert a character like ß and save the file. Of course it must be correct – their text editor shows it correctly. But feed it to any program that reads according to the encoding and that is now the first character fo a 2 byte sequence. You either get a different character or if the second byte is not a legal value for that first byte – an error.
    Not sure what you are saying here. If a file is supposed to be in one encoding and you insert invalid characters into it then it invalid. End of story. It has nothing to do with html/xml.
    Point 2 – Always create HTML and XML in a program that writes it out correctly using the encode. If you must create with a text editor, then view the final file in a browser.
    The browser still needs to support the encoding.
    Now, what about when the code you are writing will read or write a file? We are not talking binary/data files where you write it out in your own format, but files that are considered text files. Java, .NET, etc all have character encoders. The purpose of these encoders is to translate between a sequence of bytes (the file) and the characters they represent. Lets take what is actually a very difficlut example – your source code, be it C#, Java, etc. These are still by and large "plain old text files" with no encoding hints. So how do programs handle them? Many assume they use the local code page. Many others assume that all characters will be in the range 0 – 127 and will choke on anything else.
    I know java files have a default encoding - the specification defines it. And I am certain C# does as well.
    Point 3 – Always set the encoding when you read and write text files. Not just for HTML & XML, but even for files like source code. It's fine if you set it to use the default codepage, but set the encoding.
    It is important to define it. Whether you set it is another matter.
    Point 4 – Use the most complete encoder possible. You can write your own XML as a text file encoded for UTF-8. But if you write it using an XML encoder, then it will include the encoding in the meta data and you can't get it wrong. (it also adds the endian preamble to the file.)
    Ok, you're reading & writing files correctly but what about inside your code. What there? This is where it's easy – unicode. That's what those encoders created in the Java & .NET runtime are designed to do. You read in and get unicode. You write unicode and get an encoded file. That's why the char type is 16 bits and is a unique core type that is for characters. This you probably have right because languages today don't give you much choice in the matter.
    Unicode character escapes are replaced prior to actual code compilation. Thus it is possible to create strings in java with escaped unicode characters which will fail to compile.
    Point 5 – (For developers on languages that have been around awhile) – Always use unicode internally. In C++ this is called wide chars (or something similar). Don't get clever to save a couple of bytes, memory is cheap and you have more important things to do.
    No. A developer should understand the problem domain represented by the requirements and the business and create solutions that appropriate to that. Thus there is absolutely no point for someone that is creating an inventory system for a stand alone store to craft a solution that supports multiple languages.
    And another example is with high volume systems moving/storing bytes is relevant. As such one must carefully consider each text element as to whether it is customer consumable or internally consumable. Saving bytes in such cases will impact the total load of the system. In such systems incremental savings impact operating costs and marketing advantage with speed.

  • XML Character Encoding Using UTL_DBWS

    Hi,
    I have a database with WINDOWS-1252 character encoding. I'm using UTL_DBWS to call a web service method which echoes a given string. For this purpose, I do the following:
    DECLARE
        v_wsdl CONSTANT VARCHAR2(500) := 'http://myhost/myservice?wsdl';
        v_namespace CONSTANT VARCHAR2(500) := 'my.namespace';
        v_service_name CONSTANT UTL_DBWS.QNAME := UTL_DBWS.to_qname(v_namespace, 'MyService');
        v_service_port CONSTANT UTL_DBWS.QNAME := UTL_DBWS.to_qname(v_namespace, 'MySoapServicePort');
        v_ping CONSTANT UTL_DBWS.QNAME := UTL_DBWS.to_qname(v_namespace, 'ping');
        v_wsdl_uri CONSTANT URITYPE := URIFACTORY.getURI(v_wsdl);
        v_str_request CONSTANT VARCHAR2(4000) :=
    '<?xml version="1.0" encoding="UTF-8" ?>
    <ping>
        <pingRequest>
            <echoData>Dev Team üöäß</echoData>
        </pingRequest>
    </ping>';
        v_service UTL_DBWS.SERVICE;
        v_call UTL_DBWS.CALL;
        v_request XMLTYPE := XMLTYPE (v_str_request);
        v_response SYS.XMLTYPE;
    BEGIN
        DBMS_JAVA.set_output(20000);
        UTL_DBWS.set_logger_level('FINE');
        v_service := UTL_DBWS.create_service(v_wsdl_uri, v_service_name);
        v_call := UTL_DBWS.create_call(v_service, v_service_port, v_ping);
        UTL_DBWS.set_property(v_call, 'oracle.webservices.charsetEncoding', 'UTF-8');
        v_response := UTL_DBWS.invoke(v_call, v_request);
        DBMS_OUTPUT.put_line(v_response.getStringVal());
        UTL_DBWS.release_call(v_call);
        UTL_DBWS.release_all_services;
    END;
    /Here is the SERVER OUTPUT:
    ServiceFacotory: oracle.j2ee.ws.client.ServiceFactoryImpl@a9deba8d
    WSDL: http://myhost/myservice?wsdl
    Service: oracle.j2ee.ws.client.dii.ConfiguredService@c881d39e
    *** Created service: -2121202561 - oracle.jpub.runtime.dbws.DbwsProxy$ServiceProxy@afb58220 ***
    ServiceProxy.get(-2121202561) = oracle.jpub.runtime.dbws.DbwsProxy$ServiceProxy@afb58220
    Collection Call info: port={my.namespace}MySoapServicePort, operation={my.namespace}ping, returnType={my.namespace}PingResponse, params count=1
    setProperty(oracle.webservices.charsetEncoding, UTF-8)
    dbwsproxy.add.map: ns, my.namespace
    Attribute 0: my.namespace: xmlns:ns, my.namespace
    dbwsproxy.lookup.map: ns, my.namespace
    createElement(ns:ping,null,my.namespace)
    dbwsproxy.add.soap.element.namespace: ns, my.namespace
    Attribute 0: my.namespace: xmlns:ns, my.namespace
    dbwsproxy.element.node.child.3: 1, null
    createElement(echoData,null,null)
    dbwsproxy.text.node.child.0: 3, Dev Team üöäß
    request:
    <ns:ping xmlns:ns="my.namespace">
       <pingRequest>
          <echoData>Dev Team üöäß</echoData>
       </pingRequest>
    </ns:ping>
    Jul 8, 2008 6:58:49 PM oracle.j2ee.ws.client.StreamingSender _sendImpl
    FINE: StreamingSender.response:<?xml version = '1.0' encoding = 'UTF-8'?>
    <env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><env:Header/><env:Body><ns0:pingResponse xmlns:ns0="my.namespace"><pingResponse><responseTimeMillis>0</responseTimeMillis><resultCode>0</resultCode><echoData>Dev Team üöäß</echoData></pingResponse></ns0:pingResponse></env:Body></env:Envelope>
    response:
    <ns0:pingResponse xmlns:ns0="my.namespace">
       <pingResponse>
          <responseTimeMillis>0</responseTimeMillis>
          <resultCode>0</resultCode>
          <echoData>Dev Team üöäß</echoData>
       </pingResponse>
    </ns0:pingResponse>As you can see the character encoding is broken in the request and in the response, i.e. the SOAP encoder does not take into consideration the UTF-8 encoding.
    I tracked down the problem to the method oracle.jpub.runtime.dbws.DbwsProxy.dom2SOAP(org.w3c.dom.Node, java.util.Hashtable); and more specifically to the calls of oracle.j2ee.ws.saaj.soap.soap11.SOAPFactory11.
    My question is: is there a way to make the SOAP encoder use the correct character encoding?
    Thanks a lot in advance!
    Greetings,
    Dimitar

    I found a workaround of the problem:
        v_response := XMLType(v_response.getBlobVal(NLS_CHARSET_ID('CHAR_CS')), NLS_CHARSET_ID('AL32UTF8'));Ugly, but I'm tired of decompiling and debugging Java classes ;)
    Greetings,
    Dimitar

  • Russian Character is not displaying Correctly in HTML Attachment

    Hi,
    We are converting the TRIP Details into HTML Format and sending this as attachment by using the function module 'SO_DOCUMENT_SEND_API1'
    It is working fine if the Content are in English, But if there is any Russian characters, Junk character is getting display while opening the HTML attachment in the Mail.
    While debugging we have checked the HTML Content before sending it to the Function Module SO_DOCUMENT_SEND_API1, Which is coming correctly in Russian Character.
    Kindly suggest is there is any specific setting need to added in HTML.
    We are using the CHARSET UTF-8 in the HTML
    Then same scenario we have tested in Spanish (ES), which is also displaying correctly.
    With Regards,
    Subbaraman v

    Hi James
    Oh~ It perfectly works!!
    I've been tried to solve this problem for a long time and
    I really appreciate for your help~
    Thanks~
    Kwangbock.

  • XML parser not detecting character encoding

    Hi,
    I am using Jdeveloper 9.0.5 preview and the same problem is happening in our production AS 9.0.2 release.
    The character encoding of an xml document is not correctly being detected by the oracle v2 parser even though the xml declaration correctly contains
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    instead it treats the document as UTF8 encoding which is fine until a document comes along with an extended character which then causes a
    java.io.UTFDataFormatException: Invalid UTF8 encoding.
    at oracle.xml.parser.v2.XMLUTF8Reader.checkUTF8Byte(XMLUTF8Reader.java:160)
    at oracle.xml.parser.v2.XMLUTF8Reader.readUTF8Char(XMLUTF8Reader.java:187)
    at oracle.xml.parser.v2.XMLUTF8Reader.fillBuffer(XMLUTF8Reader.java:120)
    at oracle.xml.parser.v2.XMLByteReader.saveBuffer(XMLByteReader.java:448)
    at oracle.xml.parser.v2.XMLReader.fillBuffer(XMLReader.java:2023)
    at oracle.xml.parser.v2.XMLReader.tryRead(XMLReader.java:972)
    at oracle.xml.parser.v2.XMLReader.scanXMLDecl(XMLReader.java:2589)
    at oracle.xml.parser.v2.XMLReader.pushXMLReader(XMLReader.java:485)
    at oracle.xml.parser.v2.XMLReader.pushXMLReader(XMLReader.java:192)
    at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:144)
    as you can see it is explicitly casting the XMLUTF8Reader to perform the read.
    I can get around this by hard coding the xml input stream to be processed by a reader
    XMLSource = new StreamSource(new InputStreamReader(XMLInStream,"ISO-8859-1"));
    however the manual documents that the character encoding is automatically picked up from the xml file and casting into a reader is not necessary, so I should be able to write
    XMLSource = new StreamSource(XMLInStream)
    Does anyone else experience this same problem?
    having to hardcode the encoding causes my software to lose flexibility.
    Jarrod Sharp.

    An XML document should be created with 'ISO-8859-1' encoding to be parsed as 'ISO-8859-1' encoding.

  • What's the difference of character encoding between 1.4.0and1.4.2 in Linux

    As i find, the character encoding about chinese in jdk1.4.2 no langer the same of jdk1.4.0.
    In jdk1.4.0, the character encoding used the "file.encoding" system property, we often set the
    property with "gb2312".
    But in jdk1.4.2, i find that the default character encoding no longer used the "file.encoding" system property.
    Who knows the reason?
    Test Program:
    public class B{
    public static void main(String args[]) throws Exception{
    byte [] bytes = new byte[]{(byte)0xD6,(byte)0xD0,(byte)0xCE,(byte)0xC4};
    String s1 = new String(bytes);
    String s2 = new String(bytes,System.getProperty("file.encoding"));
    System.out.println("s1="+s1+" , s2="+s2);
    System.out.println("s1.length=" + s1.length() + " , s2.length="+s2.length());
    run four times and the result list:
    [root@app15 component]# /usr/local/j2sdk1.4.0/bin/java -Dfile.encoding=ISO-8859-1 -cp . B
    s1=&#20013;&#25991; , s2=&#20013;&#25991;
    s1.length=4 , s2.length=4
    [root@app15 component]# /usr/local/j2sdk1.4.0/bin/java -Dfile.encoding=gb2312 -cp . B
    s1=&#20013;&#25991; , s2=&#20013;&#25991;
    s1.length=2 , s2.length=2
    [root@app15 component]# /usr/local/j2sdk1.4.2/bin/java -Dfile.encoding=ISO-8859-1 -cp . B
    s1=&#20013;&#25991; , s2=&#20013;&#25991;
    s1.length=4 , s2.length=4
    [root@app15 component]# /usr/local/j2sdk1.4.2/bin/java -Dfile.encoding=gb2312 -cp . B
    s1=&#20013;&#25991; , s2=??
    s1.length=4 , s2.length=2
    [root@app15 component]#

    I don't know for sure, but:
    -- The API documentation for String says that "new String(byte[])" uses "the platform's default charset".
    -- The API documentation for Charset says "The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system."
    You'll notice that it doesn't say anything about using the file.encoding system value, so presumably (based on your experiments) it doesn't. I did a search for "java default charset" and didn't find anything specific, but this site says "As of Java 1.4.1, the default Charset varies from platform to platform" and suggests you explicitly hard-code your charset. I would agree with that.

  • Why differing Character Encoding and how to fix it?

    I have PRS-950 and PRS-350 readers, both since 2011.  
    In the last year, I've been getting books with Character Encoding that is not easy to read.  In playing around with my browsers and View -> Encoding menus, I have figured out that it has something to do with the character encoding within the epub files.
    I buy books from several ebook stores and I borrow from the library.
    The problem may be the entire book, but it is usually restricted to a few chapters, with rare occasion where the encoding changes within a chapter.  Usually it is for a whole chapter, not part, and it can be seen in chapters not consecutive to each other.
    It occurs whether the book is downloaded directly to my 950 reader or if I load it to either reader from my computer(s), which are all Mac OS X of several versions fom 10.4 to Mountain Lion.  SInce it happens when the book is downloaded directly, I figure the operating system of my computer is not relevant.
    There are several publishers involved, though Baen (no DRM ebooks) has not so far been one of them.
    If I look at the books with viewers on the computer, the encoding is the same.  I've read them in Calibre, in the Sony Reader App, and in Adobe Digital Editions 2.0.  It's always the same.
    I believe the encoding is inherent to the files.  I would like to fix this if I can to make the books I've purchased, many of them in paper and electronically, more enjoyable to read on my readers.
    Example: I’ve is printed instead of I've.
    ’ for apostrophe
    “ the opening of a quotation,
    â€?  for closing the quotation,
    and I think — is for a hyphen.
    When a sentence had “’m  for " 'm at the beginning of a speech (when the character was slurring his words) it took me a while to figure out how it was supposed to read.
    “’Sides, â€™tis only for a moon.  That ain’t long.â€?
    was in one recent book.
    Translation: " 'Sides, 'tis only for a moon. That ain't long."
    See what I mean? 
    Any ideas?

    Hi
    I wonder if it’s possible to download a free ebook with such issue, in order to make some “tests”.
    Perhaps it’s possible, on free ebooks (without DRM), to add fonts by using softwares like Sigil.

  • How can I tell what character encoding is sent from the browser?

    Hi,
    I am developing a servlet which supposed to be used to send and receive message
    in multiple character set. However, I read from the previous postings that each
    Weblogic Server can only support one input character encoding. Is that true?
    And do you have any suggestions on how I can do what I want. For example, I
    have a HTML form for people to post any comments (they may post in any characterset,
    like ShiftJIS, Big5, Gb, etc). I need to know what character encoding they are
    using before I can read that correctly in the servlet and save in the database.

    From what I understand (I haven't used it yet) 6.1 supports the 2.3
    servlet spec. That should have a method to set the encoding.
    Otherwise, I don't think you can support multiple encodings in one
    instance of WebLogic.
    From what I know browsers don't give any indication at all about what
    encoding they're using. I've read some chatter about the HTTP spec
    being changed so it's always UTF-8, but that's a Some Day(TM) kind of
    thing, so you're stuck with all the stuff out there now which doesn't do
    everything in UTF-8.
    Sorry for the bad news, but if it makes you feel any better I've felt
    your pain. Oh, and trying to process multipart/form-data (file upload)
    forms is even worse and from what I've seen the API that people talk
    about on these newsgroups assumes everything is ISO-8859-1.
    Emmy Lau wrote:
    >
    Hi,
    I am developing a servlet which supposed to be used to send and receive message
    in multiple character set. However, I read from the previous postings that each
    Weblogic Server can only support one input character encoding. Is that true?
    And do you have any suggestions on how I can do what I want. For example, I
    have a HTML form for people to post any comments (they may post in any characterset,
    like ShiftJIS, Big5, Gb, etc). I need to know what character encoding they are
    using before I can read that correctly in the servlet and save in the database.

  • Reading Advance Queuing with XMLType payload and JDBC Driver character encoding

    Hi
    I've got a problem retrieving the message from the queue with XMLType payload in Java.
    It was working fine in 10g database but after the switch to 11g it returns corrupted string instead of real XML message. Database NLS_LANG setting is AL32UTF8
    It is said that JDBC driver should deal with that automatically but it obviously don't in this case. When I dequeue the message using database functionality (DBMS_AQ package) it looks fine but not when using JDBC driver so Ithink it is character encoding issue or so. The message itself is enqueued by the database and supposed to be retrieved by dedicated EJB.
    Driver file used: ojdbc6.jar
    Additional libraries: aqapi.jar, xdb.jar
    All file taken from 11g database installation.
    What shoul dI do to get the xml message correctly?

    Do you mean NLS_LANG is AL32UTF8 or the database character set is AL32UTF8? What is the database character set (SELECT value FROM nls_database_parameters WHERE parameter='NLS_CHARACTERSET')?
    Thanks,
    Sergiusz

  • Where is the Character Encoding option in 29.0.1?

    After the new layout change, the developer menu don't have Character Encoding options anymore where the hell is it? It's driving me mad...

    You can also this access the Character Encoding via the View menu (Alt+V C)
    See also:
    *Options > Content > Fonts & Colors > Advanced > Character Encoding for Legacy Content

  • XML Server character encoding

    Hi, I'm using the proxy generated by UDS for XML Server published service object. I have some problems with strings return types with some locales ( e.g:spanish ).
    How can I set the XML Server response character encoding to avoid this problem?

    This tech note will help:
    http://sunsolve.sun.com/pub-cgi/retrieve.pl?doc=fsunone%2F7717&zone_110=7717%2A%20
    ka

  • Wrong character encoding from flash to mysql

    Hi, im experiencing problems with character encoding not
    functioning correctly when sending from flash to mysql. What i am
    doing is doing a contact form in flash which then sends the value
    to a php file which takes the values and inserts them into a table.
    As i'm using icelandic charecters i need the char encoding to be
    either latin1 or utf8 in mysql, or at least i think so. But it
    seems that flash or the php document isn't sending in the same
    format as i have selected in mysql because all special icelandic
    characters come scrambled in the mysql table. Firefox tells me
    tough that the html document containing the flash movie is using
    utf-8.

    I don't know anything about Icelandic characters, but Flash
    generally really likes UTF-8. So it should be sending that if that
    is what it is starting with.
    You aren't using any kind of useCodePage? That will mess it
    up.
    Are you sure that the input method is Icelandic?
    In the testing environment can you list variables (from the
    debug menu) and see if they look proper? If they do then Flash is
    readying them correctly and the problem must be coming in further
    down stream.

  • How to set the system default file character encoding to UTF-8?

    Hi all. This is driving me nuts, on both my Windows box and Snow Leopard; I figure much more chance of finding the answer for OS X.
    My language and locale are set to Australian English. $LANG=en_AU.UTF-8
    However, as I believe is expected, OS X (and Windows for that matter) will create files by default with character encoding of Cp1252 (Latin-1). That is, the FILE encoding in the file metadata - the Byte Order Mark I believe. The file itself, not the characters written to it.
    This, in a word, bites. I don't want to be restricted to only ASCII by default, and it is causing me problems with certain software (a Firefox plugin) that creates text files, passing in UTF-8 encoded content, which is then mangled because the file encoding itself is still Cp1252. (I know, I've tested this by changing the file encoding manually and having it overwritten again by the plugin: works correctly.)
    As a simple example, just `touch somefile` from terminal creates a file in Cp1252 -- I'm obtaining that info by opening in jEdit by the way (anyone know of something better?).
    In other locales that are not English-based, I believe the default file encoding is UTF-8. But surely this can be controlled independently? There must be a system configuration value somewhere that specifies file encoding default. Can someone please tell me what it is?
    Thanks!

    However, as I believe is expected, OS X (and Windows for that matter) will create files by default with character encoding of Cp1252 (Latin-1). That is, the FILE encoding in the file metadata - the Byte Order Mark I believe. The file itself, not the characters written to it.
    Apps like TextEdit and Mail have settings that let you determine the encoding of text produced. The default would normally depend on the character content of the file, ranging from ASCII for basic English to Windows Latin-1 (Win 1252) or ISO Latin -1 (ISO 8859-1) to UTF-8 for other content.
    Win 1252 is not ASCII, but has twice the number of characters in the latter.
    Byte Order Mark is something totally different --it's a particular character used to signal certain encodings.
    http://en.wikipedia.org/wiki/Byteordermark
    As a simple example, just `touch somefile` from terminal creates a file in Cp1252 -- I'm obtaining that info by opening in jEdit by the way (anyone know of something better?).
    For what Terminal does and how to change it, it might best to post in the Unix forum:
    http://discussions.apple.com/forum.jspa?forumID=735
    For problems with a FireFox plugin, it might be good to ask on their own forums as well.

  • Character Encoding for JSPs and HTML forms

    After having read loads of postings on character encoding problems I'm still puzzled about the following problem:
    I have an instance (A) of WL 8.1 SP3 on a WinXP machine and another instance (B) of WL 8.1 without any SP on a Win2K machine. The underlying Windows locale is english(US) in both cases.
    The same application deployed as a war file to these instances does not behave in the same way when it comes to displaying non-Latin1-characters like the Euro symbol: Whereas (A) shows and accepts these characters as request-parameters, (B) does not.
    Since the war file is the same (weblogic.xml, jsps and everything), the reason for this must either be the service-pack-level or some other configuration setting I overlooked.
    Any hints are appreciated!

    Try this:
    Prefrences -> Content -> Fonts & Color -> Advanced
    At the bottom, choose your Encoding.

Maybe you are looking for

  • IPod touch 5th gen back camera won't work

    Whenever i open the camera app the shutter stays on but on FaceTime, the front camera works perfectly. (I already tried resetting it and backing it up) I really don't want to go to the apple store to fix it. Thanks!

  • ORA-01403 displaying image in a report

    Hello. I have a table with BLOB column that stores an image. I created a report on this table trying to display this column's images. I surround the column with dbms_lob.getlength() function in the SQL of the report and I use the following for the co

  • Import a partition?

    Hello every one, I have one partition table say TABLE_A with partitions from jan_2008 to dec_2008.Now i have exported one partition say feb_2008 using the following syntax exp test/pwd file=part.dmp tables=table_a:feb_2008 now i drop the partition fe

  • Conditional Printing when NE to 0

    Post Author: LaVerne CA Forum: Formula Good Morning - I would like to suppress printing of records when (the Actual Volume = 0 and the Budgeted Volume = 0).  Amounts in either of these field can be less than 0 in which case, I would like the row to p

  • Shutter does not open after picture is taken

    I recently noticed my 1.1.4 Iphone camera does not work anymore. The problem is I can open the camera fine, but when I take a photo, it makes the shutter sound however when the shutter closes, it does not open again. After waiting about 30 min I have