Bad character encoding when ResultSet is CONCUR_UPDATABLE

Hi,
I have a problem with the encoding of text (Slovak and Czech characters) loaded from the database.
First, my application environment:
MS SQL Server 2000, standard JDBC drivers from Microsoft
DB table encoded in windows-1250
SAP WAS application server on UNIX, file.encoding property = iso-8859-1
Here is part of a JSP page (used only for testing):
<%@ page language="java" contentType="text/html;charset=iso-8859-2"%>
<% request.setCharacterEncoding("iso-8859-2"); %>
<%
  try {
    // ... getting connection
    java.sql.Statement stmt = conn.createStatement(
      ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE );
    String q = "SELECT col FROM table WHERE id='1'";
    java.sql.ResultSet result = stmt.executeQuery(q);
    if ( result.next() )
      out.write( result.getString(1) );
  } catch ( SQLException sqle ) {
    // error handling omitted
  }
%>
The special Slovak and Czech characters come out corrupted.
In the first two lines I've tried utf-8, iso-8859-1 and windows-1250 (also with the pageEncoding directive), and I've checked the browser's encoding setting in every case.
Then I tried dozens of combinations along these lines:
out.write( new String (result.getString(1).getBytes("windows-1250"), "iso-8859-1"));
None of this helps.
And now, the most interesting part for me. When I change the statement-creation line from:
stmt = conn.createStatement ( ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE);
to:
stmt = conn.createStatement ( ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
the diacritics work... (for every setting: iso-8859-2, utf-8, windows-1250).
I've also tried the same code in these environment configurations:
1. SAP WAS on UNIX + Oracle 10g
2. SAP WAS on Windows + MS SQL 2000
3. standalone java application + MS SQL 2000
All three work fine with ResultSet.CONCUR_UPDATABLE...
Can you give me advice on how to solve this problem? Thanks for any answer.
Regards,
Juraj

Hi Juraj,
If the column containing the text supports Unicode (nvarchar, nchar, etc.), you don't need to set the charset;
otherwise
you can try putting "charset=iso-8859-1" in your JDBC URL, if iso-8859-1 is the corresponding charset for Slovak/Czech.
For instance,
jdbc:microsoft:sqlserver://localhost:1433;DatabaseName=XXX;SelectMethod=cursor;charset=iso-8859-1
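For reference, a minimal sketch of opening a connection with a URL like that (the driver class name, host, database name and credentials below are assumptions for illustration, not taken from the original post):

import java.sql.Connection;
import java.sql.DriverManager;

public class CharsetUrlDemo {
    public static void main(String[] args) throws Exception {
        // Old Microsoft SQL Server 2000 JDBC driver class (assumed)
        Class.forName("com.microsoft.jdbc.sqlserver.SQLServerDriver");

        // charset=... tells the driver which single-byte charset to use
        // when converting CHAR/VARCHAR data to Java strings
        String url = "jdbc:microsoft:sqlserver://localhost:1433;"
                + "DatabaseName=XXX;SelectMethod=cursor;charset=iso-8859-1";

        Connection conn = DriverManager.getConnection(url, "user", "password");
        // ... run the SELECT from the JSP above, then:
        conn.close();
    }
}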
By the way, you can print the text right after it is retrieved from the database, to see whether the problem is related to JDBC or to the JSP.
For instance, put the following into the if block:
if ( result.next() ) {
      String stringFromDB = result.getString(1);
      System.out.println( stringFromDB );
      out.write( stringFromDB );
}
Dennis

Similar Messages

  • Character encoding & Sensitive ResultSet

    Dear friends,
    We are using the Oracle thin JDBC driver. The character set of our DB is AR8MSWIN1256. When we use an insensitive ResultSet, it works fine. But with a sensitive ResultSet the JDBC driver does not transform the content of the DB into Java Unicode strings. Do you know of any helpful way around this?
    Thanks in advance,
    Arash

    Thanks. I will check that link. Looks interesting.
    I need suggestions for the database -> browser direction...
    Regards,
    djuri
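    A minimal sketch of the combination reported above as working (scroll-insensitive, read-only) with the Oracle thin driver; the connection details, column and table name are placeholders, not from the thread:
    import java.sql.*;
    public class InsensitiveReadDemo {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details
            Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@dbhost:1521:ORCL", "user", "password");
            // Scroll-insensitive + read-only is the mode the poster says
            // converts the AR8MSWIN1256 column data to Java Unicode correctly
            Statement stmt = conn.createStatement(
                    ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
            ResultSet rs = stmt.executeQuery("SELECT some_col FROM some_table");
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
            conn.close();
        }
    }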

  • Having trouble with character encoding when opening a .csv file either in Excel or in Numbers!

    Hi.
    I often use internet banking services which produce .csv reports for the transactions I make. After downloading these reports I try to open them in Numbers (and Excel), but the encoding for Greek is messed up. Opening the same file with Excel on a Windows computer, everything works fine. Is there something I can do to fix this? The same thing happens on both my MacBook Pro and my iMac.
    Many thanks in advance,
    John

    I suspect that the .csv reports are not in Unicode, which is what the Mac applications expect. You would probably have to open the files with something like TextEdit and re-save them as Unicode to get them to display properly.
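    If the files turn out to be in a Windows Greek code page rather than Unicode, a one-off conversion can also be scripted; a minimal sketch in Java (windows-1253 as the source encoding and the file names are assumptions):
    import java.nio.charset.Charset;
    import java.nio.file.*;
    public class CsvToUtf8 {
        public static void main(String[] args) throws Exception {
            // Assumed source encoding for the Greek bank export; adjust if needed
            Charset source = Charset.forName("windows-1253");
            Path in = Paths.get("report.csv");
            Path out = Paths.get("report-utf8.csv");
            // Decode with the original encoding, re-encode as UTF-8
            String text = new String(Files.readAllBytes(in), source);
            Files.write(out, text.getBytes("UTF-8"));
        }
    }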

  • When I load certain websites the writing is all squashed up. I correct this by changing the character encoding setting. I am using the latest Apple Mac machine. Thanks in advance

    When I load certain websites the writing is all squashed up. I correct this by changing the character encoding setting. I am using the latest Apple Mac machine. Thanks in advance

    Thanks for that information!
    I'm sure I will be calling AppleCare, but the problem is, they charge for the phone calls, don't they? Because I don't have money to spend being on the phone with a support service.
    On another note, it seemed like the only time my MacBook was working was when I had Snow Leopard without the 10.6.8 update that was supposed to be installed to prepare for OS X Lion.
    When I look at the information for my HD it says that I have 10.6.8, but that was the install that claimed to have failed and caused me to restart, resulting in all of the repeated problems.
    Also, because my computer is currently down and I've lost all my files, how would that affect the use of my iPhone? Because if it doesn't get fixed by the time OS 5 is released, how would I be able to upgrade?!

  • Which character encoding does Adobe ExportPDF use when converting to a Word document?

    Which character encoding does Adobe ExportPDF use when converting to a Word document?

    Hi Ram,
    Sorry for the long delay. I've been trying to track down this answer for you.
    We're using UTF-8 character encoding for our files.
    -David

  • Oracle.xdo.parser.v2.XMLParseException: Bad character (1e)

    Hi,
    I am using BI Publisher shipped with Oracle EBS R12 (I think it is BIP 5.6.3).
    I have made a specific XSL-TEXT layout file to generate some output files in a proprietary format (for label printing). Everything is working like a charm so far.
    However now I want to put ASCII control characters into my output file, e.g. char 30 / RS / Record Separator (https://en.wikipedia.org/wiki/C0_and_C1_control_codes). I know that XML is not really made for some of those control characters, but well...
    I have something like this in my XSLT layout file:
    <xsl:output method="text" media-type="text/plain" indent="no" encoding="iso-8859-1" />
    <xsl:text>&#30;</xsl:text>
    ...When I run it in BIP I get the following error message in the OPP log file:
    oracle.xdo.parser.v2.XMLParseException: Bad character (1e).
    I found Metalink note 745451.1 stating that patch 7687414 should fix the problem. We have this patch installed (comes with R12.1.3) but the issue is still there.
    Does anyone have ideas or experience with this? I know that I should not use unsupported characters in XML, but that would mean BIP is no longer usable for this case and I would have to find another solution or develop my own program.
    Thanks in advance,
    David.
    Edited by: David Weber on Aug 24, 2012 2:12 PM

    Does anyone have an idea? Anyone with experience generating binary data via BIP, maybe?
    regards,
    David.
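    Character 0x1E is not a legal character in XML 1.0, so the parser is entitled to reject it even as an entity reference. One possible workaround (a sketch, not something from this thread) is to emit a printable placeholder from the XSLT layout and substitute the real control character in a small post-processing step:
    import java.nio.file.*;
    public class InsertControlChars {
        public static void main(String[] args) throws Exception {
            // Hypothetical placeholder token the XSLT would emit instead of the entity
            String placeholder = "[RS]";
            char recordSeparator = 0x1E;   // ASCII RS
            Path file = Paths.get("labels.txt");   // placeholder file name
            String text = new String(Files.readAllBytes(file), "ISO-8859-1");
            text = text.replace(placeholder, String.valueOf(recordSeparator));
            Files.write(file, text.getBytes("ISO-8859-1"));
        }
    }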

  • Character encoding again

    Hi, I haven't got any answer, so I'm trying to ask again...
    I have created a page from Data Controls.
    I have created a parameter form and a table. The detail is shown at the bottom of the page (the current row is shown there through #bindings. . .
    Everything works fine: when I fill something into the parameter form, the table is filtered by that criterion; when the current row is changed, the detail also changes. But when I enter a Czech character into the parameter form, it behaves badly.
    The table is correctly filtered, but when I perform any other action after filtering, the table shows no rows.
    I found what causes this problem. It is the property in the bindings that holds the value for this parameter. When I first call bindings.findXXX.execute, the property's value is "č", for example. That is correct and the table shows filtered rows. After I perform another action (I don't think it matters what the action is; changing the current row, for example), the value in that property has changed to "?" instead of "č"; because of this the filter is applied again and the table shows no rows. I have checked everything and the character encoding is set to utf-8. Is this the problem, and am I missing some setting?
    1) menu tools-preferences-Environment
    2) project properties- compiler-character encoding
    3) in jspx
    <?xml version='1.0' encoding='UTF-8'?>
    <jsp:directive.page contentType="text/html;charset=UTF-8"
    pageEncoding="UTF-8"/>
    <afh:head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    </afh:head>
    Are there others? I don't know, I set these settings a long time ago...
    Please, if someone knows, give me a hint. Thanks for the help.
    Jdeveloper 10.1.3.0.4(SU5)

    Check the regional language settings on the machine where your application server is running. I faced this problem but was able to resolve it by modifying
    NLS_LANG = AMERICAN_AMERICA.WE8ISO8859P1
    NLS_LANG is under HKEY_LOCAL_MACHINE ==> ORACLE.
    WE8ISO8859P1 is the standard encoding for my application, which is developed in a local Indian language, and it works fine for me.
    This can help, check it out.
    Amit
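    Before changing NLS_LANG, it may be worth confirming which encodings are actually in play; a small diagnostic sketch (the connection details are placeholders, the query is the standard NLS character-set check):
    import java.sql.*;
    public class EncodingCheck {
        public static void main(String[] args) throws Exception {
            // Default charset the JVM uses for byte <-> char conversions
            System.out.println("file.encoding = " + System.getProperty("file.encoding"));
            // Database character set (placeholder connection details)
            Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@dbhost:1521:ORCL", "user", "password");
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery(
                    "SELECT value FROM nls_database_parameters"
                    + " WHERE parameter = 'NLS_CHARACTERSET'");
            if (rs.next()) {
                System.out.println("NLS_CHARACTERSET = " + rs.getString(1));
            }
            conn.close();
        }
    }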

  • Validator warning: Character Encoding mismatch!

    I have been following the discussion on favicons with interest. A few days ago I added a favicon to the page http://www.corybas.com/, and eventually persuaded IE6 to show the favicon, provided I loaded it by clicking the icon. It did not show it when the page reloaded itself, and now it has forgotten all about it.
    Following some discussions here this morning I ran the Validator over the page and got the diagnostic "The character encoding specified in the HTTP header (utf-8) is different from the value in the <meta> element (iso-8859-1). I will use the value from the HTTP header (utf-8) for this validation."
    As far as I can work out, this is caused by an incompatibility in the witchcraft Dreamweaver includes in a basic HTML page, which is as follows:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    <title>Untitled Document</title>
    </head>
    Should I worry about this warning?
    (I removed a few other insignificant errors, but IE6 still can't see the favicon.)
    Clancy

    On 22 Apr 2008 in macromedia.dreamweaver, Clancy wrote:
    > Following some discussions here this morning I ran the Validator over the page and got the diagnostic "The character encoding specified in the HTTP header (utf-8) is different from the value in the <meta> element (iso-8859-1). I will use the value from the HTTP header (utf-8) for this validation."
    > As far as I can work out, this is caused by an incompatibility in the witchcraft Dreamweaver includes in a basic HTML page ...
    > Should I worry about this warning?
    At an offhand guess, you're on an Apache 2.x server. In its default setup, it sends a UTF-8 charset header. That is sufficient for the browser. It also conflicts with the iso-8859-1 charset in the document. The fastest cure for it is to remove the charset meta from the page, or change it to UTF-8. But it has no bad effects that I know of on a browser.
    Joe Makowiec
    http://makowiec.net/
    Email: http://makowiec.net/contact.php

  • Locale and character encoding. What to do about these dreadful ÅÄÖ??

    It's time for me to get it into my head how this works. Please, help me understand before I go nuts.
    I'm from Sweden and we use a few of these weird characters like ÅÄÖ.
    If I create a file called "övrigt.txt" in Windows, then the file turns up as "?vrigt.txt" on my Linux PC (at least in the console; sometimes it looks OK in other apps in X). The same is true if I create the file in Linux and copy it to Windows: it looks just as weird on the other side.
    As I (probably) can't change the way Windows works, my question is what I have to do to make these two systems play nicely with each other.
    This is the output from locale:
    LANG=en_US.utf8
    LC_CTYPE="en_US.utf8"
    LC_NUMERIC="en_US.utf8"
    LC_TIME="en_US.utf8"
    LC_COLLATE=C
    LC_MONETARY="en_US.utf8"
    LC_MESSAGES="en_US.utf8"
    LC_PAPER="en_US.utf8"
    LC_NAME="en_US.utf8"
    LC_ADDRESS="en_US.utf8"
    LC_TELEPHONE="en_US.utf8"
    LC_MEASUREMENT="en_US.utf8"
    LC_IDENTIFICATION="en_US.utf8"
    LC_ALL=
    Is there anything here I should change? I have tried using ISO-8859-1 with no luck. Mind you, I want to keep the system-wide language set to English. The only thing I want to achieve is that "Ö" on Windows should turn up as "Ö" in Linux as well, and vice versa.
    Please save my hair from being torn off, I'm going bald here...

    Hey, thanks for all the answers!
    I share my files in a number of ways, but mainly through a web application called Ajaxplorer (very nice, btw). The thing is that as soon as a Windows user uploads anything with special characters in the file name, my programs, XBMC, the console etc. refuse to read them correctly. Other ways of sharing are file copying with USB sticks, SSH, etc. It's really not the way of sharing that is the problem, I think, but rather the special characters being used sometimes.
    I could probably convert the filenames with the suggested applications, but then I'd get the Windows users in trouble when they want to download them again, wouldn't I?
    I realize that it's cp1252 that is the bad guy in this drama. Is there no way to set/use cp1252 as a character encoding in Linux? It's probably a bad idea, as utf8 seems like the way of the future, but the fact that these two OSes can't communicate too well in this area is pretty useless if you ask me.
    To wrap this up I'll answer some questions...
    @EVRAMP: I'm actually using pcmanfm, but that is only for me, and I'm not dealing very often with vfat partitions, to be honest.
    @pkervien: Well, I think I mentioned my forms of sharing above. (Fun to see a few Arch Swedes here!)
    @quarkup: locale.gen is edited and both sv_SE and en_US have utf-8 and ISO-8859 enabled and generated.
    ...and to clarify things even further: it doesn't matter if I get or provide a file via a USB stick, Samba, FTP or on paper. All I want is for "Ö" to always be "Ö", everywhere.
    I can't believe how hard this is to get around. Linus is Finnish, for crying out loud. I thought he'd have sorted this out as the first thing he did. Maybe he doesn't deal with Windows or its users at all.

  • What every developer should know about character encoding

    This was originally posted (with better formatting) at Moderator edit: link removed/what-every-developer-should-know-about-character-encoding.html. I'm posting because lots of people trip over this.
    If you write code that touches a text file, you probably need this.
    Let's start off with two key items:
    1.Unicode does not solve this issue for us (yet).
    2.Every text file is encoded. There is no such thing as an unencoded file or a "general" encoding.
    And let's add a codicil to this – most Americans can get by without having to take this into account, most of the time. That's because the characters for the first 127 byte values in the vast majority of encoding schemes map to the same set of characters (more accurately called glyphs), and because we only use A-Z without any other characters, accents, etc., we're good to go. But the second you carry those same assumptions into an HTML or XML file that has characters outside the first 127 – then the trouble starts.
    The computer industry started with disk space and memory at a premium. Anyone who suggested using 2 bytes for each character instead of one would have been laughed at. In fact we're lucky that the byte worked out as 8 bits, or we might have had fewer than 256 values for each character. There were of course numerous character sets (or codepages) developed early on. But we ended up with most everyone using a standard set of codepages where the first 127 values were identical in all of them and the second 128 were unique to each set. There were sets for America/Western Europe, Central Europe, Russia, etc.
    And then for Asia, because 256 characters were not enough, some of the range 128 – 255 had what was called DBCS (double byte character sets). For each value of a first byte (in these higher ranges), the second byte then identified one of 256 characters. This gave a total of 128 * 256 additional characters. It was a hack, but it kept memory use to a minimum. Chinese, Japanese, and Korean each have their own DBCS codepage.
    And for awhile this worked well. Operating systems, applications, etc. mostly were set to use a specified code page. But then the internet came along. A website in America using an XML file from Greece to display data to a user browsing in Russia, where each is entering data based on their country – that broke the paradigm.
    Fast forward to today. The two file formats where we can explain this the best, and where everyone trips over it, are HTML and XML. Every HTML and XML file can optionally have the character encoding set in its header metadata. If it's not set, then most programs assume it is UTF-8, but that is not a standard and not universally followed. If the encoding is not specified and the program reading the file guesses wrong, the file will be misread.
    Point 1 – Never treat specifying the encoding as optional when writing a file. Always write it to the file. Always. Even if you are willing to swear that the file will never have characters out of the range 1 – 127.
    Now let's look at UTF-8, because as the standard, and because of the way it works, it gets people into a lot of trouble. UTF-8 was popular for two reasons. First, it matched the standard codepages for the first 127 characters, and so most existing HTML and XML would match it. Second, it was designed to use as few bytes as possible, which mattered a lot back when it was designed and many people were still using dial-up modems.
    UTF-8 borrowed from the DBCS designs of the Asian codepages. The first 128 values are all single-byte representations of characters. Then for the next most common set, it uses a block in the second 128 values to start a double-byte sequence, giving us more characters. But wait, there's more. For the less common characters there's a first byte which leads to a series of second bytes. Those then each lead to a third byte, and those three bytes define the character. This goes up to 6-byte sequences. Using this MBCS (multi-byte character set) approach you can write the equivalent of every Unicode character – and, assuming what you are writing is not a list of seldom-used Chinese characters, do it in fewer bytes.
    But here is what everyone trips over – they have an HTML or XML file, it works fine, and they open it up in a text editor. They then add a character that, in their text editor using the codepage for their region, looks like ß, and save the file. Of course it must be correct – their text editor shows it correctly. But feed it to any program that reads according to the declared encoding and that byte is now the first byte of a 2-byte sequence. You either get a different character or, if the second byte is not a legal value for that first byte, an error.
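    A small sketch of exactly that trip-up: the single windows-1252 byte for ß is not a valid sequence on its own in UTF-8 (the sample word is chosen purely for illustration):
    import java.nio.charset.Charset;
    public class MisreadDemo {
        public static void main(String[] args) throws Exception {
            // "Straße" saved by an editor using the Western European codepage
            byte[] saved = "Straße".getBytes("windows-1252");   // ß -> single byte 0xDF
            // A program that reads those bytes as UTF-8 sees a malformed sequence
            String misread = new String(saved, Charset.forName("UTF-8"));
            System.out.println(misread);   // prints "Stra" + replacement character + "e"
        }
    }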
    Point 2 – Always create HTML and XML in a program that writes it out correctly using the encoding. If you must create it with a text editor, then view the final file in a browser.
    Now, what about when the code you are writing will read or write a file? We are not talking binary/data files, where you write it out in your own format, but files that are considered text files. Java, .NET, etc. all have character encoders. The purpose of these encoders is to translate between a sequence of bytes (the file) and the characters they represent. Let's take what is actually a very difficult example – your source code, be it C#, Java, etc. These are still by and large "plain old text files" with no encoding hints. So how do programs handle them? Many assume they use the local code page. Many others assume that all characters will be in the range 0 – 127 and will choke on anything else.
    Here's a key point about these text files – every program is still using an encoding. It may not be setting it in code, but by definition an encoding is being used.
    Point 3 – Always set the encoding when you read and write text files. Not just for HTML & XML, but even for files like source code. It's fine if you set it to use the default codepage, but set the encoding.
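    A minimal Java sketch of what "set the encoding" looks like in practice (the file name and sample text are placeholders):
    import java.io.*;
    import java.nio.charset.StandardCharsets;
    public class ExplicitEncodingIO {
        public static void main(String[] args) throws IOException {
            // Write a text file with an explicitly chosen encoding
            Writer w = new OutputStreamWriter(
                    new FileOutputStream("out.txt"), StandardCharsets.UTF_8);
            w.write("Grüße, Žižkov\n");
            w.close();
            // Read it back, again stating which encoding the bytes are in
            BufferedReader r = new BufferedReader(new InputStreamReader(
                    new FileInputStream("out.txt"), StandardCharsets.UTF_8));
            System.out.println(r.readLine());
            r.close();
        }
    }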
    Point 4 – Use the most complete encoder possible. You can write your own XML as a text file encoded for UTF-8. But if you write it using an XML encoder, then it will include the encoding in the meta data and you can't get it wrong. (it also adds the endian preamble to the file.)
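    And for Point 4, a sketch using the standard StAX writer, which puts the encoding into the XML declaration for you (the element name and file name are placeholders):
    import java.io.FileOutputStream;
    import javax.xml.stream.XMLOutputFactory;
    import javax.xml.stream.XMLStreamWriter;
    public class XmlEncoderDemo {
        public static void main(String[] args) throws Exception {
            XMLStreamWriter xml = XMLOutputFactory.newInstance()
                    .createXMLStreamWriter(new FileOutputStream("demo.xml"), "UTF-8");
            // Emits: <?xml version="1.0" encoding="UTF-8"?>
            xml.writeStartDocument("UTF-8", "1.0");
            xml.writeStartElement("greeting");
            xml.writeCharacters("Grüß Gott");
            xml.writeEndElement();
            xml.writeEndDocument();
            xml.close();
        }
    }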
    Ok, you're reading and writing files correctly, but what about inside your code? This is where it's easy – Unicode. That's what those encoders built into the Java and .NET runtimes are designed to do. You read in and get Unicode. You write Unicode and get an encoded file. That's why the char type is 16 bits and is a unique core type that is for characters. This you probably have right, because languages today don't give you much choice in the matter.
    Point 5 – (For developers on languages that have been around a while) – Always use Unicode internally. In C++ this is called wide chars (or something similar). Don't get clever to save a couple of bytes; memory is cheap and you have more important things to do.
    Wrapping it up
    I think there are two key items to keep in mind here. First, make sure you are taking the encoding into account on text files. Second, this is actually all very easy and straightforward. People rarely screw up how to use an encoding; it's when they ignore the issue that they get into trouble.
    Edited by: Darryl Burke -- link removed

    DavidThi808 wrote:
    > This was originally posted (with better formatting) at Moderator edit: link removed/what-every-developer-should-know-about-character-encoding.html. I'm posting because lots of people trip over this. If you write code that touches a text file, you probably need this. ...
    > And let's add a codicil to this – most Americans can get by without having to take this into account, most of the time. ... But the second you carry those same assumptions into an HTML or XML file that has characters outside the first 127 – then the trouble starts.
    Pretty sure most Americans do not use character sets that only have a range of 0-127. I don't think I have ever used a desktop OS that did. I might have used some big iron boxes before that, but at that time I wasn't even aware that character sets existed.
    They might only use that range, but that is a different issue, especially since that range is exactly the same as the UTF-8 character set anyway.
    > The computer industry started with disk space and memory at a premium. ... Operating systems, applications, etc. mostly were set to use a specified code page. But then the internet came along. A website in America using an XML file from Greece to display data to a user browsing in Russia, where each is entering data based on their country – that broke the paradigm.
    The above is only true for small volume sets. If I am targeting a processing rate of 2000 txns/sec with a requirement to hold data active for seven years, then a column with a size of 8 bytes is significantly different from one with 16 bytes.
    > Fast forward to today. The two file formats where we can explain this the best, and where everyone trips over it, are HTML and XML. Every HTML and XML file can optionally have the character encoding set in its header metadata. ...
    The above is out of place. It would be best to address this as part of Point 1.
    > Point 1 – Never treat specifying the encoding as optional when writing a file. ...
    > UTF-8 borrowed from the DBCS designs of the Asian codepages. The first 128 values are all single-byte representations of characters. ...
    The first part of that paragraph is odd. The first 128 characters of Unicode, all of Unicode, are based on ASCII. The representational format of UTF-8 is required to implement Unicode, thus it must represent those characters. It uses the idiom supported by variable-width encodings to do that.
    > But here is what everyone trips over – they have an HTML or XML file, it works fine, and they open it up in a text editor. They then add a character like ß and save the file. ...
    Not sure what you are saying here. If a file is supposed to be in one encoding and you insert invalid characters into it, then it is invalid. End of story. It has nothing to do with HTML/XML.
    > Point 2 – Always create HTML and XML in a program that writes it out correctly using the encoding. If you must create it with a text editor, then view the final file in a browser.
    The browser still needs to support the encoding.
    > Now, what about when the code you are writing will read or write a file? ... These are still by and large "plain old text files" with no encoding hints. So how do programs handle them? Many assume they use the local code page. Many others assume that all characters will be in the range 0 – 127 and will choke on anything else.
    I know Java files have a default encoding, the specification defines it. And I am certain C# does as well.
    > Point 3 – Always set the encoding when you read and write text files. ...
    It is important to define it. Whether you set it is another matter.
    > Point 4 – Use the most complete encoder possible. ...
    > Ok, you're reading and writing files correctly, but what about inside your code? This is where it's easy – Unicode. ... That's why the char type is 16 bits and is a unique core type that is for characters. ...
    Unicode character escapes are replaced prior to actual code compilation. Thus it is possible to create strings in Java with escaped Unicode characters which will fail to compile.
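    A concrete illustration of that point (a deliberately non-compiling sketch): the compiler rewrites \u escapes before parsing, so the line below is seen as a broken string literal and is rejected.
    public class EscapeTrap {
        public static void main(String[] args) {
            // Does not compile: \u0022 is turned into a " during the early
            // unicode-escape translation phase, leaving an unterminated literal
            String s = "\u0022";
        }
    }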
    > Point 5 – (For developers on languages that have been around a while) – Always use Unicode internally. In C++ this is called wide chars (or something similar). Don't get clever to save a couple of bytes; memory is cheap and you have more important things to do.
    No. A developer should understand the problem domain represented by the requirements and the business, and create solutions appropriate to that. Thus there is absolutely no point for someone who is creating an inventory system for a stand-alone store to craft a solution that supports multiple languages.
    And another example: with high-volume systems, moving and storing bytes is relevant. As such, one must carefully consider each text element as to whether it is customer-consumable or internally consumable. Saving bytes in such cases will impact the total load of the system. In such systems, incremental savings impact operating costs and marketing advantage through speed.

  • Why differing Character Encoding and how to fix it?

    I have PRS-950 and PRS-350 readers, both since 2011.  
    In the last year, I've been getting books whose character encoding makes them hard to read. In playing around with my browsers and their View -> Encoding menus, I have figured out that it has something to do with the character encoding within the epub files.
    I buy books from several ebook stores and I borrow from the library.
    The problem may be the entire book, but it is usually restricted to a few chapters, with rare occasion where the encoding changes within a chapter.  Usually it is for a whole chapter, not part, and it can be seen in chapters not consecutive to each other.
    It occurs whether the book is downloaded directly to my 950 reader or whether I load it onto either reader from my computer(s), which are all Mac OS X of several versions from 10.4 to Mountain Lion. Since it happens when the book is downloaded directly, I figure the operating system of my computer is not relevant.
    There are several publishers involved, though Baen (no DRM ebooks) has not so far been one of them.
    If I look at the books with viewers on the computer, the encoding is the same.  I've read them in Calibre, in the Sony Reader App, and in Adobe Digital Editions 2.0.  It's always the same.
    I believe the encoding is inherent to the files.  I would like to fix this if I can to make the books I've purchased, many of them in paper and electronically, more enjoyable to read on my readers.
    Example: I’ve is printed instead of I've.
    ’ for apostrophe
    “ the opening of a quotation,
    â€?  for closing the quotation,
    and I think — is for a hyphen.
    When a sentence had “’m  for " 'm at the beginning of a speech (when the character was slurring his words) it took me a while to figure out how it was supposed to read.
    “’Sides, â€™tis only for a moon.  That ain’t long.â€?
    was in one recent book.
    Translation: " 'Sides, 'tis only for a moon. That ain't long."
    See what I mean? 
    Any ideas?

    Hi
    I wonder if it's possible to download a free ebook with such an issue, in order to run some tests.
    Perhaps it's possible, on free ebooks (without DRM), to add fonts by using software like Sigil.
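    The pattern in the question above ("“" for "“", "’" for an apostrophe) is what UTF-8 text looks like when its bytes are decoded as windows-1252; a minimal sketch that reproduces it (purely illustrative, not a claim about how those particular epubs were produced):
    import java.nio.charset.Charset;
    public class MojibakeDemo {
        public static void main(String[] args) throws Exception {
            String original = "\u201CI\u2019ve\u201D";   // “I’ve” with curly quotes
            byte[] utf8 = original.getBytes("UTF-8");
            // Decoding UTF-8 bytes with the wrong (windows-1252) charset
            // turns “ into â€œ and ’ into â€™, the garbling seen in the books
            String garbled = new String(utf8, Charset.forName("windows-1252"));
            System.out.println(garbled);
        }
    }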

  • Bad initial encoding of report after WEB.SHOW_DOCUMENT(MyURL,'_blank')

    Hi,
    I use Developer9iDS 9.0.2 with installed Patch1 on Win2000 with IE 6.0.
    I need an HTMLCSS report in the Lithuanian language and want to preview it in a new browser window with Baltic (Windows) encoding.
    When I produce my report with RUN_REPORT_OBJECT and the parameters REPORT_DESFORMAT=HTMLCSS and REPORT_DESTYPE=CACHE,
    I find that my report is OK in the cache, with all the needed Lithuanian characters:
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=WINDOWS-1257">
    </head>
    <body dir=LTR bgcolor="#ffffff">
    </body></html>
    But when I open it with WEB.SHOW_DOCUMENT (MyURL/getjobid'||vJobId||'?server=<myserver>,'_blank');
    my browser shows it with the wrong encoding the first time, and every time I must set the encoding manually to Baltic (Windows).
    That will be very difficult for my users!
    I found in the source that another HTML document stands at the beginning of my document, followed by my document:
    <html>
    <head>
    <base href="MyURL/reports/rwservlet/getfile/HXDcQdI8VQ/QC6Q=/58327007.htm">
    </head></html>
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=WINDOWS-1257">
    </head>
    <body dir=LTR bgcolor="#ffffff">
    </body></html>
    I think, that first docume (Long postings are being truncated to ~1 kB at this time.)

    Sorry, my question was over 1 kB. My question is:
    I am thinking that the additional document without a META tag may be the reason for the bad initial encoding.
    Am I right? How can I suppress this first document? Is there another way to solve this problem?
    Sorry for my English. Thanks for the help!
    Algis

  • How can I tell what character encoding is sent from the browser?

    Hi,
    I am developing a servlet which is supposed to be used to send and receive messages in multiple character sets. However, I read in previous postings that each WebLogic Server can only support one input character encoding. Is that true? And do you have any suggestions on how I can do what I want? For example, I have an HTML form for people to post comments (they may post in any character set, like Shift_JIS, Big5, GB, etc.). I need to know what character encoding they are using before I can read the input correctly in the servlet and save it in the database.

    From what I understand (I haven't used it yet), 6.1 supports the 2.3 servlet spec. That should have a method to set the encoding. Otherwise, I don't think you can support multiple encodings in one instance of WebLogic.
    From what I know, browsers don't give any indication at all about what encoding they're using. I've read some chatter about the HTTP spec being changed so it's always UTF-8, but that's a Some Day(TM) kind of thing, so you're stuck with all the stuff out there now which doesn't do everything in UTF-8.
    Sorry for the bad news, but if it makes you feel any better, I've felt your pain. Oh, and trying to process multipart/form-data (file upload) forms is even worse, and from what I've seen the API that people talk about on these newsgroups assumes everything is ISO-8859-1.
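    For reference, the Servlet 2.3 method alluded to above is request.setCharacterEncoding(); a minimal sketch (the chosen encoding and form field name are placeholders):
    import java.io.IOException;
    import javax.servlet.http.*;
    public class CommentServlet extends HttpServlet {
        protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            // Must be called before any parameter is read; it only helps if the
            // browser actually submitted the form in this encoding
            req.setCharacterEncoding("UTF-8");
            String comment = req.getParameter("comment");   // placeholder field name
            resp.setContentType("text/html; charset=UTF-8");
            resp.getWriter().println("Received: " + comment);
        }
    }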

  • Reading Advance Queuing with XMLType payload and JDBC Driver character encoding

    Hi
    I've got a problem retrieving a message from a queue with an XMLType payload in Java.
    It was working fine in a 10g database, but after the switch to 11g it returns a corrupted string instead of the real XML message. The database NLS_LANG setting is AL32UTF8.
    It is said that the JDBC driver should deal with that automatically, but it obviously doesn't in this case. When I dequeue the message using database functionality (the DBMS_AQ package) it looks fine, but not when using the JDBC driver, so I think it is a character encoding issue or something similar. The message itself is enqueued by the database and is supposed to be retrieved by a dedicated EJB.
    Driver file used: ojdbc6.jar
    Additional libraries: aqapi.jar, xdb.jar
    All files were taken from the 11g database installation.
    What should I do to get the XML message correctly?

    Do you mean NLS_LANG is AL32UTF8 or the database character set is AL32UTF8? What is the database character set (SELECT value FROM nls_database_parameters WHERE parameter='NLS_CHARACTERSET')?
    Thanks,
    Sergiusz

  • Wrong character encoding from flash to mysql

    Hi, I'm experiencing problems with character encoding not functioning correctly when sending from Flash to MySQL. What I am doing is a contact form in Flash which then sends the values to a PHP file, which takes the values and inserts them into a table. As I'm using Icelandic characters I need the character encoding to be either latin1 or utf8 in MySQL, or at least I think so. But it seems that Flash or the PHP document isn't sending in the same format as I have selected in MySQL, because all the special Icelandic characters come out scrambled in the MySQL table. Firefox tells me, though, that the HTML document containing the Flash movie is using utf-8.

    I don't know anything about Icelandic characters, but Flash generally really likes UTF-8, so it should be sending that if that is what it is starting with.
    You aren't using any kind of useCodePage? That will mess it up.
    Are you sure that the input method is Icelandic?
    In the testing environment, can you list variables (from the debug menu) and see if they look proper? If they do, then Flash is reading them correctly and the problem must be coming in further downstream.
