UTF8 character encoding

I have a problem displaying character encodings other than English. I am using OLITE_10.3.0.2.0 and the repository is on an Oracle 11g DB.
The Oracle 11g database was configured for UTF8 encoding during its installation, and through my SQL Developer I can see the non-English characters correctly in char fields.
But when I synchronize my Oracle Lite client with my Oracle server, the data that arrive are shown as ? and not as the correct characters. I have tried this
through my Win32 Oracle Lite client and through my WinCE Oracle Lite client on a handheld.
In both clients the polite file has DB_CHAR_ENCODING=UTF8 and no other setting for language.
On the server the polite.ini file has DB_CHAR_ENCODING=UTF8 too.
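For reference, this is the whole encoding-related fragment on both sides (a minimal sketch; DB_CHAR_ENCODING is the only setting quoted in this thread, placed under the [ALL DATABASES] section convention mentioned below):

    [ALL DATABASES]
    DB_CHAR_ENCODING=UTF8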
The Win32 client is on a Windows PC that has the language installed, which is after all the reason I can see the characters through my SQL Developer. On my handheld WinCE client I can use the language's characters in all other programs normally.
I have seen the characters that are shown as ? from the msql utility in both clients and from my own C# code.
What must I do in order to properly activate UTF8 in both the Oracle Lite server and client so that I can access the characters from the repository? DB_CHAR_ENCODING=UTF8 doesn't seem to do anything at all...
Thank you for your time and efforts.

I did the change in the registry. The setting was English (United Kingdom) and I changed it to UTF8, but then the mobile server was raising exceptions and I couldn't connect even to the Mobile Manager from the web page. Then I changed it back to my NLS_LOCALE as described at http://download.oracle.com/docs/cd/E12095_01/doc.10302/e12548/cpolite.htm#BEHDABFD (the format is language_country.char_encoding).
But nothing happened. I made a change in the database, and after the msync I went to msql and the new row was still '?'. Yes, I see this problem after the msyncs. Even after the first msync, where the database was created, I couldn't see any characters correctly. But as I said, even if I go through msql and do an INSERT INTO with my characters, they are saved as '?' in the database. The character set of the repository Oracle 11g DB is UTF8. The mobile server polite.ini file has DB_CHAR_ENCODING=UTF8, as I already mentioned.
rekounas, what do you mean with "What happens if you put in a special character before a sync"? In your case, did you do anything specific to make Oracle Lite work with these languages?
I would appreciate it if you could provide me with a sample of the [ALL DATABASES] section of your working polite.ini (server) and polite.txt (client) files.
Thank you

Similar Messages

  • Internationalisation ServletFilter and UTF8 Character-Encoding

    Hello to all
    I use a somewhat uncommon but nice way to internationalize my web application.
    I use a ServletFilter to replace the text keys. So static resources stay static resources that can be cached and don't need to be translated each time they are requested.
    But there is a little problem getting it to work with UTF-8.
    In my opinion there is only one way to read the response content: I have to use my own HttpServletResponseWrapper, as recommended under [http://java.sun.com/j2ee/tutorial/1_3-fcs/doc/Servlets8.html#82361].
    If I do so, it is no longer possible to use ServletOutputStream (wrapper.getOutputStream()) to write the modified/internationalized content (e.g. with German umlauts/Umlaute) back to the response in the right encoding.
    Writer out = new BufferedWriter(new OutputStreamWriter(wrapper.getOutputStream(), "UTF8"));
    Using PrintWriter for writing does not work, because umlauts are sent in the wrong encoding (ISO-8859-1). With the network sniffer Wireshark I've seen that ü comes across as FC, that is, as an ISO-8859-1 encoded character.
    It obviously uses the platform's default encoding, although the documentation does not mention this explicitly for the constructor ([PrintWriter(java.io.Writer,boolean)|http://java.sun.com/j2se/1.4.2/docs/api/java/io/PrintWriter.html#PrintWriter(java.io.Writer,%20boolean)]).
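    To illustrate the platform-default behaviour just described, here is a small self-contained sketch (mine, not from the original post); on an ISO-8859-1 platform the PrintWriter path produces one byte for ü, the explicit UTF-8 writer two:

    import java.io.*;

    public class PrintWriterEncoding {
        public static void main(String[] args) throws Exception {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            // PrintWriter(OutputStream) picks the platform default charset:
            PrintWriter pw = new PrintWriter(bos, true);
            pw.print("ü");
            pw.flush();
            System.out.println(bos.size()); // 1 byte (0xFC) on an ISO-8859-1 platform
            bos.reset();
            // Naming the charset removes the ambiguity:
            Writer w = new OutputStreamWriter(bos, "UTF-8");
            w.write("ü");
            w.flush();
            System.out.println(bos.size()); // 2 bytes (0xC3 0xBC)
        }
    }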
    So my questions:
    1. Is there a way to get the response content without losing the option to call wrapper.getOutputStream()?
    or
    2. Can I set the encoding for my PrintWriter?
    or
    3. Can I encode the content before writing it to the PrintWriter, and will this solve the problem?
    new String(Charset.forName("UTF8").encode(content).array(), "UTF8") did not work.
    Here comes my code:
    The filter to translate the resources/response:
    import java.io.IOException;
    import java.io.PrintWriter;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import de.modima.util.lang.Language;

    public class TranslationFilter implements Filter {
         private static final Log log = LogFactory.getLog(TranslationFilter.class);

         public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
              String lang = Language.setLanguage((HttpServletRequest) request);
              CharResponseWrapper wrapper = new CharResponseWrapper((HttpServletResponse) response, "UTF8");
              PrintWriter out = response.getWriter();
              chain.doFilter(request, wrapper);
              String content = wrapper.toString();
              content = Language.translateContent(content, lang);
              content += "                                                                                  ";
              wrapper.setContentLength(content.length());
              out.write(content);
              out.flush();
              out.close();
         }

         public void destroy(){}

         public void init(FilterConfig filterconfig) throws ServletException{}
    }
    The response wrapper to get access to the response content:
    import java.io.CharArrayWriter;
    import java.io.PrintWriter;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    public class CharResponseWrapper extends TypedSevletResponse {
         private static final Log log = LogFactory.getLog(CharResponseWrapper.class);
         private CharArrayWriter output;

         public CharResponseWrapper(HttpServletResponse response, String charsetName) {
              super(response, charsetName);
              output = new CharArrayWriter();
         }

         public String toString() {
              return output.toString();
         }

         public PrintWriter getWriter() {
              return new PrintWriter(output, true);
         }
    }
    The TypedSevletResponse that takes care of setting the right HTTP header information according to the given charset:
    import java.nio.charset.Charset;
    import java.util.StringTokenizer;
    import javax.servlet.http.HttpServletResponse;
    import javax.servlet.http.HttpServletResponseWrapper;

    public class TypedSevletResponse extends HttpServletResponseWrapper {
         private String type;
         private String charsetName;

         /**
          * @param response the response to wrap
          * @param charsetName the java or non-java name of the charset, like utf-8
          */
         public TypedSevletResponse(HttpServletResponse response, String charsetName) {
              super(response);
              this.charsetName = charsetName;
         }

         public void setContentType(String type) {
              if (this.type == null && type != null) {
                   StringTokenizer st = new StringTokenizer(type, ";");
                   type = st.hasMoreTokens() ? st.nextToken() : "text/html";
                   type += "; charset=" + getCharset().name();
                   this.type = type;
              }
              getResponse().setContentType(this.type);
         }

         public String getContentType() {
              return type;
         }

         public String getCharacterEncoding() {
              try {
                   return getCharset().name();
              } catch (Exception e) {
                   return super.getCharacterEncoding();
              }
         }

         protected Charset getCharset() {
              return Charset.forName(charsetName);
         }
    }
    Some information about the environment:
    OS: Linux Debian 2.6.18-5-amd64
    Java: IBMJava2-amd64-142
    App server: JBoss 3.2.3
    Regards
    Markus Liebschner

    Hello cndvg
    Yes, I did.
    I found the solution in this forum at [Filter inconsistency Windows-Solaris?|http://forum.java.sun.com/thread.jspa?threadID=520067&messageID=2518948].
    You have to use your own implementation of ServletOutputStream.
    import java.io.CharArrayWriter;
    import javax.servlet.ServletOutputStream;

    public class TypedServletOutputStream extends ServletOutputStream {
         CharArrayWriter buffer;

         public TypedServletOutputStream(CharArrayWriter aCharArrayWriter) {
              super();
              buffer = aCharArrayWriter;
         }

         public void write(int aInt) {
              buffer.write(aInt);
         }
    }
    Now the CharResponseWrapper looks like this:
    // imports as in the CharResponseWrapper above, plus java.io.IOException and javax.servlet.ServletOutputStream
    public class CharResponseWrapper extends TypedSevletResponse {
         private static final Log log = LogFactory.getLog(CharResponseWrapper.class);
         private CharArrayWriter output;

         public CharResponseWrapper(HttpServletResponse response, String charsetName) {
              super(response, charsetName);
              output = new CharArrayWriter();
         }

         public String toString() {
              return output.toString();
         }

         public PrintWriter getWriter() throws IOException {
              return new PrintWriter(output, true);
         }

         public ServletOutputStream getOutputStream() {
              return new TypedServletOutputStream(output);
         }
    }
    Regards
    MaLie

  • What every developer should know about character encoding

    This was originally posted (with better formatting) at Moderator edit: link removed/what-every-developer-should-know-about-character-encoding.html. I'm posting because lots of people trip over this.
    If you write code that touches a text file, you probably need this.
    Let's start off with two key items:
    1. Unicode does not solve this issue for us (yet).
    2. Every text file is encoded. There is no such thing as an unencoded file or a "general" encoding.
    And let's add a codicil to this – most Americans can get by without having to take this into account – most of the time. Because the characters for the first 127 bytes in the vast majority of encoding schemes map to the same set of characters (more accurately called glyphs). And because we only use A-Z without any other characters, accents, etc. – we're good to go. But the second you use those same assumptions in an HTML or XML file that has characters outside the first 127 – then the trouble starts.
    The computer industry started with disk space and memory at a premium. Anyone who suggested using 2 bytes for each character instead of one would have been laughed at. In fact we're lucky that the byte worked best as 8 bits, or we might have had fewer than 256 values for each character. There of course were numerous character sets (or codepages) developed early on. But we ended up with most everyone using a standard set of codepages where the first 127 bytes were identical in all of them and the second half was unique to each set. There were sets for America/Western Europe, Central Europe, Russia, etc.
    And then for Asia, because 256 characters were not enough, some of the range 128 – 255 was used for what were called DBCS (double byte character sets). For each value of a first byte (in these higher ranges), the second byte then identified one of 256 characters. This gave a total of 128 * 256 additional characters. It was a hack, but it kept memory use to a minimum. Chinese, Japanese, and Korean each have their own DBCS codepage.
    And for a while this worked well. Operating systems, applications, etc. mostly were set to use a specified code page. But then the internet came along. A website in America using an XML file from Greece to display data to a user browsing in Russia, where each is entering data based on their country – that broke the paradigm.
    Fast forward to today. The two file formats where we can explain this the best, and where everyone trips over it, are HTML and XML. Every HTML and XML file can optionally have the character encoding set in its header metadata. If it's not set, then most programs assume it is UTF-8, but that is not a standard and is not universally followed. If the encoding is not specified and the program reading the file guesses wrong – the file will be misread.
    Point 1 – Never treat specifying the encoding as optional when writing a file. Always write it to the file. Always. Even if you are willing to swear that the file will never have characters out of the range 1 – 127.
    Now let's look at UTF-8, because as the standard, and with the way it works, it gets people into a lot of trouble. UTF-8 was popular for two reasons. First, it matched the standard codepages for the first 127 characters, so most existing HTML and XML would match it. Second, it was designed to use as few bytes as possible, which mattered a lot back when it was designed and many people were still using dial-up modems.
    UTF-8 borrowed from the DBCS designs of the Asian codepages. The first 128 bytes are all single-byte representations of characters. Then for the next most common set, it uses a block in the second 128 bytes as the start of a double-byte sequence, giving us more characters. But wait, there's more. For the less common there's a first byte which leads to a series of second bytes. Those then each lead to a third byte, and those three bytes define the character. This goes up to 6-byte sequences. Using this MBCS (multi-byte character set) you can write the equivalent of every Unicode character, and, assuming what you are writing is not a list of seldom-used Chinese characters, do it in fewer bytes.
    But here is what everyone trips over – they have an HTML or XML file, it works fine, and they open it up in a text editor. Then, in their text editor, using the codepage for their region, they insert a character like ß and save the file. Of course it must be correct – their text editor shows it correctly. But feed it to any program that reads according to the encoding, and that byte is now the first byte of a 2-byte sequence. You either get a different character or, if the second byte is not a legal value for that first byte – an error.
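    As a concrete illustration (my own sketch, not part of the original article), the same ß has different byte forms per encoding, and the single-byte form is not even valid UTF-8:

    import java.util.Arrays;

    public class EncodingMismatch {
        public static void main(String[] args) throws Exception {
            String s = "ß";
            // One byte in windows-1252, two bytes in UTF-8:
            System.out.println(Arrays.toString(s.getBytes("windows-1252"))); // [-33]      = 0xDF
            System.out.println(Arrays.toString(s.getBytes("UTF-8")));        // [-61, -97] = 0xC3 0x9F
            // A lone 0xDF announces a 2-byte UTF-8 sequence, so decoding it as
            // UTF-8 yields the replacement character instead of ß:
            System.out.println(new String(new byte[]{(byte) 0xDF}, "UTF-8"));
        }
    }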
    Point 2 – Always create HTML and XML in a program that writes it out correctly using the encoding. If you must create it with a text editor, then view the final file in a browser.
    Now, what about when the code you are writing will read or write a file? We are not talking binary/data files where you write it out in your own format, but files that are considered text files. Java, .NET, etc. all have character encoders. The purpose of these encoders is to translate between a sequence of bytes (the file) and the characters they represent. Let's take what is actually a very difficult example – your source code, be it C#, Java, etc. These are still by and large "plain old text files" with no encoding hints. So how do programs handle them? Many assume they use the local code page. Many others assume that all characters will be in the range 0 – 127 and will choke on anything else.
    Here's a key point about these text files – every program is still using an encoding. It may not be setting it in code, but by definition an encoding is being used.
    Point 3 – Always set the encoding when you read and write text files. Not just for HTML & XML, but even for files like source code. It's fine if you set it to use the default codepage, but set the encoding.
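    A minimal Java sketch of Point 3 (file name and content are illustrative):

    import java.io.*;

    public class ExplicitEncoding {
        public static void main(String[] args) throws IOException {
            // Write with a named encoding instead of the platform default...
            Writer out = new OutputStreamWriter(new FileOutputStream("sample.txt"), "UTF-8");
            out.write("Grüße");
            out.close();
            // ...and read it back naming the same encoding.
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(new FileInputStream("sample.txt"), "UTF-8"));
            System.out.println(in.readLine()); // Grüße
            in.close();
        }
    }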
    Point 4 – Use the most complete encoder possible. You can write your own XML as a text file encoded for UTF-8. But if you write it using an XML encoder, then it will include the encoding in the metadata and you can't get it wrong. (It also adds the endian preamble to the file.)
    Ok, you're reading & writing files correctly, but what about inside your code? What there? This is where it's easy – Unicode. That's what those encoders created in the Java & .NET runtimes are designed to do. You read in and get Unicode. You write Unicode and get an encoded file. That's why the char type is 16 bits and is a unique core type that is for characters. This you probably have right, because languages today don't give you much choice in the matter.
    Point 5 – (For developers in languages that have been around a while) – Always use Unicode internally. In C++ this is called wide chars (or something similar). Don't get clever to save a couple of bytes; memory is cheap and you have more important things to do.
    Wrapping it up
    I think there are two key items to keep in mind here. First, make sure you are taking the encoding into account on text files. Second, this is actually all very easy and straightforward. People rarely screw up how to use an encoding; it's when they ignore the issue that they get into trouble.

    DavidThi808 wrote:
    This was originally posted (with better formatting) at Moderator edit: link removed/what-every-developer-should-know-about-character-encoding.html. I'm posting because lots of people trip over this.
    If you write code that touches a text file, you probably need this.
    Let's start off with two key items:
    1.Unicode does not solve this issue for us (yet).
    2.Every text file is encoded. There is no such thing as an unencoded file or a "general" encoding.
    And let's add a codicil to this – most Americans can get by without having to take this into account – most of the time. Because the characters for the first 127 bytes in the vast majority of encoding schemes map to the same set of characters (more accurately called glyphs). And because we only use A-Z without any other characters, accents, etc. – we're good to go. But the second you use those same assumptions in an HTML or XML file that has characters outside the first 127 – then the trouble starts.
    Pretty sure most Americans do not use character sets that only have a range of 0-127. I don't think I have ever used a desktop OS that did. I might have used some big iron boxes before that, but at that time I wasn't even aware that character sets existed.
    They might only use that range, but that is a different issue, especially since that range is exactly the same as the UTF-8 character set anyway.
    The computer industry started with disk space and memory at a premium. Anyone who suggested using 2 bytes for each character instead of one would have been laughed at. In fact we're lucky that the byte worked best as 8 bits, or we might have had fewer than 256 values for each character. There of course were numerous character sets (or codepages) developed early on. But we ended up with most everyone using a standard set of codepages where the first 127 bytes were identical in all of them and the second half was unique to each set. There were sets for America/Western Europe, Central Europe, Russia, etc.
    And then for Asia, because 256 characters were not enough, some of the range 128 – 255 was used for what were called DBCS (double byte character sets). For each value of a first byte (in these higher ranges), the second byte then identified one of 256 characters. This gave a total of 128 * 256 additional characters. It was a hack, but it kept memory use to a minimum. Chinese, Japanese, and Korean each have their own DBCS codepage.
    And for a while this worked well. Operating systems, applications, etc. mostly were set to use a specified code page. But then the internet came along. A website in America using an XML file from Greece to display data to a user browsing in Russia, where each is entering data based on their country – that broke the paradigm.
    The above is only true for small-volume sets. If I am targeting a processing rate of 2000 txns/sec with a requirement to hold data active for seven years, then a column with a size of 8 bytes is significantly different from one with 16 bytes.
    Fast forward to today. The two file formats where we can explain this the best, and where everyone trips over it, are HTML and XML. Every HTML and XML file can optionally have the character encoding set in its header metadata. If it's not set, then most programs assume it is UTF-8, but that is not a standard and is not universally followed. If the encoding is not specified and the program reading the file guesses wrong – the file will be misread.
    The above is out of place. It would be best to address this as part of Point 1.
    Point 1 – Never treat specifying the encoding as optional when writing a file. Always write it to the file. Always. Even if you are willing to swear that the file will never have characters out of the range 1 – 127.
    Now let's look at UTF-8, because as the standard, and with the way it works, it gets people into a lot of trouble. UTF-8 was popular for two reasons. First, it matched the standard codepages for the first 127 characters, so most existing HTML and XML would match it. Second, it was designed to use as few bytes as possible, which mattered a lot back when it was designed and many people were still using dial-up modems.
    UTF-8 borrowed from the DBCS designs of the Asian codepages. The first 128 bytes are all single-byte representations of characters. Then for the next most common set, it uses a block in the second 128 bytes as the start of a double-byte sequence, giving us more characters. But wait, there's more. For the less common there's a first byte which leads to a series of second bytes. Those then each lead to a third byte, and those three bytes define the character. This goes up to 6-byte sequences. Using this MBCS (multi-byte character set) you can write the equivalent of every Unicode character, and, assuming what you are writing is not a list of seldom-used Chinese characters, do it in fewer bytes.
    The first part of that paragraph is odd. The first 128 characters of Unicode, all of Unicode, are based on ASCII. The representational format of UTF-8 is required to implement Unicode, thus it must represent those characters. It uses the idiom supported by variable-width encodings to do that.
    But here is what everyone trips over – they have an HTML or XML file, it works fine, and they open it up in a text editor. Then, in their text editor, using the codepage for their region, they insert a character like ß and save the file. Of course it must be correct – their text editor shows it correctly. But feed it to any program that reads according to the encoding, and that byte is now the first byte of a 2-byte sequence. You either get a different character or, if the second byte is not a legal value for that first byte – an error.
    Not sure what you are saying here. If a file is supposed to be in one encoding and you insert invalid characters into it, then it is invalid. End of story. It has nothing to do with HTML/XML.
    Point 2 – Always create HTML and XML in a program that writes it out correctly using the encode. If you must create with a text editor, then view the final file in a browser.
    The browser still needs to support the encoding.
    Now, what about when the code you are writing will read or write a file? We are not talking binary/data files where you write it out in your own format, but files that are considered text files. Java, .NET, etc. all have character encoders. The purpose of these encoders is to translate between a sequence of bytes (the file) and the characters they represent. Let's take what is actually a very difficult example – your source code, be it C#, Java, etc. These are still by and large "plain old text files" with no encoding hints. So how do programs handle them? Many assume they use the local code page. Many others assume that all characters will be in the range 0 – 127 and will choke on anything else.
    I know Java files have a default encoding - the specification defines it. And I am certain C# does as well.
    Point 3 – Always set the encoding when you read and write text files. Not just for HTML & XML, but even for files like source code. It's fine if you set it to use the default codepage, but set the encoding.
    It is important to define it. Whether you set it is another matter.
    Point 4 – Use the most complete encoder possible. You can write your own XML as a text file encoded for UTF-8. But if you write it using an XML encoder, then it will include the encoding in the meta data and you can't get it wrong. (it also adds the endian preamble to the file.)
    Ok, you're reading & writing files correctly, but what about inside your code? What there? This is where it's easy – Unicode. That's what those encoders created in the Java & .NET runtimes are designed to do. You read in and get Unicode. You write Unicode and get an encoded file. That's why the char type is 16 bits and is a unique core type that is for characters. This you probably have right, because languages today don't give you much choice in the matter.
    Unicode character escapes are replaced prior to actual code compilation. Thus it is possible to create strings in Java with escaped Unicode characters which will fail to compile.
    Point 5 – (For developers in languages that have been around a while) – Always use Unicode internally. In C++ this is called wide chars (or something similar). Don't get clever to save a couple of bytes; memory is cheap and you have more important things to do.
    No. A developer should understand the problem domain represented by the requirements and the business, and create solutions appropriate to that. Thus there is absolutely no point for someone who is creating an inventory system for a standalone store to craft a solution that supports multiple languages.
    And another example: with high-volume systems, moving/storing bytes is relevant. As such, one must carefully consider for each text element whether it is customer-consumable or internally consumable. Saving bytes in such cases will impact the total load of the system. In such systems incremental savings impact operating costs and marketing advantage through speed.

  • XML parser not detecting character encoding

    Hi,
    I am using Jdeveloper 9.0.5 preview and the same problem is happening in our production AS 9.0.2 release.
    The character encoding of an XML document is not being correctly detected by the Oracle v2 parser, even though the XML declaration correctly contains
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    instead it treats the document as UTF8-encoded, which is fine until a document comes along with an extended character, which then causes a
    java.io.UTFDataFormatException: Invalid UTF8 encoding.
    at oracle.xml.parser.v2.XMLUTF8Reader.checkUTF8Byte(XMLUTF8Reader.java:160)
    at oracle.xml.parser.v2.XMLUTF8Reader.readUTF8Char(XMLUTF8Reader.java:187)
    at oracle.xml.parser.v2.XMLUTF8Reader.fillBuffer(XMLUTF8Reader.java:120)
    at oracle.xml.parser.v2.XMLByteReader.saveBuffer(XMLByteReader.java:448)
    at oracle.xml.parser.v2.XMLReader.fillBuffer(XMLReader.java:2023)
    at oracle.xml.parser.v2.XMLReader.tryRead(XMLReader.java:972)
    at oracle.xml.parser.v2.XMLReader.scanXMLDecl(XMLReader.java:2589)
    at oracle.xml.parser.v2.XMLReader.pushXMLReader(XMLReader.java:485)
    at oracle.xml.parser.v2.XMLReader.pushXMLReader(XMLReader.java:192)
    at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:144)
    As you can see, it is explicitly using the XMLUTF8Reader to perform the read.
    I can get around this by hard-coding the XML input stream to be processed by a reader:
    XMLSource = new StreamSource(new InputStreamReader(XMLInStream,"ISO-8859-1"));
    however the manual documents that the character encoding is automatically picked up from the XML file, and casting into a reader should not be necessary, so I should be able to write
    XMLSource = new StreamSource(XMLInStream)
    Does anyone else experience this same problem?
    Having to hardcode the encoding causes my software to lose flexibility.
    Jarrod Sharp.

    An XML document should be created with 'ISO-8859-1' encoding to be parsed as 'ISO-8859-1' encoding.

  • Wrong character encoding from flash to mysql

    Hi, I'm experiencing problems with character encoding not functioning correctly when sending from Flash to MySQL. What I am doing is a contact form in Flash which then sends the values to a PHP file, which takes the values and inserts them into a table. As I'm using Icelandic characters, I need the character encoding to be either latin1 or utf8 in MySQL, or at least I think so. But it seems that Flash or the PHP document isn't sending in the same format as I have selected in MySQL, because all the special Icelandic characters come out scrambled in the MySQL table. Firefox tells me, though, that the HTML document containing the Flash movie is using utf-8.

    I don't know anything about Icelandic characters, but Flash generally really likes UTF-8, so it should be sending that if that is what it is starting with.
    You aren't using any kind of useCodePage? That will mess it up.
    Are you sure that the input method is Icelandic?
    In the testing environment, can you list the variables (from the debug menu) and see if they look proper? If they do, then Flash is reading them correctly and the problem must be coming in further downstream.

  • Detecting character encoding from BLOB stream... (PLSQL)

    I'm looking for a procedure/function which can return the character encoding of a "text/xml/csv/slk" file stored in a BLOB.
    For example...
    I have 4 files in different encodings (UTF8, Utf8BOM, ISO8859_2, Windows1252)...
    With Java I can simply detect the character encoding with juniversalchardet (http://code.google.com/p/juniversalchardet/)...
    thank you

    Solved...
    On my local PC I have installed Java 1.5.0_00 (because the DB is on 1.5.0_10)...
    With JDeveloper I have recompiled the source code from:
    http://juniversalchardet.googlecode.com/svn/trunk/src/org/mozilla/universalchardet
    http://code.google.com/p/juniversalchardet/
    After that I made a JAR file and uploaded it with loadjava to my database...
    C:\>loadjava -grant r_inis_prod -force -schema insurance2 -verbose -thin -user username/password@ip:port:sid chardet.jar
    After that I wrote a Java procedure and a PL/SQL wrapper, example below:
    import java.io.BufferedInputStream;
    import java.io.InputStream;
    import oracle.sql.BLOB;
    import org.mozilla.universalchardet.UniversalDetector;

    public class Zip {
        public static String verifyEncoding(BLOB p_blob) {
            if (p_blob == null) return "-1";
            try {
                InputStream is = new BufferedInputStream(p_blob.getBinaryStream());
                UniversalDetector detector = new UniversalDetector(null);
                byte[] buf = new byte[p_blob.getChunkSize()];
                int nread;
                while ((nread = is.read(buf)) > 0 && !detector.isDone()) {
                    detector.handleData(buf, 0, nread);
                }
                detector.dataEnd();
                is.close();
                return detector.getDetectedCharset();
            } catch (Exception ex) {
                return "-2";
            }
        }
    }
    As you can see, I used -2 for an exception and -1 if the input BLOB is null.
    Then I made a PL/SQL function:
    function f_preveri_encoding(p_blob in blob) return varchar2 is
    language Java name 'Zip.Zip.verifyEncoding(oracle.sql.BLOB) return java.lang.String';
    After that I uploaded 2 different txt files into my BLOB field (the first one encoded with UTF-8, the second one with WINDOWS-1252).
    example how to call:
    declare
       l_blob blob;
       l_encoding varchar2(100);
    begin
    select vsebina into l_blob from dok_vsebina_dokumenta_blob where id = 401587359 ;
    l_encoding := zip_util.f_preveri_encoding(l_blob);
    if l_encoding = 'UTF-8' then
       dbms_output.put_line('file is encoded with UTF-8');
    elsif l_encoding = 'WINDOWS-1252' then
       dbms_output.put_line('file is encoded with WINDOWS-1252');
    else
        dbms_output.put_line('other enc...');
    end if;
    end;
    Now I can get the encoding from the BLOB, convert it to the database encoding, and store the data in a CLOB field.
    Here you have a chardet.jar file if you need this functionality..
    https://docs.google.com/open?id=0B6Z9wNTXyUEeVEk3VGh2cDRYTzg

  • UTF8 character set conversion for Chinese language

    Hi friends,
    I would like some basic explanation of the UTF8 feature: how does it help when converting data from the Chinese language?
    I would also like to know which characters UTF8 will not support when converting from the Chinese language.
    Thanks & Regards
    Ramya Nomula

    Not exactly sure what you are looking for, but on MetaLink, there are numerous detailed papers on NLS character sets, conversions, etc.
    Bottom line is that traditional Chinese characters (since they are more complicated) require 4 bytes to store (in character sets such as UTF-8 and AL32UTF8). Some Middle Eastern character sets also fall into this category.
    Do a google search on "utf8 al32utf8 difference", and you will get some good explanations.
    e.g., http://decipherinfosys.wordpress.com/2007/01/28/difference-between-utf8-and-al32utf8-character-sets-in-oracle/
    Recently, one of our clients had a question on the differences between these two character sets since they were in the process of making their application global. In an upcoming whitepaper, we will discuss in detail what it takes (from a RDBMS perspective) to address localization and globalization issues. As far as these two character sets go in Oracle, the only difference between AL32UTF8 and UTF8 character sets is that AL32UTF8 stores characters beyond U+FFFF as four bytes (exactly as Unicode defines UTF-8). Oracle’s “UTF8” stores these characters as a sequence of two UTF-16 surrogate characters encoded using UTF-8 (or six bytes per character). Besides this storage difference, another difference is better support for supplementary characters in AL32UTF8 character set.
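    To make the storage difference concrete, here is a small illustrative Java check (the sample character is my choice):

    public class Utf8VsCesu8 {
        public static void main(String[] args) throws Exception {
            // A supplementary character, i.e. beyond U+FFFF:
            String s = new String(Character.toChars(0x10400));
            // AL32UTF8 stores it like real UTF-8 does: 4 bytes.
            System.out.println(s.getBytes("UTF-8").length); // prints 4
            // Oracle's legacy UTF8 charset instead encodes the two UTF-16 surrogates
            // (U+D801, U+DC00) as 3 bytes each, i.e. 6 bytes in total (CESU-8).
        }
    }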
    You may also consider posting your question on the Globalization Support forum, which pertains more to these types of questions.
    Globalization Support

  • Setting DEFAULT character encoding ???

    To set the default character encoding to UTF-8 for JVM I can do the
    following on Solaris:
    LC_ALL=en_US.UTF-8; java ....
    How do I achieve the same on Windows?
    Is there a java property where I can specify the default character
    encoding or do I have to getBytes( str, "UTF-8") everywhere?
    Thanks,
    Artur...

    Hi Artur,
    there is a way. The property is file.encoding. Example:
    import java.io.*;

    public class ShowEncoding {
      public static void main(String[] args) {
        System.out.println("Default encoding: " +
            new InputStreamReader(System.in).getEncoding());
      }
    }
    On my system I get the following results:
    java ShowEncoding
    Default encoding: Cp1252
    java -Dfile.encoding=UTF-8 ShowEncoding
    Default encoding: UTF8
    java -Dfile.encoding=Latin1 ShowEncoding
    Default encoding: ISO8859_1
    I don't know if setting this property using the -D argument is documented or implementation-independent. But so far, it has been working for me. For setting the locale, you may use the properties user.language and user.region. But this is undocumented and implementation-dependent.
    I wouldn't use these properties at all. I would rather specify the encoding in all constructors, probably using an application property file.
    Good luck!
    Marcus.

  • Setting character encoding programmatically?

    Hi,
    I am using the Sun J2ME Wireless Toolkit 2.1, and I have a problem with character encoding. I am receiving text from a .NET web service and, after some processing in the client, I send the string back.
    The problem is, the string I am sending back includes Turkish characters. These are sent as question marks instead of the characters.
    I have failed to find a method that changes the character encoding used while making a web service call.
    Actually, I could not see any way to change the encoding at all. For the emulator, a property file can be used, but what about the devices I'll be deploying the app to? It'd be really great if someone could point me in the right direction.
    Best Regards

    Hi,
    My situation is as follows. I have .NET web services on the server side, and I am using mobile devices as clients. When I get a string from method A in the web service, I can display it on the device screen without a problem. After that, if I send the same string that I've received from method A as a parameter to method B, the .NET code receives garbage instead of the Turkish characters.
    At the moment I am encoding the Turkish characters on the client side, and decoding them in the .NET web server processing code.
    I'd like to try setting the encoding to UTF8 but, as I have written, I have not seen any way of doing this. Changing the properties file is possible for the emulator, but how can I do it for the target devices? I have not seen an API call for this purpose in the MIDP or CLDC docs. Thanks for your answer.
    Regards

  • Why, after all these years, can't Thunderbird auto-detect character encoding

    Judging by all the existing messages and complaints about this, not to mention erroneous posts that say the problem is solved when it isn't, I have to conclude Mozilla either doesn't believe this is a problem or doesn't care to fix it. The bottom line is that there is no way to tell Thunderbird to automatically display emails in the character encoding they were written in. I could understand cases where the headers are not properly filled in, but I see tons of emails in which the encoding is plainly there in the headers within the message source. You can force it, but if you do so via the menu View->Character Encoding->UTF8 (for example), it won't "stick" if you view another message. But who would want it to "stick" permanently anyway? What the average user really wants is to be able to toggle View->Character Encoding->Auto Detect from its default "off" to simply "on", and not have to bother with it anymore.
    This is a problem that seems to have gone on forever, and it NEVER happens with other email clients. If there is some backdoor way to actually make autodetect work, I'd appreciate knowing about it. But more important, I think ALL users would appreciate it if it were not some secret "backdoor" setting, but a simple global menu choice for all accounts. Can Mozilla please fix this problem once and for all?

    You said...
    ''Thunderbird is supposed to be using the encoding in the mail.''
    I figured is "should", i'm just reporting that it doesn't
    You said...
    ''Setting auto detect to on disables that.''
    Please explain. I've looked at every setting I can find and there is no way to set auto detect to "ON". I DID try setting it to "universal" in an attempt to solve the problem, but I have since restored it to "off", because the universal setting doesn't help.
    you said...
    ''"Based on your earlier response I assume you need to press the F10 key to see the tools menu you were refered to." ''
    No... I never said that anywhere. I DID refer to Menu->View->Character Encoding, and I did refer to right-clicking on individual folders to get to the properties dialog and the general information tab. But F10 doesn't do anything.
    You said...
    ''I have examines dozens of mails in my inbox and each honours the character encoding set in the HTML''
    Well, mine NEVER did. A short example from an email I got today is pretty much representative of all the mail I get from GMAIL...
    --089e013a0572a067a404fc73ceda
    Content-Type: text/plain; charset=UTF-8
    Ok, very good. Thank you. Phoenix sent you a friend request on Facebook by
    the way. Talk to you soon.
    --089e013a0572a067a404fc73ceda
    Content-Type: text/html; charset=UTF-8
    Content-Transfer-Encoding: quoted-printable
    <p dir=3D"ltr">Ok, very good. Thank you. Phoenix sent you a friend request=
    =C2=A0 on Facebook by the way.=C2=A0 Talk to you soon.</p>
    --089e013a0572a067a404fc73ceda--
    See those instances of "=C2=A0"? Each one displays as a strange character, a capital A with a curved line over it. (=C2=A0 is the quoted-printable form of the UTF-8 bytes C2 A0, a no-break space; read as Western/Latin-1, those two bytes display as Â followed by a space.) If I manually set my default encoding to UTF-8, the weird characters go away. If I leave it as Western, there is nothing I can do to tell Thunderbird to "auto detect".
    Anyway, I suppose at this point that no one responsible for the product coding is seriously looking at my issue, which is why it's never been solved. If anyone does intend to help track it down and solve it, I'll be happy to provide all the examples and screen shots they ask for. Otherwise.

  • Changing character encoding

    Hi,
    I have a procedure in my database that produces a .csv-output for Excel.
    Using http headers I get Excel to open the "file" produced.
    My problem is our Swedish characters, åäö.
    Excel (at least Excel 2003) wants iso-8859-1 encoding for these characters to work.
    So my procedure uses convert() to go from database charset UTF8 to WE8ISO8859P1.
    This worked fine under Oracle Portal but not so under Apex Listener on WebLogic.
    I think the listener is converting my ISO text to UTF-8 on the way to the browser.
    Is this so?
    I've read that it "defaults to utf8" and "bound to utf8" but nothing official.
    My listener version is 1.1.3.243.11.40
    Kind regards
    Tomas

    Hi Tomas,
    I'm sorry it took some time to prepare an answer for you...
    Excel (at least Excel 2003) wants iso-8859-1 encoding
    This hasn't changed with 2007, as far as I've experienced it - it still doesn't like UTF-8 in CSV files unless you use the import function.
    So my procedure uses convert() to go from database charset UTF8 to WE8ISO8859P1.
    I've experimented a lot on that issue. If you ever have to deal with EURO signs (and a few other specialities) I'd consider WE8ISO8859P15.
    Anyway, since you say you use a procedure, I assume you aren't using the APEX standard CSV function for export, right? We had similar issues with UTF-8 character sets on OHS, but no problems with the standard CSV export on the APEX Listener (we received ANSI file encoding as requested by the client), so I came to the conclusion that this is not an APEX Listener specific issue, but has to be something in the way we build our custom export process using htp.p and owa_util.
    I'm not familiar with Portal and APEX, but I assume that Portal uses mod_plsql and sets PlsqlNLSLanguage to a Windows charset, e.g. AMERICAN_AMERICA.WE8MSWIN1252 or even your local territory. I guess this affects the output stream handling, but it's not the recommended way to run APEX. Since APEX was "renamed" from HTML DB to APEX, the installation guide has required AL32UTF8 as the NLS language parameter, and this is what the APEX Listener enforces by not giving you any option on that. But as I said before, we had the same problems with exports on OHS, so we sometimes ignored the installation guide to get proper files.
    Without that option, there is one approach that definitely works and one that might work, but I haven't implemented it (yet). So I'll start with the working one: prepare a BLOB and download it as a complete file using WPG_DOCLOAD. I'm not sure, but I guess the APEX standard export performs a similar operation. An additional advantage of that approach is the fact that you get proper file size information when the download starts, so progress and time estimation are accurate...
    I implemented the following procedure for the generic export (download) part:
    PROCEDURE csv_export (in_clob IN CLOB, in_filename IN VARCHAR2, in_charset IN VARCHAR2 DEFAULT 'WE8MSWIN1252')
      AS
        l_blob           BLOB;
        l_length         INTEGER;
        l_dest_offset    INTEGER := 1;
        l_src_offset     INTEGER := 1;
        l_lang_context   INTEGER := DBMS_LOB.DEFAULT_LANG_CTX;
        l_warning        INTEGER;
      BEGIN
        -- create new temporary BLOB
        DBMS_LOB.createtemporary(l_blob, FALSE);
        -- tranform the input CLOB into a BLOB of the desired charset
        /** @TODO: check whether lang_context should be an additional parameter
         ** the DBMS_LOB documentation doesn't say much about that parameter */
        DBMS_LOB.convertToBlob( dest_lob     => l_blob,
                                src_clob     => in_clob,
                                amount       => DBMS_LOB.LOBMAXSIZE,
                                dest_offset  => l_dest_offset,
                                src_offset   => l_src_offset,
                                blob_csid    => nls_charset_id(in_charset),
                                lang_context => l_lang_context,
                                warning      => l_warning);
        -- determine length for header
        l_length := DBMS_LOB.getlength(l_blob); 
        -- create response header
        OWA_UTIL.mime_header('text/comma-separated-values', false);
        htp.p('Content-length: ' || l_length);
        htp.p('Content-Disposition: attachment; filename="'||in_filename||'"');
        -- close the headers
        OWA_UTIL.http_header_close;
        -- download the BLOB
        WPG_DOCLOAD.download_file( l_blob );
        -- release BLOB from memory
        DBMS_LOB.freetemporary(l_blob);
      EXCEPTION
        WHEN OTHERS THEN
          DBMS_LOB.freetemporary(l_blob);
          RAISE;
      END csv_export;
    To use this in your procedure, you need to do the following:
    DECLARE
    -- other variables here
      l_clob CLOB;
    BEGIN
      -- create new temporary CLOB
      DBMS_LOB.createtemporary(l_clob, FALSE);
      -- loop to prepare your content - just an example
      -- use the one you use right now for streaming with htp.p
      -- and replace the htp.p with the append
      FOR a in 1..10
      LOOP
        DBMS_LOB.append(dest_lob => l_clob, src_lob => 'any_VARCHAR2_or_CLOB_content_or_variable');
      END LOOP;
      -- perform actual export/download
      csv_export(in_clob => l_clob, in_filename => 'yourfilename.csv');
      -- stop any other rendering unless you want that, e.g. for a confirmation or something similar
      apex_application.g_unrecoverable_error := true;
      -- release CLOB from memory
        DBMS_LOB.freetemporary(l_clob);
      EXCEPTION
        WHEN OTHERS THEN
          DBMS_LOB.freetemporary(l_clob);
          RAISE;
    END;
    Adapt that example as needed.
    If you need streaming for some reason, you should start some research on HTP.PUTRAW and the surrounding procedures for setting the transfer encoding and headers to fit that mode. It should work as well, but I don't like the 2000-byte size limit that comes along with RAW, and I know the BLOB approach works for proper downloads...
    I hope this helps you solve your problem.
    -Udo

  • Locale and character encoding. What to do about these dreadful ÅÄÖ??

    It's time for me to get it into my head how this works. Please, help me understand before I go nuts.
    I'm from Sweden and we use a few of these weird characters like ÅÄÖ.
    If I create a file called "övrigt.txt" in Windows, then the file will turn up as "?vrigt.txt" on my Linux PC (at least in the console; sometimes it looks OK in other apps in X). The same is true if I create the file in Linux and copy it to Windows: it will look just as weird on the other side.
    As I (probably) can't change the way Windows works, my question is what I have to do to make these two systems play nicely with each other.
    This is the output from locale:
    LANG=en_US.utf8
    LC_CTYPE="en_US.utf8"
    LC_NUMERIC="en_US.utf8"
    LC_TIME="en_US.utf8"
    LC_COLLATE=C
    LC_MONETARY="en_US.utf8"
    LC_MESSAGES="en_US.utf8"
    LC_PAPER="en_US.utf8"
    LC_NAME="en_US.utf8"
    LC_ADDRESS="en_US.utf8"
    LC_TELEPHONE="en_US.utf8"
    LC_MEASUREMENT="en_US.utf8"
    LC_IDENTIFICATION="en_US.utf8"
    LC_ALL=
    Is there anything here I should change? I have tried using ISO-8859-1 with no luck. Mind you, I want to have the system-wide language set to English. The only thing I want to achieve is that "Ö" on Windows turns up as "Ö" in Linux as well, and vice versa.
    Please save my hair from being torn off, I'm going bald here...

    Hey, thanks for all the answers!
    I share my files in a number of ways, but mainly through a web application called AjaXplorer (very nice btw...). The thing is that as soon as a Windows user uploads anything with special characters in the file name, my programs, XBMC, the console etc. refuse to read them correctly. Other ways of sharing are file copying with USB sticks, ssh, etc. It's really not the way of sharing that is the problem, I think, but rather the special characters being used sometimes.
    I could probably convert the filenames with the suggested applications, but then I'll put the Windows users in trouble when they want to download them again, won't I?
    I realize that it's cp1252 that is the bad guy in this drama. Is there no way to set/use cp1252 as a character encoding in Linux? It's probably a bad idea, as utf8 seems like the way of the future, but the fact that these two OSes can't communicate too well in this area is pretty useless if you ask me.
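    (As an aside: for names that are already mangled on the Linux side, a filename transcoder such as convmv can rewrite them in place; a sketch, assuming the names really are cp1252 bytes - convmv runs in test mode by default, and --notest performs the actual rename:)

    convmv -f cp1252 -t utf8 --notest övrigt.txt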
    To wrap this up I'll answer some questions...
    @EVRAMP: I'm actually using PCManFM, but that is only for me, and I'm not dealing very often with vfat partitions, to be honest.
    @pkervien: Well, I think I mentioned my forms of sharing above. (Fun to see some Arch Swedes here!)
    @quarkup: locale.gen is edited and both sv_SE and en_US have UTF-8 and ISO-8859 enabled and generated.
    ...and to clarify things even further: it doesn't matter if I get or provide a file via a USB stick, Samba, FTP or on paper. All I want is for "Ö" to always be "Ö", everywhere.
    I can't believe how hard this is to get around. Linus is Finnish, for crying out loud. I thought sorting this out would be the first thing he did. Maybe he doesn't deal with Windows or its users at all.

  • XML Character Encoding Using UTL_DBWS

    Hi,
    I have a database with WINDOWS-1252 character encoding. I'm using UTL_DBWS to call a web service method which echoes a given string. For this purpose, I do the following:
    DECLARE
        v_wsdl CONSTANT VARCHAR2(500) := 'http://myhost/myservice?wsdl';
        v_namespace CONSTANT VARCHAR2(500) := 'my.namespace';
        v_service_name CONSTANT UTL_DBWS.QNAME := UTL_DBWS.to_qname(v_namespace, 'MyService');
        v_service_port CONSTANT UTL_DBWS.QNAME := UTL_DBWS.to_qname(v_namespace, 'MySoapServicePort');
        v_ping CONSTANT UTL_DBWS.QNAME := UTL_DBWS.to_qname(v_namespace, 'ping');
        v_wsdl_uri CONSTANT URITYPE := URIFACTORY.getURI(v_wsdl);
        v_str_request CONSTANT VARCHAR2(4000) :=
    '<?xml version="1.0" encoding="UTF-8" ?>
    <ping>
        <pingRequest>
            <echoData>Dev Team üöäß</echoData>
        </pingRequest>
    </ping>';
        v_service UTL_DBWS.SERVICE;
        v_call UTL_DBWS.CALL;
        v_request XMLTYPE := XMLTYPE (v_str_request);
        v_response SYS.XMLTYPE;
    BEGIN
        DBMS_JAVA.set_output(20000);
        UTL_DBWS.set_logger_level('FINE');
        v_service := UTL_DBWS.create_service(v_wsdl_uri, v_service_name);
        v_call := UTL_DBWS.create_call(v_service, v_service_port, v_ping);
        UTL_DBWS.set_property(v_call, 'oracle.webservices.charsetEncoding', 'UTF-8');
        v_response := UTL_DBWS.invoke(v_call, v_request);
        DBMS_OUTPUT.put_line(v_response.getStringVal());
        UTL_DBWS.release_call(v_call);
        UTL_DBWS.release_all_services;
    END;
    /
    Here is the SERVER OUTPUT:
    ServiceFacotory: oracle.j2ee.ws.client.ServiceFactoryImpl@a9deba8d
    WSDL: http://myhost/myservice?wsdl
    Service: oracle.j2ee.ws.client.dii.ConfiguredService@c881d39e
    *** Created service: -2121202561 - oracle.jpub.runtime.dbws.DbwsProxy$ServiceProxy@afb58220 ***
    ServiceProxy.get(-2121202561) = oracle.jpub.runtime.dbws.DbwsProxy$ServiceProxy@afb58220
    Collection Call info: port={my.namespace}MySoapServicePort, operation={my.namespace}ping, returnType={my.namespace}PingResponse, params count=1
    setProperty(oracle.webservices.charsetEncoding, UTF-8)
    dbwsproxy.add.map: ns, my.namespace
    Attribute 0: my.namespace: xmlns:ns, my.namespace
    dbwsproxy.lookup.map: ns, my.namespace
    createElement(ns:ping,null,my.namespace)
    dbwsproxy.add.soap.element.namespace: ns, my.namespace
    Attribute 0: my.namespace: xmlns:ns, my.namespace
    dbwsproxy.element.node.child.3: 1, null
    createElement(echoData,null,null)
    dbwsproxy.text.node.child.0: 3, Dev Team üöäß
    request:
    <ns:ping xmlns:ns="my.namespace">
       <pingRequest>
          <echoData>Dev Team üöäß</echoData>
       </pingRequest>
    </ns:ping>
    Jul 8, 2008 6:58:49 PM oracle.j2ee.ws.client.StreamingSender _sendImpl
    FINE: StreamingSender.response:<?xml version = '1.0' encoding = 'UTF-8'?>
    <env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><env:Header/><env:Body><ns0:pingResponse xmlns:ns0="my.namespace"><pingResponse><responseTimeMillis>0</responseTimeMillis><resultCode>0</resultCode><echoData>Dev Team üöäß</echoData></pingResponse></ns0:pingResponse></env:Body></env:Envelope>
    response:
    <ns0:pingResponse xmlns:ns0="my.namespace">
       <pingResponse>
          <responseTimeMillis>0</responseTimeMillis>
          <resultCode>0</resultCode>
          <echoData>Dev Team üöäß</echoData>
       </pingResponse>
    </ns0:pingResponse>
    As you can see, the character encoding is broken in the request and in the response, i.e. the SOAP encoder does not take the UTF-8 encoding into consideration.
    I tracked the problem down to the method oracle.jpub.runtime.dbws.DbwsProxy.dom2SOAP(org.w3c.dom.Node, java.util.Hashtable), and more specifically to the calls of oracle.j2ee.ws.saaj.soap.soap11.SOAPFactory11.
    My question is: is there a way to make the SOAP encoder use the correct character encoding?
    Thanks a lot in advance!
    Greetings,
    Dimitar

    I found a workaround for the problem:
        v_response := XMLType(v_response.getBlobVal(NLS_CHARSET_ID('CHAR_CS')), NLS_CHARSET_ID('AL32UTF8'));
    Ugly, but I'm tired of decompiling and debugging Java classes ;)
    Greetings,
    Dimitar

  • What's the difference in character encoding between 1.4.0 and 1.4.2 on Linux?

    As I find it, the character encoding for Chinese in JDK 1.4.2 is no longer the same as in JDK 1.4.0.
    In JDK 1.4.0, the character encoding used the "file.encoding" system property; we often set the
    property to "gb2312".
    But in JDK 1.4.2, I find that the default character encoding no longer uses the "file.encoding" system property.
    Who knows the reason?
    Test program:
    public class B {
        public static void main(String[] args) throws Exception {
            // The bytes below are "中文" (Chinese) encoded in GB2312.
            byte[] bytes = new byte[]{(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
            String s1 = new String(bytes);
            String s2 = new String(bytes, System.getProperty("file.encoding"));
            System.out.println("s1=" + s1 + " , s2=" + s2);
            System.out.println("s1.length=" + s1.length() + " , s2.length=" + s2.length());
        }
    }
    Run four times, the results are:
    [root@app15 component]# /usr/local/j2sdk1.4.0/bin/java -Dfile.encoding=ISO-8859-1 -cp . B
    s1=中文 , s2=中文
    s1.length=4 , s2.length=4
    [root@app15 component]# /usr/local/j2sdk1.4.0/bin/java -Dfile.encoding=gb2312 -cp . B
    s1=中文 , s2=中文
    s1.length=2 , s2.length=2
    [root@app15 component]# /usr/local/j2sdk1.4.2/bin/java -Dfile.encoding=ISO-8859-1 -cp . B
    s1=中文 , s2=中文
    s1.length=4 , s2.length=4
    [root@app15 component]# /usr/local/j2sdk1.4.2/bin/java -Dfile.encoding=gb2312 -cp . B
    s1=中文 , s2=??
    s1.length=4 , s2.length=2
    [root@app15 component]#

    I don't know for sure, but:
    -- The API documentation for String says that "new String(byte[])" uses "the platform's default charset".
    -- The API documentation for Charset says "The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system."
    You'll notice that it doesn't say anything about using the file.encoding system value, so presumably (based on your experiments) it doesn't. I did a search for "java default charset" and didn't find anything specific, but this site says "As of Java 1.4.1, the default Charset varies from platform to platform" and suggests you explicitly hard-code your charset. I would agree with that.
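    To follow that advice with the test program above, one could name the charset explicitly instead of relying on file.encoding (an illustrative sketch):

    public class ExplicitCharset {
        public static void main(String[] args) throws Exception {
            byte[] bytes = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
            // Hard-code the charset so the result is the same on 1.4.0 and 1.4.2:
            String s = new String(bytes, "GB2312");
            System.out.println(s + " , length=" + s.length()); // 中文 , length=2
        }
    }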

  • Why differing Character Encoding and how to fix it?

    I have PRS-950 and PRS-350 readers, both since 2011.  
    In the last year, I've been getting books with character encoding that is not easy to read. In playing around with my browsers and View -> Encoding menus, I have figured out that it has something to do with the character encoding within the epub files.
    I buy books from several ebook stores and I borrow from the library.
    The problem may affect the entire book, but it is usually restricted to a few chapters, with rare occasions where the encoding changes within a chapter. Usually it is a whole chapter, not part of one, and it can be seen in chapters that are not consecutive.
    It occurs whether the book is downloaded directly to my 950 reader or if I load it to either reader from my computer(s), which are all Mac OS X of several versions from 10.4 to Mountain Lion. Since it happens when the book is downloaded directly, I figure the operating system of my computer is not relevant.
    There are several publishers involved, though Baen (no DRM ebooks) has not so far been one of them.
    If I look at the books with viewers on the computer, the encoding is the same.  I've read them in Calibre, in the Sony Reader App, and in Adobe Digital Editions 2.0.  It's always the same.
    I believe the encoding is inherent to the files.  I would like to fix this if I can to make the books I've purchased, many of them in paper and electronically, more enjoyable to read on my readers.
    Example: I’ve is printed instead of I've.
    ’ for apostrophe
    “ the opening of a quotation,
    â€?  for closing the quotation,
    and I think — is for a hyphen.
    When a sentence had “’m  for " 'm at the beginning of a speech (when the character was slurring his words) it took me a while to figure out how it was supposed to read.
    “’Sides, â€™tis only for a moon.  That ain’t long.â€?
    was in one recent book.
    Translation: " 'Sides, 'tis only for a moon. That ain't long."
    See what I mean? 
    Any ideas?
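    (The pattern in these examples matches UTF-8 text being decoded as Windows-1252; a small illustrative Java sketch, not from this thread, reproduces it:)

    public class MojibakeDemo {
        public static void main(String[] args) throws Exception {
            String original = "I\u2019ve"; // I've, with a right single quotation mark
            byte[] utf8 = original.getBytes("UTF-8");
            // Decoding the UTF-8 bytes as Windows-1252 yields the mangled form:
            System.out.println(new String(utf8, "windows-1252")); // I’ve
        }
    }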

    Hi
    I wonder if it’s possible to download a free ebook with such an issue, in order to make some “tests”.
    Perhaps it’s possible, on free ebooks (without DRM), to add fonts by using software like Sigil.
