UTF-8 encoding: funny characters in DB

Dear All,
I am facing a critical issue that has been dragging on for 3 days,
and I still can't figure it out.
I have an XML file which I am getting from a server using
<cffile action="READ" file="#application.settings.paths#"
variable="xml" charset="utf-8">
and using XmlParse() to parse that XML, then trying to insert the
XML data into the database. The XML contains characters like
"master’s", where the curly apostrophe should come through as a
single quote, but when saving to the database using a ColdFusion
query, the data is saved with funny characters (i.e., "master?s").
The XML encoding is UTF-8, and I don't know how to convert those
junk characters into normal characters (e.g., "master's", with a
plain single quote).
Here are the things I tried:
i) In ColdFusion Administrator I added the connection string
"useUnicode=true&characterEncoding=UTF-8"
and checked the box that says "Enable Unicode for data sources
configured for non-Latin characters".
ii) Used a convertCharset() function, passing the XML string:
<cfscript>
function convertCharset(str, charsetFrom, charsetTo) {
    var resultStr = "";
    var javaString = "";
    var byteArray = "";
    javaString = CreateObject("java", "java.lang.String").init(str);
    byteArray = javaString.getBytes(charsetFrom);
    resultStr = CreateObject("java", "java.lang.String").init(byteArray, charsetTo);
    return resultStr.toString();
}
</cfscript>
<cfcontent type="text/html; charset=UTF-8">
<cfset setEncoding("URL", "UTF-8")>
<cfset setEncoding("Form", "UTF-8")>
I also tried this method:
http://www.bennadel.com/blog/1206-Content-Is-Not-Allowed-In-Prolog-ColdFusion-XML-And-The-Byte-Order-Mark-BOM-.htm
Please let me know if I need to do anything other than the above
methods.
Thanks

I am using a SQL Server 2005 database;
the field is "Description" VARCHAR(2000).
Did you perform your test using the same table, code, etc.?
Yes.
Did you read in and dump out the XML file? Yes, I dumped
the XML file, and if I open it in Notepad as UTF-8 (file type), I
see a single quote instead of that different character.
Is it really UTF-8?
I think it's UTF-8.
If your mojibake characters are from an MS Word document, then
they're not UTF-8 but windows-1252, a superset of
Latin-1.
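
For reference, both failure modes discussed here can be reproduced in a few lines of Java (a minimal sketch; the string literal is just an example): decoding UTF-8 bytes as windows-1252 yields the classic "’" sequence, while encoding U+2019 to a charset that lacks it yields the "?" the database is showing.

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        String original = "master\u2019s";  // curly apostrophe U+2019

        // UTF-8 bytes mis-decoded as windows-1252 -> "master’s"
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, Charset.forName("windows-1252")));

        // U+2019 encoded to a charset that lacks it -> replacement "?"
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // master?s
    }
}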
They are not from MS Word. I got an XML file which has all
the course and presentation information, structured properly except
for those characters, like
("younus has a Bachelor’s degree"). I see
that in UTF-8.
So I want to know which format I need to convert to when
saving into the database (SQL Server 2005).
Thanks.
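
(No fix was ever posted in this thread. One common cause of exactly this "?" symptom: the Description column is VARCHAR(2000), and a VARCHAR column can only hold characters its collation's code page can represent; anything unmappable is replaced with "?" during the server-side conversion. Changing the column to NVARCHAR(2000), with the datasource's Unicode option left on, usually sidesteps the conversion entirely. A minimal JDBC sketch of the idea, with hypothetical table and connection details:)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class UnicodeInsertDemo {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection string; the Microsoft driver sends string
        // parameters as Unicode by default (sendStringParametersAsUnicode=true).
        String url = "jdbc:sqlserver://localhost;databaseName=Courses;user=x;password=y";
        try (Connection con = DriverManager.getConnection(url);
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO Course (Description) VALUES (?)")) {
            // Description must be NVARCHAR, not VARCHAR, or U+2019 is
            // converted to '?' by the server itself.
            ps.setString(1, "master\u2019s degree");
            ps.executeUpdate();
        }
    }
}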

Similar Messages

  • [SOLVED] Problems opening folders with UTF-8 encoded characters

    Hello everyone, I'm having an issue when I access folders in all my programs (except Dolphin File Manager). Every time I open the folder navigation window in my programs, folders with UTF-8 encoded characters (such as "ç", "á", "ó", "í", etc.) are not shown, or the folder name does not show these characters, so I cannot open documents inside these folders.
    However, as you saw, I can type these characters normally. Here's my "locale.conf" :
    LANG="en_US.UTF-8:ISO-8859-1"
    LC_TIME="pt_BR.UTF-8:ISO-8859-1"
    And here's the output of the command "locale -a" :
    C
    en_US.utf8
    POSIX
    Last edited by regmoraes (2015-04-17 12:55:19)

    Thing is, when I run locale -a, I get
    $ locale -a
    C
    de_DE@euro
    de_DE.iso885915@euro
    de_DE.utf8
    en_US
    en_US.iso88591
    en_US.utf8
    ja_JP
    ja_JP.eucjp
    ja_JP.ujis
    ja_JP.utf8
    japanese
    japanese.euc
    POSIX
    So there's an entry for every locale I have uncommented in my locale.conf. Just making sure: by "following the steps in the beginner's guide", do you also mean running locale-gen?
    Are those folders on a Linux filesystem like ext4, or on a Windows one (NTFS)?

  • WebLogic returns funny characters in XML

    Hi,
    I have an application developed on Tomcat that includes a servlet that receives an XML post, processes the XML, then constructs an XML response packet (using JDOM) and flushes the response packet back to the calling process. This worked great under Tomcat; however, now that I have ported it to WebLogic 6.1sp3, WebLogic appears to be adding some funny characters at the beginning and end of the XML response packet. Here's what it looks like:
    00ae
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE scansresponse SYSTEM "scansresponse.dtd">
    <scansresponse>
    <transferresult>failure</transferresult>
    </scansresponse>
    0000
    Any ideas on what might be causing this and how I can correct this problem? I've tried many different things and am stumped.
    Thanks,
    Chris

    Your funny characters are from HTTP "chunking". See the following link for more info: http://ken.coar.org/slides/HTTP/Chunking.html
    Each chunk is prefixed with a hex number that indicates how big the chunk is. This can be a good thing: processing the XML in chunks can be faster than reading it line by line.
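
    To make the markers concrete, here is a minimal Java sketch (illustrative only; any real HTTP client, such as HttpURLConnection, de-chunks for you) of how a raw chunked body is decoded, and why "00ae" and "0000" appear around the XML:

    import java.io.ByteArrayOutputStream;
    import java.io.EOFException;
    import java.io.IOException;
    import java.io.InputStream;

    public class ChunkedReader {
        // Decode an HTTP/1.1 "Transfer-Encoding: chunked" body from a raw stream.
        static byte[] decode(InputStream in) throws IOException {
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            while (true) {
                String sizeLine = readLine(in).trim();
                int size = Integer.parseInt(sizeLine.split(";")[0], 16); // "00ae" = 174 bytes
                if (size == 0) break;                                    // "0000" ends the body
                for (int i = 0; i < size; i++) {
                    int b = in.read();
                    if (b < 0) throw new EOFException("truncated chunk");
                    body.write(b);
                }
                readLine(in); // consume the CRLF that follows every chunk
            }
            return body.toByteArray();
        }

        private static String readLine(InputStream in) throws IOException {
            StringBuilder sb = new StringBuilder();
            for (int b; (b = in.read()) >= 0 && b != '\n'; ) {
                if (b != '\r') sb.append((char) b);
            }
            return sb.toString();
        }
    }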

  • Romaji yen sign in Terminal in the UTF-8 encoding

    Hello all,
    I have a MacBook Pro with a Japanese keyboard running Mac OS X 10.6.2. In Romaji mode, the Japanese keyboard has a dedicated yen sign (¥) key, and Option-¥ produces a backslash (\). In Terminal, for some reason, the ¥ key produces \ without the Option modifier. (Option-¥ also produces \ in Terminal, which is normal behavior.)
    A similar situation was discussed in an older topic, http://discussions.apple.com/thread.jspa?messageID=10665836 , where the problem was diagnosed as having the Shift JIS encoding enabled in Terminal. However, this doesn't reflect my situation, since the only encoding that is enabled in my Terminal is UTF-8, and there's certainly a yen sign available in UTF-8.
    I am able to type other UTF-8 characters in Terminal in Romaji mode; for example, I can type Option-e e to produce é, and entering the command *echo é | od -x* within Terminal shows that the correct UTF-8 byte sequence is generated for é. Since the command *echo -e '\0302\0245'* within Terminal will produce a yen sign there, the problem seems to be connected to the key mapping rather than to a stty interface problem.
    Is there anyone running 10.6.2 with a Japanese keyboard who can type the ¥ key in Romaji mode in Terminal with the UTF-8 encoding enabled, and have a yen sign appear rather than a backslash?
    (This topic was initially posted in the +Installation and Setup+ forum, and I've taken the advice of a kind soul there to repost the topic in this forum.)

    I don't know the exact reason why ¥ is forcefully converted to \ in Terminal (even in the UTF-8 encoding), and anyway it would be better to add an option to turn off this conversion (or there may already be a hidden option which I can't find).
    But the conversion may be helpful for many users, for the following reasons:
    I guess there is no key for backslash on the Japanese keyboard of the MacBook Pro. If this is the case, then being able to input \ by just hitting the ¥ key (instead of typing Option-¥) may be useful for many Terminal users, because \ is used much more frequently than ¥ in programs. Kotoeri has an option to swap the ¥ and Option-¥ keys (so hitting the ¥ key inputs \ and Option-¥ inputs ¥), but this setting is global (i.e., not restricted to Terminal.app), so making this the default would confuse most Japanese users: they don't use Terminal.app at all, but they do use ¥ as the currency symbol in other apps. Even Terminal users would use ¥ more frequently than \ in apps other than Terminal, so they don't want to modify the global setting.
    Another reason may be that there are still many Japanese programming textbooks that use ¥ as the escape character (I guess you know why). For example, the first C program looks like: printf("Hello World!¥n"); So many beginners would try to input ¥ as written in the textbook, without knowing that the escape character should be \, not ¥. Converting ¥ to \ may be helpful for these users (of course they would be surprised to see not ¥ but \ appear on the screen, but anyway the program would work).
    You can send a bug report or feature request at:
    http://www.apple.com/feedback/macosx.html

  • UTF-8 Encoding errors during nightly batch runs

    My boss recently tasked me with researching (and hopefully resolving) why our XML frequently has UTF-8 encoding errors.
    I've been in the IS world for less than a year now so please bear with me when it comes to terms, data flow, etc.
    Overview:
    Our Oracle DB spits out XML for the nightly batch runs into a file location, let's say C:\xPression\CustomerData\Certificate.xml. The XML is in Courier New font, but some characters make their way into the XML that aren't supported. The big one is the elongated dash character. Just one instance of this and the entire XML fails.
    When the batch job runs, sometimes there are encoding errors (¿, ¡, -, etc.), and every morning I have to come in, find the invalid character, fix it, and have the job re-run.
    I want to know if there's a way to make the XML that comes out always be in the Courier New font, or whether there's a way to convert it.

    I want to know if there's a way to make the XML that comes out always be in the Courier New font, or whether there's a way to convert it.
    First things first: an XML file is a text file. It doesn't have a "font"; it has an encoding.
    The font is the graphical representation of characters, and it is related to whatever client tool you're using to view the content, not to the content itself.
    That being said, a lot of fonts do not support the full range of Unicode characters, so you may get replacement characters in some cases.
    We're missing some information to provide an answer:
    - What's the database version?
    - What's the character set of the database?
    - How are you generating and writing the XML to the file? UTL_FILE, dbms_xslprocessor, dbms_xmldom?
    If the file is generated using UTF-8 encoding, then the issue might just be that you're not using a UTF-8-enabled editor.
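
    One quick way to settle whether the file itself is valid UTF-8 is to look at the raw bytes rather than at an editor's rendering. A small Java sketch (the file path is passed as an argument, and the byte count is arbitrary): an en dash, for instance, should show up as the UTF-8 sequence E2 80 93, not as a single stray byte.

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class HexDump {
        public static void main(String[] args) throws IOException {
            try (InputStream in = new FileInputStream(args[0])) {
                byte[] buf = new byte[64];                // dump the first 64 bytes
                int n = in.read(buf);
                for (int i = 0; i < n; i++) {
                    System.out.printf("%02X ", buf[i]);
                }
                System.out.println();
            }
        }
    }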

  • UTF-8 encoding vs ISO 8859-1 encoding

    The iTunes tech specs call for UTF-8 encoding of the XML feed file; a friend of mine uses feed generator software through his blog that uses ISO 8859 encoding. Is there a way to convert the latter to UTF-8 so that iTunes tags may be successfully added?
    When I tried editing his XML file, I got error messages when I submitted the file to RSS feed validator sites (such as http://feedvalidator.org/). Any help or knowledge is appreciated, because I am not the least bit expert in this coding arena.

    You don't need to convert ISO 8859-1 to UTF-8 unless you have non-ASCII characters. Basically, ASCII is a subset of both encodings, and for English it will serve you just fine. You can have iTunes tags in the XML file even if the file itself is encoded in ISO 8859-1.
    The error you see at feedvalidator.org is most likely a warning.
    Hope this helps!
    - Andy Kim
    Potion Factory
    http://www.potionfactory.com
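
    If the feed does contain non-ASCII characters, re-encoding it is a short job in most languages. A minimal Java sketch (the file names are placeholders; the encoding attribute in the XML declaration must be updated to match):

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.OutputStreamWriter;
    import java.io.Reader;
    import java.io.Writer;
    import java.nio.charset.StandardCharsets;

    public class FeedRecode {
        public static void main(String[] args) throws IOException {
            try (Reader in = new InputStreamReader(
                         new FileInputStream("feed.xml"), StandardCharsets.ISO_8859_1);
                 Writer out = new OutputStreamWriter(
                         new FileOutputStream("feed-utf8.xml"), StandardCharsets.UTF_8)) {
                char[] buf = new char[4096];
                for (int n; (n = in.read(buf)) > 0; ) {
                    out.write(buf, 0, n);   // decoded from Latin-1, re-encoded as UTF-8
                }
            }
        }
    }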

  • UTF-8 encoding

    Hi,
    I'm having trouble parsing XML stored in an NCLOB column using UTF-8 encoding.
    Here is what I'm running:
    Windows NT 4.0 Server
    Oracle 8i (8.1.5) EE
    JDeveloper 3.0, JDK 1.1.8
    Oracle XML Parser v2 (2.0.2.5?)
    The following XML sample that I loaded into the database contains two UTF-8 multi-byte characters:
    <?xml version="1.0" encoding="UTF-8"?>
    <G><A>GBotingen, BrC<ck_W</A></G>
    G(0xc2, 0x82)otingen, Br(0xc3, 0xbc)ck_W
    If I'm not mistaken, both multibyte characters are valid UTF-8 encodings and they are defined in ISO-8859-1 as:
    0xC2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
    0xFC LATIN SMALL LETTER U WITH DIAERESIS
    I wrote a Java stored function that uses the default connection object to connect to the database, runs a SELECT query, gets the OracleResultSet, calls the getCLOB method, and calls the getAsciiStream() method on the CLOB object. Then it executes the following piece of code to get the XML into a DOM object:
    DOMParser parser = new DOMParser();
    parser.setPreserveWhitespace(true);
    parser.parse(istr); // istr is the InputStream from getAsciiStream()
    XMLDocument xmldoc = parser.getDocument();
    Before the stored function can do other things, this code seems to throw an exception complaining that the above XML contains "Invalid UTF8 encoding".
    Now, when I remove the first multi-byte character (0xc2, 0x82) from the XML, it parses fine.
    Also, when I do not remove this character but connect via the jdbc:oracle:thin driver (note that now I'm not running inside the RDBMS as a stored function anymore), the XML is parsed with no problem and I can do whatever I want with the XMLDocument. Note that I loaded the sample XML into the database using the thin JDBC driver.
    One more thing, I tried two database configurations with WE8ISO8859P1/WE8ISO8859P1 and WE8ISO8859P1/UTF8 and both showed the same problem.
    I'll appreciate any help with this issue. Thanks...

    I inserted the document once by using the oci8 driver and once by using the thin driver. Then I used the DBMS_LOB package to look at the individual characters and convert those characters using the ASCII function.
    It looks like that when I inserted the document using the OCI8 driver, they got converted into a pair of 191 (0xbf) characters. However, when I used the thin driver they ended up being stored as 195 (0xc3) and 130 (0x82).
    So it looks like the OCI8 driver is corrupting the individual characters, and if the characters are not corrupted, they cause the following exception to be thrown:
    Error: 440, SQL execution error, ORA-29532: Java call terminated by uncaught Java exception: java.io.UTFDataFormatException: Invalid UTF8 encoding. ORA-06512: at "SYSTEM.GETWITHSTYLE", line 0 ORA-06512: at line 1
    Note that my other example of a multi-byte character (C<) also gets corrupted by the OCI8 driver, but does not cause the above exception to be thrown if it's inserted via the thin driver.
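
    (Not raised in the thread, but worth noting for later readers: getAsciiStream() is only safe for pure 7-bit ASCII content, while a character stream lets the JDBC driver do the charset conversion. A rough modern-Java sketch, untested against the Oracle v2 parser, which is assumed here to accept a Reader:)

    import java.io.Reader;
    import java.sql.Clob;
    import oracle.xml.parser.v2.DOMParser;
    import oracle.xml.parser.v2.XMLDocument;

    public class ClobParse {
        static XMLDocument parse(Clob clob) throws Exception {
            DOMParser parser = new DOMParser();
            parser.setPreserveWhitespace(true);
            try (Reader reader = clob.getCharacterStream()) {
                parser.parse(reader);   // characters, not ASCII-squeezed bytes
            }
            return parser.getDocument();
        }
    }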

  • Steps to UTF-8 Encoding with Oracle 8i and Weblogic 6.1SP1

    What are the steps to UTF-8 encoding with Oracle 8i and WebLogic 6.1SP1?
    I have:
    - Oracle 8.1.5 database created with character set=UTF8 and national character set=UTF8
    - WebLogic 6.1SP1 without any encoding mechanism set (though I did play with
    <jsp-param><param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
    </jsp-param>
    in the weblogic.xml for a while, though it seemed not to make a difference)
    - JSP pages set to content='text/html; charset=UTF-8'
    - JSP form POSTs set to enctype="UTF-8"
    I can copy and paste Chinese Kanji from a UTF8-encoded web page into form text boxes, but when I post the data it comes back as different Kanji. Then once it is posted, the Kanji stays the same on repeated posts. The same Kanji text also looks different when viewed in a form text box than when viewed as straight text on the page.
    Is there anything else? Or am I already encoding characters twice?
    Please help!
    Mel Christie

    Hi Experts,
    Please correct me if I am asking the question in the wrong way.
    I have ArcGIS with an Oracle Database 10gR2 production server.
    My task is to connect AutoCAD software (on a client computer connected over the LAN) to ArcGIS in order to access the toposheets available in the SDE user.
    When I try to connect I get this error: "The specified credentials are not valid or the provider is not able to establish a connection."
    I checked the path to the production server by pinging, and the user/passcode too, but it didn't help.
    Please help me with this; it's very urgent.
    Thanks.
    Edited by: user13355644 on Jul 3, 2010 3:53 AM
    Edited by: user13355644 on Jul 22, 2011 2:55 AM
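
    (The only reply above is unrelated to Mel's question, so here is a sketch of the usual first checks rather than the poster's actual fix. Two things commonly bite here: enctype takes a MIME type, so enctype="UTF-8" is ignored — the form attribute for a charset hint is accept-charset — and on the server the request encoding must be set before any parameter is read. In Servlet 2.3 terms, which WebLogic 6.1 supports, with a hypothetical field name:)

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class KanjiServlet extends HttpServlet {
        protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            req.setCharacterEncoding("UTF-8");       // must run before getParameter()
            String text = req.getParameter("text");  // "text" is a hypothetical field
            resp.setContentType("text/html; charset=UTF-8");
            resp.getWriter().println(text);
        }
    }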

  • iWeb pages hosted on a web server do not look right. Funny characters appear

    any web experts here?
    I have designed a bunch of pages in iWeb. Published them and uploaded them to my web host. However when I view the files via the web, they are not correct. The most obvious is that quote marks, bullets and paragraph markers are shown with funny characters.
    Take a look at www.dancingacademy.co.uk to see an example. Excuse the URL, but it's the only way I know of showing an example of what's wrong.
    If I view the files locally on my mac (ie publish from iweb to a folder and open them from the local folder) then the files look fine.
    It's only when they get served from a web server that they look funny.
    Any ideas what my web hosting company is doing wrong?

    The server you are using is (mis)configured to force all browsers to use the wrong encoding. See this note (Server Settings section) for possible fixes:
    http://homepage.mac.com/thgewecke/iwebchars.html

  • Parsing a UTF-8 encoded XML Blob object

    Hi,
    I am having a really strange problem. I am fetching a database BLOB object containing XML and then parsing the XML. The XML has some UTF-8 encoded characters, and when I read the XML from the BLOB, these characters lose their encoding. I have tried several things, but I am unable to retain their UTF-8 encoding. The characters causing the real problem are mainly double quotes, inverted commas, and apostrophes. I am attaching the piece of code below, and you can see certain things I have ended up doing. What else can I try? I am using the JAXP parser, but I don't think changing the parser will help, because I am storing the XML file as I get it from the database, and at that very first stage it gets corrupted, and I have to retain the UTF encoding. I tried to get the encoding info from the XML and it tells me cp1252 encoding; where did that come into the picture, and I couldn't get it back to UTF-8.
    Here the temp.xml itself gets corrupted. I have spent some 3 days on this issue. Help needed!!!
    ResultSet rs = null;
        Statement stmt = null;
        Connection connection = null;
        InputStream inputStream = null;
        long cifElementId = -1;
        //Blob xmlData = null;
        BLOB xmlData=null;
        String xmlText = null;
        RubricBean rubricBean = null;
        ArrayList arrayBean = new ArrayList();
          rs = stmt.executeQuery(strQuery);
         // Iterate till result set has data
          while (rs.next()) {
            rubricBean = new RubricBean();
            cifElementId = rs.getLong("CIF_ELEMENT_ID");
                    // get xml data which is in Blob format
            xmlData = (oracle.sql.BLOB)rs.getBlob("XML");
            // Read Input stream from blob data
             inputStream =(InputStream)xmlData.getBinaryStream(); 
            // Reading the inputstream of data into an array of bytes.
            byte[] bytes = new byte[(int)xmlData.length()];
             inputStream.read(bytes);  
           // Get the String object from byte array
             xmlText = new String(bytes);
           // xmlText=new String(szTemp.getBytes("UTF-8"));
            //xmlText = convertToUTF(xmlText);
            File file = new File("C:\\temp.xml");
            file.createNewFile();
            // Write to temp file
            java.io.BufferedWriter out = new java.io.BufferedWriter(new java.io.FileWriter(file));
            out.write(xmlText);
            out.close();

    What the code you posted is doing:
    // Read Input stream from blob data
    inputStream =(InputStream)xmlData.getBinaryStream();
    Here you have a stream containing binary octets which encode some text in UTF-8.
    // Reading the inputstream of data into an array of bytes.
    byte[] bytes = new byte[(int)xmlData.length()];
    inputStream.read(bytes);
    Here you are reading between zero and xmlData.length() octets into a byte array. read(byte[]) returns the number of bytes read, which may be less than the size of the array, and you don't check it.
    xmlText = new String(bytes);
    Here you are creating a string with the data in the byte array, using the platform's default character encoding. Since you mention cp1252, I'm guessing your platform is Windows.
    // xmlText=new String(szTemp.getBytes("UTF-8"));
    I don't know what szTemp is, but xmlText = new String(bytes, "UTF-8"); would create a string from the UTF-8 encoded characters; you don't need to create a string here anyway, though.
    //xmlText = convertToUTF(xmlText);
    File file = new File("C:\\temp.xml");
    file.createNewFile();
    // Write to temp file
    java.io.BufferedWriter out = new java.io.BufferedWriter(new java.io.FileWriter(file));
    This creates a Writer that writes to the file using the platform's default character encoding, i.e. cp1252.
    out.write(xmlText);
    This writes the string to out using cp1252.
    So you have created a string treating UTF-8 as cp1252, then written that string to a file as cp1252, which is to be read as UTF-8. So it gets mis-decoded twice.
    As the data is already UTF-8 encoded and you want it in the output, just write the binary data to the output file without trying to convert it to a string and back again:
    // not tested, as I don't have your Oracle classes
    final InputStream inputStream = new BufferedInputStream((InputStream)xmlData.getBinaryStream());
    final int length = (int) xmlData.length();
    final int BUFFER_SIZE = 1024;                  // these two can be
    final byte[] buffer = new byte[BUFFER_SIZE];   // allocated outside the method
    final OutputStream out = new BufferedOutputStream(new FileOutputStream(file));
    for (int count = 0; count < length; ) {
        final int bytesRead = inputStream.read(buffer, 0, Math.min(BUFFER_SIZE, length - count));
        if (bytesRead < 0) break;                  // guard against early end of stream
        out.write(buffer, 0, bytesRead);
        count += bytesRead;
    }
    out.close();
    Pete

  • How to save a UTF-8 encoded text file ?

    hi People
    I have a little script which reads the source text from a layer and saves it to a .txt file. This is on a Mac, and all was good until recently, when I tried opening the .txt file on a PC in Notepad and found my ˚ degree symbols all whack.
    Resaving the .txt file in TextEdit as Unicode (UTF-8) encoding solved the problem; it now opens fine in Notepad.
    But ideally I'd like the script to output the .txt as UTF-8 in the first place. It's currently Western (Mac OS Roman). I've tried adding in myfile.encoding = "UTF8" but the resulting file is still Western (and the special characters have wigged out again).
    any help greatly appreciated... /daniel
    var theComp = app.project.activeItem;
    var dataRO = theComp.layer("dataRO").sourceText;
    // prompt user to save file
    var theFile = new File ("~/Desktop/"+ theComp.name + "_output.txt");
    theFile = theFile.saveDlg("Save an ASCII export file.");
    if (theFile != null) {          // check user didn't cancel dialog
        theFile.lineFeed = "windows";
        //theFile.encoding = "UTF8";
        theFile.open("w","TEXT","????");
        theFile.writeln("move details:");
        theFile.writeln(dataRO.value.toString());
        theFile.close();
    }

    Hi,
    Got it, it seems: the UTF-8 standard uses two bytes (and more) to encode accents and special characters.
    I found some info, with some code, at http://ivoronline.com/Coding/Theory/Tutorials/Encoding%20-%20Text%20-%20UTF%208.php
    However, there were some errors, so I fixed them. (For 3- and 4-byte characters I didn't test it, though.)
    So here is the code.
    function convertCharToUTF(character){
        var utfBytes = "";
        var c = character.charCodeAt(0);
        if (c < 0x80) {
            utfBytes = String.fromCharCode(c);
        } else if (c < 0x800) {
            utfBytes = String.fromCharCode(0xC0 | c >> 6);
            utfBytes += String.fromCharCode(0x80 | c & 0x3F);
        } else if (c < 0x10000) {
            utfBytes = String.fromCharCode(0xE0 | c >> 12);
            utfBytes += String.fromCharCode(0x80 | c >> 6 & 0x3F);
            utfBytes += String.fromCharCode(0x80 | c & 0x3F);
        } else if (c < 0x200000) {
            utfBytes = String.fromCharCode(0xF0 | c >> 18);
            utfBytes += String.fromCharCode(0x80 | c >> 12 & 0x3F);
            utfBytes += String.fromCharCode(0x80 | c >> 6 & 0x3F);
            utfBytes += String.fromCharCode(0x80 | c & 0x3F);
        }
        return utfBytes;
    }
    function convertStringToUTF(stringToConvert){
        var utfString = "";
        for (var i = 0; i < stringToConvert.length; i++){
            utfString = utfString + convertCharToUTF(stringToConvert.charAt(i));
        }
        return utfString;
    }
    var theFile = new File("~/Desktop/_output.txt");
    theFile.open("w", "TEXT");
    theFile.encoding = "BINARY";
    theFile.linefeed = "Unix";
    // optionally write a UTF-8 BOM first:
    // theFile.write(String.fromCharCode(0xEF) + String.fromCharCode(0xBB) + String.fromCharCode(0xBF));
    theFile.write(convertStringToUTF("Your stuff éàçËôù"));
    theFile.close();

  • Export SQL View to Flat File with UTF-8 Encoding

    I've setup a package in SSIS to export a SQL view to a flat file and it's working fine.  I now need to make that flat file UTF-8 encoded.  The package executes but still shows the files as ANSI encoded.
    My package consists of a Source (SQL View) -> Derived Column (casts the fields to DT_WSTR) -> Destination Flat File (Set to output UTF-8 file).
    I don't get any errors to help me troubleshoot further.  I'm running SQL Server 2005 SP2.

    Unless there is a Byte-Order-Marker (BOM - hex file prefix: EF BB BF) at the beginning of the file, and unless your data contains non-ASCII characters, I'm unsure there is a technical difference in the files, Paul.
    That is, even if the file is "encoded" UTF-8, if your data is only ASCII values (decimal values 0-127, hex 00-7F), UTF-8 doesn't really serve a purpose over ANSI encoding.  Now if you're looking for UTF-8 with specifically the BOM included, and your data is all standard ASCII, the Flat File Connection Manager can't do that, it seems.
    What the flat file connection manager is doing correctly though, is encoding values that are over decimal 127/hex 7F in UTF-8 when the encoding of the connection manager is set to 65001 (UTF-8).
    Example:
    Input data built with a script component as a source (code at the bottom of this post) and with only one WSTR output column hooked to a flat file destination component:
    a string containing only decimal value 225 (Latin small letter a with acute, "á")
    Encoding set to ANSI 1252 looks like:
    E1 0D 0A (which is the ANSI encoding of the decimal character value 225 (E1) and a CR-LF (0D 0A))
    Encoding set to UTF-8 65001 looks like:
    C3 A1 0D 0A (which is the UTF-8 encoding of the decimal character value 225 (C3 A1) and a CR-LF (0D 0A))
    Note that for values over decimal 127, UTF-8 takes at least two bytes and up to four for the remaining values available.
    So, I'm comfortable now, after sitting down and going through this, that the flat file connection manager is working correctly, unless you need a BOM.
    Imports System
    Imports System.Data
    Imports System.Math
    Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
    Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

    Public Class ScriptMain
        Inherits UserComponent

        Public Overrides Sub CreateNewOutputRows()
            Output0Buffer.AddRow()
            Output0Buffer.col1 = ChrW(225)
        End Sub

    End Class
    Phil
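
    Since the Flat File Connection Manager can't emit the BOM itself, one workaround is a small post-processing step that prepends EF BB BF to the finished file. A hedged Java sketch of the idea (file names are placeholders; in an SSIS package this would more likely live in a Script Task):

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class AddBom {
        public static void main(String[] args) throws IOException {
            // Prepend the UTF-8 BOM (EF BB BF) to a file written without one.
            byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
            byte[] data = Files.readAllBytes(Paths.get("export.txt"));
            try (OutputStream out = new FileOutputStream("export-with-bom.txt")) {
                out.write(bom);
                out.write(data);
            }
        }
    }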

  • [Solved] Automount Generic MP3 Player with UTF-8 encoding

    Hello, everybody!
    Ubuntu refugee here.
    So far I'm going fine with Arch, I just have a couple of problems related to my Generic USB Mp3 player:
    1) I want HAL to mount the player with UTF-8 encoding. Right now, it shows Arabic characters as ??????.
    2) In Ubuntu, it used to recognize the player as an MP3 player, give it a nice icon, and add it as a music source in Rhythmbox. In Arch, on the other hand, the player is mounted as a generic USB flash drive. How can I make Arch recognize it as an MP3 player?
    Thanks in advance.
    Last edited by farghal (2008-05-10 20:17:09)

    GOT IT WORKING!! Yay!
    I got the solution from here:
    http://blog.pcode.nl/2006/08/24/introdu … io-player/
    The trick is to make HAL identify your Digital Audio Player (DAP) by adding a rule to /usr/share/hal/fdi/information/10freedesktop/10-usb-music-players.fdi  --  and since, in my case, Ubuntu already had a 10-usb-music-players.fdi file that recognized my player, all I had to do was boot up from an Ubuntu live cd and copy Ubuntu's 10-usb-music-players.fdi over Arch's.
    Now my issues with Arch are down to only one: http://bbs.archlinux.org/viewtopic.php?pid=360647
    Thanks everybody.
    Last edited by farghal (2008-05-10 20:16:49)

  • UTF-8 and Chinese Characters

    I have a JSP with the following line at the very top:
    <%@ page contentType="text/html; charset=utf-8"%>
    This is so that it will use UTF-8 encoding to display non-English characters. Doing this allows me to display Arabic, Hebrew, and English characters that are encoded in UTF-8 format (i.e. \u0643). However, I still cannot display Chinese characters. For example, I have the String \u4E2D being read from a file and output onto the JSP (no different from my other non-English characters), and it does not display properly (I only see a box in its place). Can anyone tell me why this is?
    I do not have the proper Chinese character set downloaded; however, I don't understand how the Hebrew and Arabic display properly when I never explicitly downloaded any sort of character set for them.
    Thanks in advance.

    ;-D
    I'm only human!
    I certainly agree that UTF-8 should work. Just thought that trying a couple of other encodings might work faster than trying to figure out why UTF-8 wasn't doing the job!
    As for where the character set is stored... both IE6 and the JDK have knowledge of the character set. However, this doesn't automatically mean that they are able to display it. Both require the right font to do this, and neither English Windows IE6 nor the JDK carries as standard a font that is able to display the Chinese character set. By installing the Chinese language pack, the font has now been provided, which is why everything's working happily.
    As for being able to prompt the user to download this, I'm not entirely sure whether that is possible these days. It certainly happened in Windows 9x/NT4, where IE prompted you to download the pack, but this proved to be such an unpopular method that M$ took the prompt out, and now expects you to install it off disc as of Win2K.
    Hope that helps!
    Martin Hughes

  • How to change UTF-8 encoding for XML parser (PL/SQL) ?

    Hello,
    I'm trying to parse an XML file stored in a CLOB.
    p := xmlparser.newParser;
    xmlparser.parseCLOB(p, CLOB_xmlBody);
    The standard PL/SQL parser encoding is UTF-8, but my XML CLOB contains ISO-8859-2 characters.
    Can you advise me, please, how to change encoding for parser?
    Any help would be appreciated.

    Do your documents contain an XML declaration like this at the top?
    <?xml version="1.0" encoding="ISO-8859-2"?>
    If not, then they need to. The XML 1.0 specification says that if an XML declaration is not present, the processor must default to assuming it's in UTF-8 encoding.
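
    (For completeness: when you cannot edit the documents, most parsers also let you state the encoding externally, which takes precedence over any default. A Java sketch of the idea, since the PL/SQL xmlparser API varies by version; InputSource.setEncoding tells the parser how the raw bytes are encoded:)

    import java.io.FileInputStream;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.xml.sax.InputSource;

    public class Iso2Parse {
        public static Document parse(String path) throws Exception {
            InputSource src = new InputSource(new FileInputStream(path));
            src.setEncoding("ISO-8859-2"); // overrides the parser's UTF-8 default
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(src);
        }
    }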
