Exporting accented characters (Unicode) in metadata

Hi,
I wrote a little script to export the metadata to an xml file. I am having some problems where the tags contain accented characters (áéö etc.).
I managed to write the xml file with unicode encoding and the unicode characters contained in the script are transferred correctly.
However the special characters in the metadata tags are being transformed to some unreadable characters.
Is there any way to make sure these characters are transferred correctly?
Thank you
Balint

David,
Thank you for the help.
I tried setting the output file encoding to UTF-8 or UTF-16, with or without setting the BOM. The accented characters still come out unreadable. I tried the examples that came with the SDK (output set to the console) and they too scramble the accented characters.
In other cases, when using the iterator function, the fields that contain accented characters are completely ignored.
Finally, I found the .rawData function in Photoshop, which could export the accented characters. This is a much slower solution, because Photoshop has to open all the files to read the metadata.
I also experimented with my own XML files in Bridge. I can open these, do my transformations, save them and all the characters are well preserved.
Balint

Similar Messages

Ipad will not display cyrillic/accented characters (unicode)

I sent an eMail (gmail) to several people that included English, Spanish and Ukrainian (cyrillic alphabet). [these would be Unicode] When I received the reply and looked at it on my iPad, the Spanish specific characters as well as the Ukrainian cyrillic characters were substituted by what looks like Chinese characters. When checking the eMail on my MacBook Pro and my iMac they all displayed properly. Any Ideas?

I use gMail in Firefox with UTF-8 encoding on my MacBook and iMac.
On my iPad I use the mail app with gMail feeding it. Here is what one of the messages looked like:
Just a snapshot of weather - Charlottetown, Chascomous (Argentina) and Winnipeg on Thursday, 5 December at 10am our time, 11am Chascomous, 8am Winnipeg, 4pm Ukraine.
��郋��郋郱郇�邾郋郕郈郋迣郋迡邽 - 虼訄�郅郋��訄�郇, Chascomous (��迣迮郇�邽郇訄) � ��郇郇�郈迮迣 � �迮�赲迮�, 5 迣��迡郇� 赲 10 �訄郇郕� 郇訄� �訄�, 11 �訄郇郕� Chascomous, 8 �訄郇郕� �� 請諸諱菩�, 4 赲迮�郋�訄苺郕�訄�郇訄.
S籀lo una instant獺nea de tiempo - Charlottetown, Chascomous (Argentina) y Winnipeg el jueves 5 de diciembre a las 10 a.m. nuestro tiempo, 11 a.m. Chascomous 8am Winnipeg, 16:00 Ucrania.

Accented characters in exported images getting munged

Lightroom 2.x, OS X 10.5.6, G5, Nikon D80, Raw/NEF/DNG
I'm trying to sort out a problem I'm having that looks similar to other issues with international characters, but is a little different. My problem is specific to keywords.
I don't think this should matter, but I generally import as DNG and never save metadata to XMP. I mainly export to JPEG (to a folder that I zip up) in order to upload photos (outside of Lr) to a sharing site.
What I'm finding is that if I tag my photos (in Lr) with keywords that have accented characters (which I use a fair amount because I often travel to Francophone areas to take photos, and my IPTC data and keywords use correct French spelling) upon upload to any photosharing site I try, the keywords will be munged, duplicated and/or generally messed up. For example, I have a keyword "café". Upon export I can see this correctly in the IPTC Keyword metadata (via a third party EXIF/IPTC viewer, and also using "Get Info" in the Finder). However, as soon as I upload this JPEG to any photo sharing site the photo will end up with a series of keywords applied to it: "caf, cafe, cafã©"
Similarly, a keyword like "naïve" will end up as two keywords: "naã¯ve, nave" I get similar results with HTTP uploaders as well as uploader plugins inside Lr. And so on for è, ô &etc. In the case of Flickr, there will be one "copy" of the keyword that is correct. The others will still be present.
I've read hints that suggest that the IPTC metadata stored in the exported file by Lr might not be Unicode-clean, which causes anything that parses and normalizes select metadata (as all online sharing sites will want to do) to choke. I've heard rumours that the XMP data is properly Unicode clean, but most sharing sites ignore that data, since they are pretty much interested only in EXIF or IPTC metadata (and rightly so, I suppose.)
Is this a defect in the way Lr creates metadata in exported images? Or is it a problem with the way all these sites suck in and normalize the metadata stored in the image? Or is it a subtle interaction between the two that results in incompatibilities?
I haven't tried experimenting with creating or finding a JPEG with no keywords at all and manually inserting IPTC keywords with accents, just to see what Flickr (et al) do upon upload. I've also tried various permutations of the keyword options in the Lr export dialogue with no significant change in behaviour.
I'm interested in who else might be seeing this problem. Is it a Lr only thing, or is it Lr-on-Mac? Or something specific to the two sites I upload photos to?
I should point out that I've been seeing this behaviour since Lr 1.x.

(Jao, AdobeRGB is an experiment. I have a colour managed browser, so I wanted to see if I could tell the difference. This is a test photo from my junk pile.)
I think I see the problem. With something like ExifTool you can see that the keywords are stored in a number of places in the EXIF/IPTC header: Keywords, Subject and Hierarchical Subject. It is the Keywords section that is not Unicode clean, apparently. This is even more apparent if you show the output encoded in HTML entities:
Keywords : Franais, Test Shots, Wings of Paradise Butterfly Conservatory, butterfly, cafŽ, l'h™tel, na•ve, tests
Subject : Français, Test Shots, Wings of Paradise Butterfly Conservatory, butterfly, café, l'hôtel, naïve, tests
Hierarchical Subject : Français, Test Shots|tests, Wings of Paradise Butterfly Conservatory, butterfly, café, l'hôtel, naïve
So, it depends on what section the keywords are taken from. It appears that both Flickr and 23 will try to find the keywords in as many places as possible. This explains the many duplicates I am seeing.
So, now I just have to figure out why Keywords is so wrong, and how to correct it (or leave it blank).
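One possible reading of the garbled Keywords line (an assumption, not something the thread confirms) is that the keywords were written in MacRoman and then displayed as Windows-1252. In MacRoman, é is byte 0x8E and ï is byte 0x95; Windows-1252 shows those bytes as Ž and •, which matches "cafŽ" and "na•ve" above. A minimal Java sketch with the byte values hard-coded (the class and method names are illustrative only):

```java
import java.nio.charset.Charset;

final class KeywordMisdecode {
    // Bytes a MacRoman writer would emit for "cafe" with e-acute (0x8E)
    // and "naive" with i-diaeresis (0x95). Hard-coded for illustration.
    static final byte[] CAFE = {0x63, 0x61, 0x66, (byte) 0x8E};
    static final byte[] NAIVE = {0x6E, 0x61, (byte) 0x95, 0x76, 0x65};

    // Decode the raw bytes the way a Windows-1252 reader would.
    static String asWindows1252(byte[] raw) {
        return new String(raw, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        System.out.println(asWindows1252(CAFE));
        System.out.println(asWindows1252(NAIVE));
    }
}
```

If this reading is right, the Subject and Hierarchical Subject fields are fine and only the legacy Keywords field needs a different encoding.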

Accented characters in tagged text

I normally write copy in a text editor (TextPad), paste into InDesign, and format in the Story Editor. Now I want to format in TextPad and import tagged text. But I am finding that accented characters from the numeric keypad as well as other more frequently used characters - €, £ - are being misinterpreted. For instance R$8·25 million (about €3 million) comes in as R$8Â·25 million (about â‚¬3 million) and Fundação de Amparo à Pesquisa do Estado de São Paulo is rendered as FundaÃ§Ã£o de Amparo Ã Pesquisa do Estado de SÃ£o Paulo.
I have the options to format the text file as ANSI or DOS, and I have used the headers <ASCII-WIN> and <ANSI-WIN>, but there seems to be no combination that brings the text in cleanly.
Substituting Unicode values for these characters gives me what I want and I can build a library of them to add in TextPad, but this is counter to my aim of more productivity.
Can anyone give me the formatting options that would let me use the numeric keypad to generate the extended character set in a form that will import as tagged text?
k
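The garbling quoted in this post (€ becoming â‚¬, ã becoming Ã£) is the usual result of UTF-8 bytes being read as a single-byte "ANSI" code page. The sketch below reproduces it in Java, assuming the code page is Windows-1252; the class and method names are illustrative and have nothing to do with InDesign or TextPad.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Utf8AsAnsi {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Encode as UTF-8, then wrongly decode the bytes as Windows-1252.
    static String garble(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    // Undo the mistake: recover the bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "S\u00E3o Paulo (about \u20AC3 million)";
        String garbled = garble(original);
        System.out.println(garbled);
        System.out.println(repair(garbled));
    }
}
```

Seeing this pattern suggests the file on disk is UTF-8 while the tag header declares an ANSI encoding, so one of the two has to change.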

Yes, sorry, it had to be somewhere from InDesign to InDesign. What I actually meant was that your tagged text started life in InDesign. My tagged text is starting life in TextPad.
TextPad can save text in DOS (I guess that means ASCII), ANSI and UTF-8. It's possible that the text in the file I was importing wasn't actually in ANSI. I've restructured it now and it is importing accented characters correctly except, bizarrely, for the Euro symbol (yes, there is a € in the font - Myriad Pro). If I use Alt0128 to create the symbol - or the keyboard AltGr4 - import stops at the last complete line before the € would be encountered, and nothing more is imported. If I use <0x20AC> the symbol imports properly and the whole file is placed.
I also worked with two identical (except for the substitution of some Unicode characters) versions of the same file. Both were ANSI, and both were headed with <ANSI-WIN> and no other definition information. One file picked up the definitions from the InDesign document and rendered correctly, the other ignored all the paragraph styles and simply imported text at the default paragraph style.
So please accept some points for your collection Ken for leading me to re-check the actual code set in the document.
If you have any ideas about the € problem or why one version of the file would not pick up style definitions I would be intrigued to hear.
k
As an afterthought, I've attached a chunk of each file to show one that picks up definitions and one that doesn't. My initial thought was that the one that does pick up definitions would bleat about them being missing, while the other wouldn't. But if I place them in a blank file, both complain that the definitions are missing. However, If I make an appropriately named definition to match one in the file, the version that picks up definitions will match it and only bleat about the other missing definitions. The file that isn't picking up definitions will fail to honour the paragraph style, and will complain that the remainder are missing.

Accented characters showing up as ? in JRE1.3 but ok in 1.2

I'm implementing a database web interface product that utilizes JSPs (on SunOS 5.7).
The problem is in the search form. When using accented characters (French language), the JSP calls on URLEncode, but all accented characters show up as '?'.
However, when editing a record, using accented characters is not a problem (i.e., the accented characters are properly stored in the fields).
Back on the server, I ran a small program to output accented characters and also to call java.net.URLEncoder to convert the characters.
The default JDK is J2SE (1.3.1). Compiling and running the program results in question marks.
Using JDK 1.2, the accented characters show up fine.
It would appear that URLEncoder is not at fault, but instead, JDK 1.3.1 doesn't seem to handle the accented characters.
I figure there must be a setting somewhere, but I'm not sure where.
Here's the program I used (written in Win98, using standard Win-based character set and Unicode format \u00xx; in Unix, "more" displays the Win accented characters fine but "vi" displays them as \xxx; compiles and displays perfectly when using JDK 1.2):
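import java.net.*;

class mine {
    public static void main(String args[]) {
        // Literal accented characters (depends on the source file encoding)
        System.out.println("éçîôûàèâäüï");
        System.out.println(URLEncoder.encode("éçîôûàèâäüï"));
        // The same kind of characters written as Unicode escapes
        System.out.println("\u00e0\u00e2\u00e4");
        // The same characters written as octal escapes
        System.out.println("\351");
        System.out.println("\351\347\356\364\373\340\350\342\344\374\357");
    }
}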
The output with JDK 1.2 is:
��
%E9%E7%EE%F4%FB%E0%E8%E2%E4%FC%EF
��
The output with JDK 1.3.1 is:
%3F%3F%3F%3F%3F%3F%3F%3F%3F%3F%3F

Between jdk1.2 and jdk 1.3 the default encoding of the vm changed.
You can get it by executing:
System.out.println("Default Encoding:" + System.getProperty("file.encoding"));
or
System.out.println("Default Encoding:" + (new java.io.InputStreamReader(System.in)).getEncoding());
The default encoding is used during the conversion of bytes to strings and vice versa.
Assume your default encoding is ISO8859_1. Then calling new String(byte[]) is equivalent to calling
new String(byte[], "ISO8859_1")
Now, if you are converting a character from one encoding scheme to another and there is no mapping
for this character in the target scheme, the character will be replaced by a default character,
which is (quite often) the question mark.
You can set the default encoding for a vm by passing it as a command line parameter
java -Dfile.encoding=ISO8859_1
java -Dfile.encoding=Cp1252
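Rather than relying on the VM default, the charset can also be named at each conversion. The sketch below is an illustration using only standard library calls (the class and method names are made up for this example): it URL-encodes the same text with two explicit charsets, and shows how a charset with no mapping for the accented characters produces the question marks seen in the JDK 1.3.1 output.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetDemo {
    // URL-encode with a named charset instead of the VM default.
    static String urlEncode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Round-trip through US-ASCII: unmappable characters become '?'.
    static String throughAscii(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String accented = "\u00E9\u00E7"; // e-acute, c-cedilla
        System.out.println(urlEncode(accented, "ISO-8859-1"));
        System.out.println(urlEncode(accented, "UTF-8"));
        System.out.println(urlEncode(throughAscii(accented), "ISO-8859-1"));
    }
}
```

The two-argument URLEncoder.encode was added in J2SE 1.4, so it is not available on the JDK versions discussed in this thread; there the -Dfile.encoding setting remains the workaround.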

Accented characters do not display correctly if there is a variable beside it

Hello,
We are experiencing a problem when we have text with accented characters and a variable beside it within the same text box.
The problem is that the accented characters in the text do not display correctly in the preview or in the course published to Flash9.
These characters display correctly in Captivate in edit mode.
This is the process we have followed:
Export to XML
Translated
Import XML
Publish to Flash9
Captivate version: 4
OS: Windows Vista SP2
We have tried to work with locale in Spanish with no luck, the only solution we have found is to put the text and the variable in different text boxes
I have pasted an image of the preview and also of the text in Captivate edit mode.
Any help will be very welcomed !
Tess

Hi there
I agree with Lilybiri that you should definitely file a Bug Report.
However, a thought occurs here. Have you tried placing the accented text in a User Defined variable?
Assuming this is a workaround, my thought is that you could then just have a caption with two variables. The User Defined variable containing the accented text followed immediately by the system variable providing the Project Name.
Cheers... Rick

Accented characters, XML, Flash

I have a flash application which is pulling information to
populate dynamic fields from two XML files. We have three
languages supported, and have been having problems with the
non-english accented characters displaying properly
when they are called from the XML. I have checked that the
XML files are encoded in UTF-8, and we have also tried writing
the html code, the unicode code, putting the information in a
C DATA shell. I'm out of options that I can think of, and I'd
appreciate if any other folks have some input on this issue.
I did find this other thread which seems to be about the same
issue, but there was no resolution given on it.
http://www.adobe.com/cfusion/webforums/forum/messageview.cfm?forumid=15&catid=194&threadid=1212142&highlight_key=y&keyword1=accent%20characters

So the same questions arise. What happens in the testing
environment if you trace it or if you go Debug and list variables?
Also you say they aren't displaying properly. What does that
mean? Exactly how are they improper?
PS: I'm absolutely certain that the other thread was not
correctly saving as UTF-8.

Mail with French / Spanish accented characters appear as Question marks

Hi
I am facing issues with mails that have French / Spanish accented characters in the mail subject. Accented characters appear as question marks (?) when received in the inbox. The mail subject is read from a properties file.
Please let me know the following
- Should I have the entries in properties file in Unicode ? For example French accent character � is represented as á
- Should I replace msg.setSubject(subject); to msg.setSubject(subject,"UTF-8");
Please suggest.
Below is code snippet :
     Session session = getSession();
               // create a message
               Message msg = new MimeMessage(session);
               // set the from and to address
               InternetAddress addressFrom = new InternetAddress(from);
               InternetAddress[] addressTo = new InternetAddress[recipients.length];
               for (int i = 0; i < recipients.length; i++)
                    addressTo[i] = new InternetAddress(recipients[i]);
               msg.setFrom(addressFrom);
               msg.setRecipients(Message.RecipientType.TO, addressTo);
               msg.setSentDate(new Date());
*               msg.setSubject(subject);*
               MimeMultipart mp = new MimeMultipart("related");

> String subject = "\u00C 9tat de l'inscription en ligne";
You need 4 hex digits after \u. You only included 3.
> mailSubject = new String(subject.getBytes(), "UTF-8");
> message.setSubject(MimeUtility.encodeText(mailSubject,"UTF-8", "B"), "UTF-8");
Remove the above two lines. You're trying to make this much too hard. All you need is:
message.setSubject(subject, "utf-8");
> Should I use ISO8859_1 as Charset or UTF-8 ?
I think the characters you've included are representable in iso-8859-1 so you can use that.
> I am using Outlook express as my mail client. Do we have to decode it ?
No.
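On the properties-file question: java.util.Properties decodes \uXXXX escapes when it loads a file, so a subject stored with a correctly formed four-digit escape comes back as a normal Java string. The sketch below shows this with an in-memory file; the key name mail.subject and the class name are assumptions made for the example, and the JavaMail call from the reply above is not exercised here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

final class SubjectFromProperties {
    // Load properties text and return the value stored under the given key.
    static String load(String fileText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileText));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // The file text holds a literal backslash followed by u00C9,
        // exactly as it would be typed into a .properties file.
        String fileText = "mail.subject=\\u00C9tat de l'inscription en ligne\n";
        String subject = load(fileText, "mail.subject");
        System.out.println(subject.length());
        System.out.println((int) subject.charAt(0)); // code point of the first character
    }
}
```

The loaded string can then be passed unchanged to setSubject with an explicit charset, as the reply suggests.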

Accented characters in LR for mac.

Hi all:
I'm having problems with pictures with "accented characters" in the name. I can work with these pictures, no question mark is shown, and LR can show where the actual file is located. The problem is as follows: if I try to sync a folder, then LR says there are x missing files and the same x new files. If I tell LR to proceed, then the "new" pictures lose keywords, tags, edits, etc.
I have the same pictures and catalog both in Windows and on the Mac (both version 4.4), as I'm trying to switch, and in Windows there is no problem. So it is a problem particular to the Mac version. I tried to check the problem: if I delete these accented characters, then no new or missing files are detected in sync, but this is not a solution (I have thousands of pictures and I want to keep the accented characters).
Is there any other solution?
This same problem has been mentioned several times, but nobody has found a good solution. For example in
http://forums.adobe.com/thread/608096 and in http://lightroomkillertips.com/?p=2778
Thanks in advance.

> the problem is that I'm trying to put a "hat" over a consonant
I suspect you'll find that, the more precise and specific your question, the greater the likelihood of getting a speedy and helpful answer.
> Looks like the software only allows the circumflex over vowels.
> Option-i does work on my computer for vowels. (Built-in
> keyboard).
Nonsense. Nothing to do with your keyboard. Btw, you still haven't told us what character you're trying to insert.
> I guess I will have to use the Equation Editor.
That's an option, but not the only one. Just for the sake of argument, let's assume it's p-hat (not the party hat, but the symbol for sample proportion).
(1) Specifically in Word, you can use fields (overstrike). Go to Insert > Field… > Equations and Formulas > Eq; then type
EQ \O (p,^)
and confirm. (Note that "EQ" will be inserted by Word automatically, you don't need to type it again.)
Once you get the hang of it, you won't need to insert it using menu & submenu commands, but by typing and applying styles.
(2) The right way is Unicode. Currently, Unicode defines >100k codepoints, so, for obvious reasons, it's neither necessary nor practical for most keyboard layouts to be able to access anything but a small subset. You can enter Unicode characters using Character Viewer (aka Special Characters), or, more efficiently, using the Unicode Hex Input keyboard layout.
If the character is already defined as such in Unicode, eg, for all (upside down A), then hold down Option and type its Unicode hex code, ie 2200 -- ∀. However, p-hat isn't defined as such, so it must be entered as a combining character sequence. The sequence is ((Latin small letter p) + (combining circumflex accent)), or (p + 0302) -- p̂.
Unfortunately, all is not sweetness and light. Although this is how it should be done, the display may disappoint. Questions 12b and 12c in the respective Unicode FAQ explain what happens and why.
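The difference between a precomposed character and a combining sequence can be checked programmatically. The sketch below uses java.text.Normalizer (the class name is illustrative): e followed by the combining circumflex U+0302 composes into the single code point U+00EA under NFC, while p followed by U+0302 stays as two code points because Unicode defines no precomposed p-hat.

```java
import java.text.Normalizer;

final class CombiningHat {
    // Compose base letters and combining marks where Unicode allows it.
    static String nfc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String eHat = nfc("e\u0302"); // composes into one code point
        String pHat = nfc("p\u0302"); // has no precomposed form
        System.out.println(eHat.length());
        System.out.println(pHat.length());
    }
}
```

Because p-hat always remains a two-code-point sequence, how well it is drawn depends on the font and the rendering engine, which is the disappointment described above.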

Help with French Accent Characters Corrupted

Hi, All.
I am developing a Flex Front end connect with Java back-end.
The back-end sends data retrieved from XML file to the Flex
front-end; displays it in an TextArea, and allow user to change.
After user changes the data, hit "Save" button, then Flex sends the
data to the back-end.
I checked the back-end and made sure the data was correct
when it was sent to Flex; the French accented characters get
corrupted when they are sent back from Flex. On the Flex side,
however, the corruption is not visible (i.e. the French accented
characters display correctly in Flex, but the wrong characters are
sent back). I'm guessing it might be related to character sets.
However, I cannot find anywhere to set the character set in
HTTPService. Does anybody have an idea?

Use the Arial Unicode MS font.

Greek language notes display loses accented characters

I can type in Greek in TextEdit, save the file as Unicode (UTF-16), and drag it into my iPod Notes folder.
In OS10.4 I can use accented characters in TextEdit but, when the file is loaded onto the iPod, it omits the accented characters. They aren't displayed incorrectly - they simply aren't displayed at all.
Anyone ever tried uploading Greek to an iPod? Any Greek users out there?
Martin

Yes, the accented characters appear OK on my wife's video iPod.
I suppose there's no chance that Apple will update the software on my not-very-old color iPod?
No, I thought not. Money-grabbing *****s.
I love Apple but not ALL the time.

WE + CE cause problem with accented characters

While creating a CFF font which contains the WE as well as the CE character set, it causes problems with most of Adobe programs (Photoshop 7, Illustrator 10, Photoshop Elements 2.0) under Mac OS X. Problem: the accented characters get replaced by Helvetica (Photoshop) or Myriad (Illustrator). Only InDesign treats it right.
I tested under Mac OS X 10.2 as well as 10.3; no difference. I tested on 'old machines', but fresh machines as well (machines which didn't have the fonts installed before); no difference. All the same problem.
Deleting the CE characters solves the problem, but yeah, I want to create a CFF font which contains both character sets.
Generating the font in FontLab or FDK doesn't make any difference.
So, what's the solution? Sounds like a mystery.
Bas

Two comments:
OpenType/CFF fonts should work to some degree on versions of Mac OS from 8.x on. "Some degree" means that pre-Mac OSX, you need ATM for them to work, and you only get the Roman char set. On Mac OSX, Unicode savvy apps will give you access to the entire charset, but at the moment only Adobe apps support any OpenType features.
About your problem with CE charset: this is a known problem, and is still a mystery. Adam Twardoch has reported the same problem and has supplied some test fonts. I have spent over a day looking over the fonts, and looking into old code, and can't find any problem in the font data. Investigating further is likely to be a several day effort, and will happen, but not soon.

Tomcat unable to read accented characters from MySQL

Folks,
Can anyone help with me this problem?
It seems that my version of Tomcat is unable to read accented characters from my MySQL Database.
I've checked in the Database and the characters are all correctly represented there. But when, in my servlet code, I do:
String author = results.getString("author_surname");
If the String contains any accented character then the character shows as a '?'. (Even before it gets to the JSP - I'm writing the results straight to catalina.out).
Looking around these forums I found that some people suggested adding
?useUnicode=TRUE&characterEncoding=UTF-8;
to the end of my jdbc url. As in:
<ResourceParams name="jdbc/connection">
//a whole load of other params
<parameter>
<name>url</name>
<value>jdbc:mysql://localhost:3306/bookshop?useUnicode=TRUE&characterEncoding=UTF-8</value>
</parameter>
</ResourceParams>
inside my server.xml
But it doesn't seem to make any difference. In addition, I doubt I even need to use Unicode as the accents I need are only: �� etc.
(Incidentally, writing that line into my server.xml, tomcat complains that it should finish with a semi-colon. Is that correct? Even if I put in the semi-colon, it still complains!!)
Any suggestions on this would be much appreciated. Thank you.
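The semi-colon complaint is most likely an XML well-formedness error rather than a JDBC one: inside an XML file a bare & starts an entity reference, so the parser expects "&characterEncoding;" and then rejects that as an undefined entity as well. Writing the separator as &amp; avoids both errors. The sketch below checks this with the JDK's own XML parser; the class and method names are illustrative, and it is an assumption that Tomcat's parser reports the error in the same way.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class AmpersandInXml {
    // Return the text of the root element, or null if the XML is not well-formed.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String base = "jdbc:mysql://localhost:3306/bookshop?useUnicode=TRUE";
        // Bare ampersand: rejected by the parser.
        System.out.println(rootText("<value>" + base + "&characterEncoding=UTF-8</value>"));
        // Escaped ampersand: parsed, and the value contains a plain '&'.
        System.out.println(rootText("<value>" + base + "&amp;characterEncoding=UTF-8</value>"));
    }
}
```

The parser may also print its own error message to standard error for the first case; the value handed to the JDBC driver in the second case contains an ordinary & again.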

user13109986 wrote:
> HI,
> From http://download.oracle.com/docs/cd/B10501_01/server.920/a96529/ch9.htm
> My understanding is the JDBC Api converts the string from the database to UTF-16.. If so is there any way to disable the UTF-16 encoding at JDBC API?
That's exactly what it's supposed to do. There isn't even any concept of what it would mean to disable that: Java characters are UTF-16 representations of Unicode code-points, so there isn't anything else it could do.
I still suspect the JDBC part is working correctly and your writing-to-file isn't. I found this quote in the Wikipedia article on Windows-1256:
> Windows-1256 is a code page used to write Arabic (and possibly some other languages that use Arabic script, like Persian) under Microsoft Windows. This code page is not compatible with ISO 8859-6 and MacArabic encodings.
So was there a particular reason you chose Cp1256 and not ISO-8859-6 as the charset to write to the file with?

Accented characters and UTF-8

Hi all,
I have a problem with accented characters. I read that Plumtree 5.0 is completely Unicode enabled and all HTTP responses from remote web services are converted to Unicode (UTF-16). So the portal sends back to the client browser all pages in UTF-8.
We have a lot of portlets (ASP and JSP) that write data to an external DB, for example SQL Server. When I fill in an HTML form with accented characters that have to be saved in our DB, they are saved in UTF-8 because the gateway converts the HTTP response. We want the data to be saved as if we were not using the portal (without conversion). I tried to change the charset in the ASP code (Response.Charset). This solves only the problem of displaying the right characters in the browser.
Could you explain in more detail how the portal does the conversion and how I can solve my problem?
Thank you very much,
Alberto Marchiaro

It might be helpful to clarify a few things first:
1. Both Java and VBScript store strings in UTF-16/Unicode. If you have some code in your ASP file that looks like this:
Dim strData
strData = Request.Form("SomeName")
then if you were to examine memory for the variable strData, you would see 16-bit characters. The same is true for Java.
2. String data is almost never sent over HTTP as UTF-16/Unicode.
3. Both Java and VBScript perform an implicit character set transcoding when reading string data out of a request or when writing string data out to a response.
4. ASP will perform the transcoding according to the value of Session.CodePage. If you have Session.CodePage set to 65001, then ASP will expect the string data to be in UTF-8 and it will transcode UTF-8 in the request into UTF-16 in VBScript. Similarly, a Session.CodePage value of 65001 will cause "Response.Write" to convert UTF-16/Unicode into UTF-8.
5. All of the above is separate from how Java or VBScript interact with the database. Generally speaking, an ASP module will use ODBC to communicate with the database. The ODBC layer knows that VBScript keeps strings in UTF-16/Unicode and will perform the proper conversion into the database character set. Plumtree always recommends using UTF-16/Unicode in the database. You can do this relatively easily by declaring your database columns using the "N" datatypes such as NCHAR and NVARCHAR. However, even if you are using some other character set, the ODBC layer should always properly transcode from VBScript.
The important thing to remember is that data sent over HTTP is never written directly to the database without going through some ASP or JSP code. Since the ASP and JSP code always uses UTF-16/Unicode, there should never be any issue with how the data is sent over HTTP.
Here is an explanation of how Session.CodePage, Response.CharSet and Session.LCID work in ASP:
1. Response.CharSet
2. Session.CodePage
3. Session.LCID
Here is an explanation of these properties and why they are important to non-English ASP gadget writers:
1. Response.CharSet
This property will cause the HTTP Content-Type header to be set with the specified character set. The HTTP header is the best way to tell the recipient what the character set is. The Plumtree HTTPGadgetProvider will read the Content-Type header and then know how to properly transcode the portlet text into UTF-16/Unicode. Here is an example of how to set this property:
Response.CharSet = "UTF-8"
2. Session.CodePage
This property tells the ASP engine which character set to send text in. Please remember that all text is encoded in Unicode on the Web Server. It only gets turned into the client character set when it is sent down to the client. Session.CodePage tells the engine which codepage to transcode into when sending down to the client. Please note that this property is an "integer" property, not a string, so you have to know the number of the codepage that you would like to transcode into. Here is an example of how to use this property:
Session.CodePage = 65001
3. Session.LCID
This property tells the ASP engine which locale is being used. The locale is used by various VBScript functions such as FormatDateTime in order to format the date correctly for the locale. If the locale is a French locale, then the date will be formatted according to French rules. The locale does not really affect the character set, but if the portlet writer is going to the trouble of setting the other properties, then they should set the LCID too. Here is an example of how to set this property:
Session.LCID = 1041
Please note that the examples that I am using are the appropriate examples for Japanese and UTF-8. The values for these properties are different for different character sets. For example, for ISO-2022-JP, the values would be:
Response.CharSet = "iso-2022-jp"
Session.CodePage = 50220
Session.LCID = 1041
A very helpful URL to figure out the values to use with Response.CharSet and Session.CodePage is the following:
http://msdn.microsoft.com/library/default.asp?url=/workshop/Author/dhtml/reference/charsets/charset4.asp
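The transcoding described above (decode the incoming bytes into the runtime's UTF-16 strings, then encode into the target character set on the way out) looks like this on the Java side. This is a sketch of the general idea only; the class and method names are illustrative and it does not reproduce what the portal gateway or ASP engine actually does internally.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TranscodeDemo {
    // Decode bytes from one charset into a Java (UTF-16) string,
    // then encode that string into another charset.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9}; // e-acute in UTF-8
        byte[] latin1 = transcode(utf8, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        byte[] utf16 = transcode(utf8, StandardCharsets.UTF_8, StandardCharsets.UTF_16BE);
        System.out.println(latin1.length);
        System.out.println(utf16.length);
    }
}
```

The same character occupies two bytes in UTF-8, one in ISO-8859-1 and two in UTF-16, which is why the declared charset at each boundary has to match the bytes actually sent.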

URL Access Scripting & accented characters

I'm trying to write a small iTunes script that checks whether track titles of an album match those in the MusicBrainz database. But I'm running into a small problem when I try to retrieve the album information when its title has accented characters.
Here's a snippet of code:
property baseURL : "http://musicbrainz.org/ws/1/release/"
property tempFile : "Data:temp:tempAlbumInfo.xml"
property albumTitle : "með suð í eyrum við spilum endalaust"
set queryURL to "?type=xml&releasetypes=Official&limit=10&title=" & albumTitle
set the clipboard to (baseURL & queryURL)
-- Fetch the releases info from musicbrainz
tell application "URL Access Scripting"
download (baseURL & queryURL) to tempFile replacing yes
end tell
This gives me a 503 error from URL Access Scripting, but when I paste the variable just copied to the clipboard into Firefox, the page loads perfectly. Pasting it into Safari, it doesn't work.
I'm guessing it has something to do with either the way URL Access Scripting and Safari rawurlencode their URLs, or some Unicode / Latin-1 problems.
Oh, and for album titles without accented characters it all works perfectly. Any idea why this is happening? Thanks!

Hello
I think you need to escape URI string properly.
Try replacing the line:
set queryURL to "?type=xml&releasetypes=Official&limit=10&title=" & albumTitle
with these two lines:
set escapedTitle to do shell script "echo " & quoted form of albumTitle & " | perl -Mutf8 -MURI::Escape -lne 'print uri_escape($_);'"
set queryURL to "?type=xml&releasetypes=Official&limit=10&title=" & escapedTitle
Hope this may help,
H
Message was edited by: Hiroto (fixed typo: URI::Escape is the correct module.)
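For comparison, the same percent-escaping step can be sketched in Java with the standard URLEncoder. This is an illustration of what the escaped title looks like, not a replacement for the AppleScript fix; the class and method names are made up for the example. URLEncoder produces form encoding, where a space becomes "+", so the sketch rewrites "+" as "%20" to get plain percent-escaping.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class EscapeTitle {
    // Percent-escape a query value as UTF-8, using %20 for spaces.
    static String escape(String value) {
        try {
            // A literal '+' in the input is already escaped as %2B here,
            // so every remaining '+' stands for a space.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String title = "me\u00F0 su\u00F0 \u00ED eyrum vi\u00F0 spilum endalaust";
        System.out.println("?type=xml&releasetypes=Official&limit=10&title=" + escape(title));
    }
}
```

Each accented letter turns into two percent-escaped bytes because the title is encoded as UTF-8 before escaping.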
