Accented character applescript to unix

My AppleScript writes a list of file names to a text file for later handling by a unix script. If a name contains an accented character (say sacute, U+015B), that character is represented in the text file in a decomposed form (that is, 3 bytes: one for ASCII s and two, 0xc2b4, for the acute accent in UTF-8 encoding). This differs from the standard UTF-8 representation of sacute, which is only two bytes, 0xc59b.
Shell commands do not accept the decomposed representation at all, so I'm stuck. I can think of two lines of attack (assuming I have understood the problem correctly):
a) force AppleScript to output file names in real UTF-8 not decomposed. How?
b) transcode decomposed form to true UTF-8. Can textutil handle that? If yes what is IANA_name | NSStringEncoding for that decomposed encoding? I cannot find it. If not what will?
  Mac OS X (10.4.2)  

Hi Big Burro,
   I wouldn't mind knowing the answer to this myself. Let me provide some background I learned recently. Jun T. recently observed that the decomposed form is the required format for HFS Plus volumes, as specified in this technote that he found, Technical Note TN1150: HFS Plus Volume Format. His link, which I reproduced here, points to the "Unicode Subtleties" anchor where the decomposed form is discussed. As a consequence, I'm not sure that you're going to be able to make your utilities report in a different format, because they're just accurately reporting what they see.
   Also, I don't know what to tell you about conversion. It's common knowledge that character encodings are ambiguous but I didn't think that this "decomposed" form was what they meant by that. I'll have to defer to someone more knowledgeable about conversions. At least when they join in you'll have read the technote.
Gary
~~~~
   You can't have everything. Where would you put it?
         -- Steven Wright
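
For option (b), one possible approach is Unicode normalization rather than a separate encoding: the decomposed and precomposed spellings are both valid UTF-8, and NFC normalization turns the first into the second. The following is only a sketch, assuming the file really contains the base letter followed by U+0301 COMBINING ACUTE ACCENT (the canonical decomposition of U+015B); the helper name `to_composed` is made up for illustration. Note that the bytes 0xc2b4 quoted in the question encode U+00B4, the spacing acute accent, which NFC does not merge with the preceding letter.

```python
import unicodedata


def to_composed(name: str) -> str:
    """Return the NFC (precomposed) form of a file name."""
    return unicodedata.normalize("NFC", name)


# Decomposed form: "s" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "s\u0301"
composed = to_composed(decomposed)

print(decomposed.encode("utf-8").hex())  # 73cc81 (three bytes)
print(composed.encode("utf-8").hex())    # c59b   (two bytes)
```

Applied line by line to the text file before the unix script reads it, this would leave plain ASCII names untouched and only rewrite the decomposed sequences.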

Similar Messages

  • How to use an accented character in a mnemonic ?

    Hi all,
    I'd like to set the mnemonic of a component (no matter what kind of component) to an accented character such as � / � / � / � / � / �.
    but with setMnemonic(char c) it doesn't work.
    with setMnemonic(int code), I can't find a virtual key for any of those character in class KeyEvent (KeyEvent.VK_� ??).
    Has anyone an idea ?
    Thanks.
    Have a nice week-end.
    Herbien
    PS : I know it seems strange to set a mnemonic to a accented character, but I must implement an existing C++ application in Java, and the application must mostly be the same. The users of this application are used to work with the keyboard..... :-(

    girija_pathak wrote:
    How can I know which is the last character? and How to handle it?
    You use the DUMP() function in SQL in order to see the actual content of the column.
    E.g.
    select DUMP(bac_person_id) from bkmap_personid_stg where bac_person_id like '%27136317%'
    The decimal character values will be displayed, enabling you to see where and what control characters exist in the string value for that column.
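
Outside the database, the same kind of inspection can be sketched in a few lines. This is only a rough analogue of DUMP(), not Oracle's implementation; the function name `dump` and the sample value with a trailing carriage return are assumptions made for illustration.

```python
def dump(value: str) -> list[int]:
    """Return the decimal value of each UTF-8 byte in the string."""
    return list(value.encode("utf-8"))


# Hypothetical column value with a stray carriage return at the end.
sample = "27136317\r"
codes = dump(sample)
print(codes)

# Positions and values of ASCII control characters (below 32, or 127).
control = [(i, c) for i, c in enumerate(codes) if c < 32 or c == 127]
print(control)
```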

  • Can't delete file or directory with accent character

    I have a directory within my iTunes folder structure that has an accent character in it which I am trying to delete.
    If I do an ls on the directory I get this:
    ls -al
    ls: Radiů Disney_ Jams Vol. 2: No such file or directory
    total 0
    drwxr-xr-x 3 scott staff 102 Jul 29 13:16 .
    drwxrwxr-x@ 21 scott staff 782 Jul 29 13:24 ..
    I've tried rm -rf * but after that I attempt an ls -l I still get: ls: Radiů Disney_ Jams Vol. 2: No such file or directory
    Any way to delete the directory with offending filename? If I attempt to delete the parent directory I get a "Directory not empty message"

    macwiz,
    I tried the SMB path as well but the files were acting as though they were written with indelible electrons. I remembered that the files were originally created and deposited into the folder from a windows machine. I logged into my folder from the same machine and was able to delete them without any problem.
    I appreciate your help and have saved your suggestions (I'm sure they'll come in handy).
    cheers

  • How to define shortcut for accented character?

    Does anybody know of an easy way to define a keyboard shortcut to produce an accented character?
    More specifically, I would like to define Alt-O to produce ō .
    I have heard of Ukelele, but that seems a bit complicated to me.
    Best,
    Gabriel.

    There are several approaches to using diacritical marks in general, your case included:
    - to write currently in a certain language, which uses that char or those chars; e.g. ā (a macron) is frequent in Lithuanian or, as in your case, to transcribe Japanese words (or Latin vowel length, for example)
    - to write occasionally, e.g. in a linguistic paper written in English, and you have to quote, from time to time, something like in silvā multae bestiae sunt (Latin) ‘there are many beasts in the forest’.
    A macron is available if you activate the Lithuanian keylayout, but you are probably not familiar with it. The alternative is to use a keylayout for linguists or dialectologists like my US Academic, which is meant exactly for such use. You may find this uncomfortable, though, if you need it frequently, and using dead keys may seem cumbersome (in fact it isn't; it is a matter of practice).
    Another solution is indeed UKELELE, you may create your custom keylayout, perhaps of Programmers type: option/alt key + a = ā [a macron] etc.
    There is a lot of theory on this issue, see my web pages (in English)
    http://www.unibuc.ro/e/prof/paliga_v_s/soft-reso/

  • When my Firefox language settings are fr_fr or fr_ca Firefox does not display the e with acute accent character correctly when it is displayed in a javascript alert box. However, it does display it correctly when my language settings are just fr. Please t

    Firefox does not display the e with acute accent character correctly from a javascript alert box when my browser language settings are fr_ca or fr_fr. However, it does it correctly when my browser language setting is fr. How do i get it to display e with acute accent and other iso8859 characters correctly in a javascript alert box when my browser language settings are fr_fr and fr_ca?
    == This happened ==
    Every time Firefox opened

    Use Unicode (UTF-8) for those characters.
    Then you will always be sure that they are displayed correctly.

  • Xmlelement and accented character,

    hi,
    i am using xmlelement to generate a xml file, but if a string contains an accented character it is not escaped, so in the file i obtain 'à' and not '& agrave;' (or & #224; ).
    but if there is a '>' it is converted in '& gt;'
    can anybody help me?
    thanks
    Edited by: user515918 on 9-lug-2009 6.42

    maybe this
    SQL> select xmlelement(noentityescaping e,regexp_replace(asciistr('A smàll samplé'),'\\(.{4})','&#x\1;')).getstringval() xml from dual
    XML                                                                            
    <E>A sm&#x00E0;ll sampl&#x00E9;</E>  ?

  • JPI 1.6 and accented character ( or french keyboard )

    Hi all,
    Since we migrated from JInitiator 1.3.22 to JPI 1.6, some of our bugs have been corrected: Oracle Forms runs on Vista and 64-bit OSes, and images are way faster and clearer. However, users are not able to type any accented character, à è ô ï ç for example. I searched through the Application Server configuration files without any success.
    Does anyone have any idea?
    We use Forms 10g, Oracle database 10.2.0.3, Oracle Application Server 10g and JPI 1.6. Accented characters were fine on JInitiator.
    Thank you

    This could be bug / patch no. 5526175. It's an issue with Oracle's applet classes and the Sun JRE. We encountered this bug with Forms 10.1.2.2.0 and JPI 1.5 and applied the patch mentioned above. In 10.1.2.3.0 (patch set 3) this is supposed to be fixed, though we didn't test this specific case yet. So just upgrade to OAS / iDS 10gR2 patch set 3 or ask Oracle Support for a patch for your specific version.
    edit:
    sorry, the bug no. is 5023945.
    the patch no. is 5526175 (one-off for 10.1.2.2.0)
    Message was edited by:
    leidner

  • How to find base character from an accented character

    Hi, given an accented character (�, �, �, etc...) is there a way to retrieve its base character? In the case above, a, o and c respectively?
    I searched in this forum and google and didn't find a definitive answer.
    The reason i need this is because in my database, some records have accented content, and now i need to generate a textfile to transfer daily to a bank, but the bank doesn't accept accented characters.
    Thanks.

    Decompose the Unicode string -- i.e., perform an NFD transformation -- and then strip off the diacritical marks, as done in VietPad editor.
    There is a native class, java.text.Normalizer, but it is not made public until Mustang release.
    http://java.sun.com/javase/6/jcp/beta/
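
The decompose-then-strip idea can be sketched as follows. This is an illustration in Python rather than the `java.text.Normalizer` code the reply refers to; the helper name `strip_accents` and the sample word are assumptions. It applies canonical decomposition (NFD) and then drops every combining mark.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose to NFD, then drop combining marks to keep base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("a\u00e7\u00e3o"))  # acao
```

Letters that have no canonical decomposition (for example ø or ß) pass through unchanged, so a file destined for an ASCII-only system may still need an explicit replacement table for those.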

  • Flat-File MA accented character conversion

    Hello, we are using FIM 2010 R2 SP1 (4.1.3599) with a flat file MA to import/sync a csv flat-file generated by a HR system. We are using UTF-8 code page. The HR system has many international people in it, so there are many accented or diacritic characters. 
    We need to convert those to regular English non-accented equivalents.  I know we could write sync rule or rules extension code to do this. However, there is an option on the select attributes properties page, under advanced a checkbox that says "Replace accented characters with non-accented variants", which looked like an easy way to fix this issue.  However, enabling that option does not seem to change anything during the import/sync process.  Has anyone had any success using that option, and is there anything we are missing?

    We tried multiple code pages in the HR MA, including multiple Latin variations (we don't see anything specifically called ISO-8859-1).  However, none of them showed the various characters correctly except for UTF-8.  We were getting a lot of bad conversion characters (like tm and copyright) using the other code pages.  The names are very global with multiple character sets involved (Middle Eastern, Asian and European).

  • Force same character encoding in UNIX as in Windows

    My program reads a simple text file and outputs the same on the console. The file contains special characters like àéèéàöäül etc. The goal of this exercise is to make java recognize special characters both on the UNIX and on the Windows system in the same manner.
    I wrote the program on a Windows machine and then transferred the class file to the UNIX system and ran it there. It is giving two different outputs for the different systems.
    Here is the code
    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.Charset;
    import java.util.Locale;

    public class LocaleFile {
        public static void main(String[] args) throws IOException {
            System.out.println("Deafult Locale :: " + Locale.getDefault());
            System.out.println("Deafult Characterset :: " + Charset.defaultCharset());
            System.out.println("Resetting defailt local to de_DE ... ");
            // Passing "de_DE" as the country is what produces "de_DE_DE" in the output below.
            Locale newLocale = new Locale(Locale.GERMAN.toString(), Locale.GERMANY.toString());
            Locale.setDefault(newLocale);
            System.out.println("After resetting Locale :: " + Locale.getDefault());
            // The file is always decoded as ISO-8859-15, whatever the platform default is.
            BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(new FileInputStream("helloworld.txt"), "ISO-8859-15"));
            String line = bufferedReader.readLine();
            do {
                System.out.println(line);
                line = bufferedReader.readLine();
            } while (line != null);
            bufferedReader.close();
        }
    }
    Windows Output
    Deafult Locale :: de_CH
    Deafult Characterset :: ISO-8859-1
    Resetting defailt local to de_DE ...
    After resetting Locale :: de_DE_DE
    Character set 1: Arijit
    Character set 2: è!éà£
    Character set 3: ü?öä$
    Character set 4: []{}
    End of Test
    UNIX Output
    bash-3.00$ java LocaleFile
    Deafult Locale :: en
    Deafult Characterset :: US-ASCII
    Resetting defailt local to de_DE ...
    After resetting Locale :: de_DE_DE
    Character set 1: Arijit
    Character set 2: ?!???
    Character set 3: ????$
    Character set 4: []{}
    End of Test
    bash-3.00$
    *Structure of helloworld.txt*
    Character set 1: Arijit
    Character set 2: è!éà£
    Character set 3: ü?öä$
    Character set 4: []{}
    End of Test
    Hope to hear from you soon guys.
    Thanks
    Arijit

    gimbal2 wrote:
    Just a thought, but I'd say that it is the character encoding of the command/bash prompt that is misleading you here. It probably actually has nothing to do with the code or the data, just the way the OS is displaying it.
    edit: ninja'd by Baftos. :)
    @gimbal2
    Well, thank you. That did raise a doubt. I changed the program slightly to write to a file say "recorded.txt" instead of printing it to the console.
    Code
    BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(new FileInputStream("helloworld.txt"), "ISO-8859-15"));
    // No charset is given here, so the writer uses the platform default.
    BufferedWriter bufferedWriter = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("recorded.txt")));
    String line = bufferedReader.readLine();
    do {
        bufferedWriter.write(line);
        System.out.println(line);
        line = bufferedReader.readLine();
    } while (line != null);
    bufferedReader.close();
    bufferedWriter.close();
    However the output remained the same
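
The question marks in the UNIX output are consistent with the decoded text being re-encoded for output in US-ASCII, the default character set reported on that machine. The effect can be reproduced outside Java with a small sketch; it illustrates the substitution only, not the JVM's internals, and the helper name `as_ascii` is made up for the example.

```python
def as_ascii(text: str) -> str:
    """Encode to US-ASCII, replacing every unmappable character with '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


lines = [
    "Character set 2: \u00e8!\u00e9\u00e0\u00a3",
    "Character set 3: \u00fc?\u00f6\u00e4$",
]
for line in lines:
    print(as_ascii(line))
```

If that is what is happening, the reading side is fine and the loss occurs when writing, which would also explain why a file written through a writer without an explicit charset shows the same result.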

  • How to run an applescript using unix command

    Hi All,
    Can anyone help me with the command for running an AppleScript from a postupgrade shell file? In other words, I want to run an AppleScript while my new installer upgrades the old version of the application. I know I'll have to give the AppleScript path in the postupgrade script, but how do I run it during installation, and where should I keep it (in postupgrade)? In short, how can I run an AppleScript through a unix command?

    Use the osascript command.
    In my installer, I use the DropDMG program to build my DMG disk image. It doesn't quit automatically, so I have to do the following:
    osascript -e "tell application \"DropDMG\" to quit"
    Type "man osascript" for more information.

  • Accented character problem fixed with Safari 5?

    Inquiring minds want to know.

    No. When I view document source, it appears the accented characters are font specific on users such as:
    http://discussions.apple.com/profile.jspa?userID=207549
    To avoid such problems, the server would have to distribute an ampersand based source character such as those described in the HTML specification here:
    http://rabbit.eng.miami.edu/info/htmlchars.html
    If it appears in Windows, an accented character may not appear on the Mac, or may appear in a different font. If you convert any accented characters in data entry via the server to ampersand codes that are equivalent, then all platforms can see them. This is not a Safari 5 specific problem; this is a problem of how platforms define such characters.
    Message was edited by: a brody

  • Oracle 10g express - accent character

    Hi,
    We have a problem with the 10g Express database for Windows. If you create a column VARCHAR(2) for example, you can't insert a string like 'éé'. You can only insert one character with accent.
    This problem doesn't exist in Oracle 10g Standard Edition. Is there a way to change this behaviour?
    Thank you

    The behavior here doesn't depend on the edition of Oracle (express and standard, in other words, will behave the same), but on the character set and NLS_LENGTH_SEMANTICS initialization parameter.
    By default, when you declare a VARCHAR2(2), you are allocating 2 bytes of storage. If you are using a single-byte character set (e.g. Windows-1252), all characters require one byte of storage. If you are using a variable-length character set (e.g. UTF-8), some characters require 1 byte of storage, some require 2 bytes, and some will require 3 bytes. I'll wager that your XE database is using the UTF-8 character set, so accented characters will generally require 2 bytes of storage, hence you can only add one to a VARCHAR2(2).
    One option you always have is to explicitly specify whether you want character or byte semantics when you create a table. That is, a VARCHAR2(2 BYTE) is equivalent to the default declaration of VARCHAR2(2) but a declaration of VARCHAR2(2 CHAR) allocates space for two characters in the current character set.
    If you want the default when you declare a new column/ variable to be that Oracle should use character semantics rather than byte semantics, you can also set the initialization parameter NLS_LENGTH_SEMANTICS to CHAR from the default of BYTE.
    Justin
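
The byte-versus-character arithmetic behind this answer can be checked with a short sketch. It only counts encoded bytes; it is not an Oracle query, and the variable names are made up for illustration.

```python
value = "\u00e9\u00e9"  # the string 'éé' from the question

chars = len(value)                        # number of characters
utf8_bytes = len(value.encode("utf-8"))   # bytes needed in UTF-8
cp1252_bytes = len(value.encode("cp1252"))  # bytes needed in Windows-1252

print(chars, utf8_bytes, cp1252_bytes)  # 2 4 2
```

Two characters need four bytes in UTF-8, so the value does not fit a column limited to two bytes, while in a single-byte character set it does.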

  • Applescript and Unix Alias

    Hi,
    I would like to know how you identify an alias with AppleScript. I am not talking about a "Finder/Apple" style alias but a Unix "ln" style alias.
    I would like to find whether or not a file is an alias.
    Thanks & regards.

    Thank you for the 'file' suggestion.
    Please note that ...
    file -h
    ... presents 'file's help 'Usage' line; but ...
    file /Utilities
    ... (where 'Utilities' was created at the root level, via 'ln -s /Applications/Utilities /Utilities'), on a G4 PowerMac, running MacOS X 10.4.11, resulted with ...
    /Utilities: symbolic link to 'com.apple.Terminal.plist
    file /Utilities
    ... (where 'Utilities' was created at the root level, via 'ln -s /Applications/Utilities /Utilities'), on a MacBook (late 2007), running dog crap - a.k.a. MacOS X 10.5.2, resulted with ...
    '/Utilities: directory
    Still on the MacBook, I then created a '/dev/null' symbolic link (via 'ln -s /dev/null /dogcrap') ...
    file /dogcrap
    ... resulted with ...
    /dogcrap: character special (3/2)
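
Because the output of `file` varies as shown above, a check that inspects the link itself without following it may be more dependable. A minimal sketch, assuming a script can be called from AppleScript for this purpose; the helper name `is_symlink` and the temporary paths are made up for the example.

```python
import os
import tempfile


def is_symlink(path: str) -> bool:
    """True if the path itself is a symbolic link (the link is not followed)."""
    return os.path.islink(path)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target_dir")
    link = os.path.join(tmp, "link_to_target")
    os.mkdir(target)
    os.symlink(target, link)  # same effect as: ln -s target_dir link_to_target

    link_result = is_symlink(link)
    target_result = is_symlink(target)
    link_destination = os.readlink(link)

print(link_result, target_result)
```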

  • Character encoding in unix environment.

    Hi,
    Users try to download files from my application. These files are uploaded by others. They may contain characters from foreign languages as well. The file names are displayed properly on the screen, but when the user clicks on the link to download the file, the servlet set-up to read the file path is unable to read the file name on UNIX with jdk1.5.0_06. The same piece of code works perfectly well on windows with same jdk.
    This is how i get the file name from the request:
    String filePath = new String((httpRequest.getServletPath() + httpRequest.getPathInfo()).getBytes("ISO8859_1"), "UTF-8");
    Same line of code throws out junk characters in unix.
    Please advise
    ~sandeep
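
The getBytes/new String line above is a repair step: it assumes the container decoded UTF-8 request bytes as ISO-8859-1 and reverses that. A small sketch of the round trip, with a hypothetical file name, shows when the repair works; if the path was already decoded correctly, or with some other charset, the same step produces junk instead.

```python
original = "r\u00e9sum\u00e9.txt"  # hypothetical file name, 'résumé.txt'

# What the server sees if UTF-8 bytes are decoded as ISO-8859-1.
misread = original.encode("utf-8").decode("iso-8859-1")

# The repair from the post: back to bytes as ISO-8859-1, then decode as UTF-8.
repaired = misread.encode("iso-8859-1").decode("utf-8")

print(misread)
print(repaired == original)
```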

    I would like to add that in the windows environment, the application is deployed on tomcat 5.x, but on the production (unix) machine, the request is routed through apache http server onto the tomcat server. Does this make any difference?
