Non-English characters in file names

Hi,
I use Arch Linux with KDE and Dolphin.
I have a problem with some file names that contain the Norwegian characters ø, æ and å. When I try to open such a file I get the message "the file does not exist". I can rename the files in the terminal, but that is a lot of work. In Dolphin a question mark appears in place of these letters; in the terminal I see different symbols and a number.
Can anybody tell me what causes the problem and how to solve it?

harida wrote: What I meant was, can it have something to do with the fact that I have downloaded these files from a torrent site,
and that they have been stored on ntfs by the user sharing them?
I don't know, but I just experienced something similar.
I used cpio to pass files from a Bluewhite64 JFS /home partition to my new Maxtor Basics 1TB external USB disk, which I had reformatted from NTFS to JFS. SOME file and directory names containing the letter 'ø' or 'Ø' are visible in file managers and in a console, yet when I try to open them the system claims they don't exist. I can NOT open them from the command line either. The names appear garbled under Bluewhite64 as well, but there I can open them without problems. I had always thought I would get safe copies with cpio...
I have LOCALE="nb_NO.utf8" in /etc/rc.conf, and
nb_NO.UTF-8 UTF-8
nb_NO ISO-8859-1
activated in /etc/locale.gen
Interestingly, with the keyboard mapped to 'no.latin1', the letter 'ø' appears as a 'funny' character on the console under runlevel 3. When I try the other choice for Norwegian, 'no.map.gz', I get a dvorak keyboard instead of qwerty. You can never win.
The problematic files, i.e. some of the ones I copied, are old and have passed through various OSes. That may be part of the problem, but it does not explain the Norwegian keyboard anomalies in Arch, or the fact that the files are loadable and readable under Bluewhite64.
Last edited by whaler (2009-04-26 23:12:37)
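
One common cause of the symptoms described in this thread is a file name stored in a legacy single-byte encoding such as ISO-8859-1 while the system expects UTF-8. The sketch below is only an illustration under that assumption (the thread never confirms the encoding of the affected names), and the function name propose_utf8_name is made up here. It leaves valid UTF-8 names alone and proposes a re-encoded name for the rest.

```python
from typing import Optional


def propose_utf8_name(raw: bytes, legacy: str = "iso-8859-1") -> Optional[bytes]:
    """Return a UTF-8 version of a legacy-encoded file name, or None.

    Assumption: names that are not valid UTF-8 were written in `legacy`.
    """
    try:
        raw.decode("utf-8")
        return None  # already valid UTF-8 (or plain ASCII), nothing to do
    except UnicodeDecodeError:
        return raw.decode(legacy).encode("utf-8")


if __name__ == "__main__":
    # 0xE5, 0xE6 and 0xF8 are å, æ and ø in ISO-8859-1; here they are not valid UTF-8.
    print(propose_utf8_name(b"bl\xe5b\xe6r \xf8l.txt"))
    print(propose_utf8_name("blåbær.txt".encode("utf-8")))
```

Calling os.listdir with a bytes path returns the names as bytes, so a result that is not None could be handed to os.rename. The proposed names should be checked by eye first, since the legacy encoding is a guess.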

Similar Messages

How to load file thru reader which contains non-english char in file name

Hi ,
I want to know how to load a file whose name contains non-English characters (e.g. 置顶.pdf) through Reader on an English-language machine,
as LoadFile gives an error when it is passed the Unicode-converted file name.
Regards,
Arvind

You don't mention what version of Reader? And you are using the AcroPDF.dll, yes?

Non-english characters in file names show as question marks

It's probably an iocharset=utf8 question, but it affects more than the CD-ROM: native partitions with "defaults" options behave no better. What is the correct solution? The current locale is en_US.UTF-8, which should be OK.

native partitions with "defaults" options behave no better
What filesystems?
For ntfs read http://wiki.archlinux.org/index.php/HAL
Policies
NOTE: this is deprecated from hal => 0.5.10
and
mount.ntfs linking
As of hal => 0.5.10 the above policy may not work. This is a workaround forcing hal to use the ntfs-3g driver instead of the standard ntfs driver. Please note that this method will use the ntfs-3g driver for all NTFS drives on your system! As root create a symbolic link from mount.ntfs to mount.ntfs-3g.
# ln -s /sbin/mount.ntfs-3g /sbin/mount.ntfs
Possible issues using this method:
if mount is called with the "-i" option it doesn't work
possible issues with the kernel's ntfs module
Locale issues
If you use KDE, you may have problems with filenames containing non-Latin characters. This happens because KDE's mount helper does not parse the policies and the locale option correctly. There is a workaround for this:
1) Remove the "/sbin/mount.ntfs-3g" which is a symlink. code: rm /sbin/mount.ntfs-3g
2) Replace it with a new bash script containing:
#!/bin/bash
/bin/ntfs-3g $1 $2 -o locale=en_US.UTF-8 #put your own locale here
3) Make it executable: chmod +x /sbin/mount.ntfs-3g
There is only a problem with partition labels containing spaces, so if you have such a label, replace the space with an underscore, otherwise when you try to mount it you will get an error.
4) Add NoUpgrade=sbin/mount.ntfs-3g to pacman.conf.
I think you understand Russian. You are welcome at http://linuxforum.ru/index.php?showtopic=53488 and
http://archlinux.org.ru/

[AS] Problem with non English characters in file path

I wrote a script that exports a pdf file from ID, rasterizes it in PS, applies an action, saves it as another pdf file, and finally creates a Mail message, and attaches the file to it (the last part is written in AppleScript).
The problem is that it doesn't work when the path to this file contains non English characters.
This works:
make new attachment with properties {file name:"/Volumes/Macintosh HD/BackUp Tetard/Test.pdf"}
but this doesn't:
make new attachment with properties {file name:"/Volumes/Macintosh HD/BackUp Têtard /Test.pdf"}
I vaguely remember reading that AppleScript can work with Unicode (in other words, with such characters) starting from some version. I don't remember which exactly, but I think it was Leopard.
I am on Mac OS X 10.4.11 right now. Will updating solve this problem? Does anybody know any solution to this problem: a scripting addition, some hidden setting, etc.?
I made a little test: with a Russian character (ё) it works, but with ê (Dutch) it doesn't. Could it have something to do with the Region setting in the International panel?
Thanks in advance,
Kasyan

Kasyan, as of Leopard AppleScript treats all text as Unicode; before that you can specify 'as Unicode text'. Try a test with these:
-- Leopard
set x to POSIX path of (path to desktop)
-- Pre Leopard
set x to POSIX path of (path to desktop as Unicode text)
-- Leopard
set x to POSIX path of (choose file without invisibles)
-- Pre Leopard
set x to POSIX path of ((choose file without invisibles) as Unicode text)

Infoview creating non-sensical bookmarks and file names

Thank you for any help or suggestions!
I am using Crystal Reports 2008, the CMC repository, and InfoView. I am producing PDF reports, many with bookmarks based on the group tree.
I have found that when I run my report off my workstation, all is fine, but when I run the same report out of InfoView, the top level bookmark is a string of nonsensical numbers and characters.
Also, when emailing the report via the schedule option in InfoView, a similar non-sensical name is created.
Is there a way for me to ensure that this top-level bookmark/file name is something readable, perhaps the name of the report?
Thank you for your help!

To fix your locales, see http://wiki.archlinux.org/index.php/Configuring_locales -- specifically, you'll need to uncomment at least the en_US.utf8 locale, since that's the one specified in your rc.conf

Java.io.File and non-unicode characters in file name

Unix filesystem object names are byte sequences. These byte sequences are not required to correspond to any character sequence in the current or any locale. How do I open a file if its name contains bytes that do not correspond to a valid Unicode encoding in the current locale? Unless I am missing something, if I list a parent directory that has some file names like this, those file names do not get added to the list. Hmmm....
R.

OK, create.c is a program that will create a file whose name is not a valid character in the 'ja' locale.
Lister.java defines a class that lists files in the current directory. For each file, it prints the 'toString()' version of the name, the char array of the name as hex, and the 'getBytes' byte array of the name.
So, what you can do is compile and run create.c, which will create a file whose name is a single byte with the hex value 99. Then compile and run Lister.java, which will give you the following output (shown for two different locales):
$ export LANG=
$ java Lister
name:?; chars:99,; bytes:99,
$ export LANG=ja
$ java Lister
name:?; chars:fffd,; bytes:3f,
Note that when running in the ja locale, there is no character corresponding to byte value 0x99. So Java puts the replacement character U+FFFD in the String, and getBytes() then turns that into the '?' character, 0x3F.
The point is that there are files which Java cannot uniquely represent as a straight String. I suppose we could get the filename via JNI, do the conversion ourselves, and then use the private-use area of Unicode to encode all our strings, but ugh.
//create.c
#include <stdio.h>
int main()
{
   const char* name = "\x99";
   FILE* file = fopen( name, "w" );
   if( file == NULL )
   {
      printf( "could not open file %s\n", name );
      return 1;
   }
   fclose( file );
   return 0;
}
// Lister.java
import java.io.*;
public class Lister
{
    public static void main( String[] args )
    {
        new Lister().run();
    }
    public void run()
    {
        try
        {
            doRun();
        }
        catch( Exception e )
        {
            System.out.println( "Encountered exception: " + e );
        }
    }
    private void doRun() throws Exception
    {
        File cwd = new File( "." );
        String[] children = cwd.list();
        for( int i = 0; i < children.length; ++i )
        {
            printName( children[ i ] );
        }
    }
    private void printName( String s )
    {
        System.out.print( "name:" );
        System.out.print( s );
        System.out.print( "; chars:" );
        printCharsAsHex( s );
        System.out.print( "; bytes:" );
        printBytesAsHex( s );
        System.out.println();
    }
    private void printCharsAsHex( String s )
    {
        for( int i = 0; i < s.length(); ++i )
        {
            char ch = s.charAt( i );
            System.out.print( Integer.toHexString( ch ) + "," );
        }
    }
    private void printBytesAsHex( String s )
    {
        byte[] bytes = s.getBytes();
        for( int i = 0; i < bytes.length; ++i )
        {
            byte b = bytes[ i ];
            System.out.print( Integer.toHexString( unsignedExtension( b ) ) + "," );
        }
    }
    private int unsignedExtension( byte b )
    {
        return (int)b & 0xFF;
    }
}
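
For comparison, here is how the same one-byte name can be carried through a string without loss in Python, which maps undecodable bytes to lone surrogates through the surrogateescape error handler (PEP 383). This is a side illustration of the round-trip idea, not a fix for the Java behaviour shown above, and the helper names are hypothetical.

```python
def name_to_text(raw: bytes) -> str:
    """Decode a file name, keeping undecodable bytes as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def text_to_name(text: str) -> bytes:
    """Inverse of name_to_text: restore the original bytes exactly."""
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    raw = b"\x99"  # the same one-byte name that create.c uses
    text = name_to_text(raw)
    print([hex(ord(ch)) for ch in text])
    print(text_to_name(text) == raw)
```

The point of the design is that every byte sequence gets a distinct string, so the name can be listed, stored and passed back to the filesystem unchanged, even though it cannot be displayed meaningfully.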

Non-English characters in Policy Server invitation mail

Letters that are not in the English alphabet do not come out as they should when invitation and confirmation mails are sent from Adobe Policy Server.
In my case the Norwegian letters Æ Ø Å are not shown correctly. But I'm guessing this goes for all other non-English letters.
Example Å = Ã¥
I have installed Adobe Policy Server (automatic install) with JBoss/Tomcat and use the IIS SMTP server. Does anyone know where I have to make changes to get this right?
Regards
Michael Sletvold
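
The pair Ã¥ is what the two UTF-8 bytes of a Norwegian letter look like when they are read as single-byte Latin-1 text. The sketch below reproduces that effect. It assumes this is what happens to the mail (the thread does not confirm where the wrong charset is applied) and uses lowercase å, whose bytes C3 A5 give exactly Ã¥. The function names are invented for this example.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Show how UTF-8 bytes look when a reader assumes ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair_mojibake(garbled: str) -> str:
    """Undo the misreading above (only valid for text garbled this way)."""
    return garbled.encode("iso-8859-1").decode("utf-8")


if __name__ == "__main__":
    print(misread_utf8_as_latin1("å"))
    print(repair_mojibake("Ã¥"))
```

Seeing two characters per letter, the first of them Ã, is a strong hint that the text itself is UTF-8 and only the declared or assumed charset on the receiving side is wrong.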

Hello Chris
Thank you for pointing me in the right direction. However, I cannot get it to work. It said utf8 and not utf-8 in the jboss-run.bat, so I have tried both entries in the run.bat file (one at a time):
run.bat
set JAVA_OPTS=%JAVA_OPTS% -Xms128m -Xmx512m -Dfile.encoding=utf-8
set JAVA_OPTS=%JAVA_OPTS% -Xms128m -Xmx512m -Dfile.encoding=utf8
I also changed the jboss-run.bat to -Dfile.encoding=utf-8 without any success.
The regional settings on the Win2003 server are set to Norwegian, and I do a full server restart each time I make a change in the *.bat file. Any tip on what I might be doing wrong would be appreciated.
Regards
Michael

[SOLVED!] On USB drives, problems with non-English chars and HAL

Hello,
I am having a problem with non-English characters (áãàçéẽê...) in the names of files stored on my USB drive.
On Windows they are created with the correct name. But on Linux the non-English characters in the file names are replaced by '?' and the files are not accessible.
If I manually mount the drive using 'mount -o iocharset=utf8 /dev/sdb1 /media/usbdisk' the characters are OK, so I think I just need to get HAL to pass the correct parameters to mount. However, I don't know how to do that, and haven't found any good solution.
I tried to build a custom kernel setting the default charset as UTF-8 and it didn't work.
Any ideas? I'm using x86-64, HAL 0.5.13-3 and my locale is pt-BR.UTF-8.
Thanks!
EDIT: Actually, this is not a HAL problem, but a problem with 'exo'. For the solution, I edited /etc/xdg/xfce4/mount.rc and added iocharset=utf8 to the [vfat] category.
Last edited by Renan Birck (2009-11-28 20:54:23)

I don't use Thunar presently, but I looked in the Thunar Volume Manager doc and I didn't find anything to change the mount options of removable drives. I am not quite sure if it's possible or not. Maybe someone using it can tell for sure.
But if it is not possible to change the mount options, a possible solution is to disable the Thunar Volume Manager plugin and to use something else more configurable to manage the automount function.
Personally I use the halevt package from AUR which uses configuration files in the xml format.
It's not so easy to use but is highly configurable.
Other tools exist as well.
I can help you with halevt if you choose that way...

[SOLVED] Non english chars kdemod 4 problem

Hello, I have a little problem with KDE and non-English characters.
If I open a file with non english chars in its name I get something like this:
(In this case kwrite opens "other" file but in other applications it fails with an error of file not found)
Another symptom is that in the KDE menu my name has bad chars too:
(It must be López)
The third symptom is that if I try to rename a file on the desktop, I can't type accented chars (á é í ó ú). At the beginning the keyboard in this rename dialog was entirely English, but I have now got a semi-Spanish keyboard (I can type ñ) with the appropriate /etc/hal/fdi/policy/10-keymap.fdi file.
But the strangest thing is that in general, in all KDE and non-KDE applications and even in the console, non-English chars work fine. I can go to the File->Open menu of the application and open a file with non-English chars in its name. The problem seems to reside in the part of KDE that passes the file name to the application (kwin?).
My locale is es_ES@UTF8 and, as I said, I have configured the 10-keymap.fdi file correctly.
I have read in some forums that something like this could be a KDE or Qt bug, but that is not clear to me, as I don't see widespread complaints about it.
Any ideas will be appreciated.
Thanks in advance,
Christian.
Last edited by christian (2009-03-27 14:52:17)

SanskritFritz wrote:
That should be "es_ES.utf8"
Sorry, I misspelled it in the post.
Of course, my locale is es_ES.utf8:
LANG=es_ES.utf8
LC_CTYPE="es_ES.utf8"
LC_NUMERIC="es_ES.utf8"
LC_TIME="es_ES.utf8"
LC_COLLATE=C
LC_MONETARY="es_ES.utf8"
LC_MESSAGES="es_ES.utf8"
LC_PAPER="es_ES.utf8"
LC_NAME="es_ES.utf8"
LC_ADDRESS="es_ES.utf8"
LC_TELEPHONE="es_ES.utf8"
LC_MEASUREMENT="es_ES.utf8"
LC_IDENTIFICATION="es_ES.utf8"
LC_ALL=
I don't think this could be the source of the problem because, except in the places I mentioned in the first post, the rest of my system works perfectly.

Problem with non-English characters

Hi,
I'm using JRockit 1.5.0_03 and I have a problem with pages containing non-English characters. Is it possible to change certain JVM properties like "user.country", "file.encoding" or "user.language"? If yes, how can I change them?
Thanks in advance

File naming on Russian and limitation on length of file names

Hi.
I am posting this as a note for users whose native language is not English and who use their native language day to day on Linux systems.
Torrents and Linux
System:
OS: Archlinux x86_64
FS: Reiserfs
Locale: UTF-8
Client: rtorrent
torrent: http://torrents.ru/forum/viewtopic.php?t=1445641
Problem:
if one tries to download this torrent with rtorrent, one sees the message
Hashing: Storage error: [Hash checker was unable to map chunk: Слишком длинное имя файла]
(this is in Russian; in English it is 'File name too long')
Namely, rtorrent couldn't handle this file
Белов А. В. - Микроконтроллеры AVR в радиолюбительской практике. Автоматика. Радиоэлектроника. Связь. Радио. Радиосвязь. Любительская радиосвязь.(2007)(336).djvu
(in Russian again).
Transmission prints the same message when one tries to download this particular file.
Reason:
the name exceeds the file name length limit of 255 bytes. See, e.g.
$ echo 'Белов А. В. - Микроконтроллеры AVR в радиолюбительской практике. Автоматика. Радиоэлектроника. Связь. Радио. Радиосвязь. Любительская радиосвязь.(2007)(336).djvu' | wc -c
279
The problem does not depend on the type of FS; it is in the Linux kernel (Linux VFS). Reiserfs has a file name length limit of 4032 bytes, and Reiser4 a limit of 3976 bytes.
In this case NTFS is the more advanced FS:
its file name length limit is 255 UTF-16 code units (255*2 bytes).
Of course, if one uses an 8-bit locale (e.g. cp1251, koi8-r, etc.), the problem does not occur for THIS file:
echo 'Белов А. В. - Микроконтроллеры AVR в радиолюбительской практике. Автоматика. Радиоэлектроника. Связь. Радио. Радиосвязь. Любительская радиосвязь.(2007)(336).djvu' | iconv -f utf-8 -t koi8-r | wc -c
162
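
The same comparison can be made without a shell. The sketch below counts the encoded length of a name and checks it against the 255-byte limit quoted above. The helper names are invented for this example, and a short sample name stands in for the long one from the torrent.

```python
NAME_MAX = 255  # per-name byte limit discussed in the post


def encoded_length(name: str, encoding: str = "utf-8") -> int:
    """Number of bytes the name occupies in the given encoding."""
    return len(name.encode(encoding))


def fits(name: str, encoding: str = "utf-8", limit: int = NAME_MAX) -> bool:
    """True if the encoded name is within the limit."""
    return encoded_length(name, encoding) <= limit


if __name__ == "__main__":
    sample = "Связь.djvu"
    print(encoded_length(sample, "utf-8"))   # Cyrillic letters take 2 bytes each
    print(encoded_length(sample, "koi8-r"))  # 1 byte per letter
    print(fits("я" * 128))                   # 128 letters are 256 bytes in UTF-8
```

Unlike echo piped into wc -c, this does not count a trailing newline, so the figures come out one lower than the shell output for the same name.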
Nevertheless, there is no guarantee that someone won't come across files with even longer names, and in that situation the advantage of NTFS over any Linux FS is evident.
The information about FS limits is taken from Wikipedia.
I would be glad to hear comments and suggestions about this particular problem (because I don't know how to download the files of this torrent and store them on my Linux box).
P.S. As I understand it, FreeBSD has the same problem in this case.
P.P.S. I think this problem could also appear, for example, on a Linux machine serving as a Samba file server when a Windows user wants to store a file with native-alphabet letters in its name on the server.

This is a reported bug.

Removing non-English characters from data.

Ours is a global system, and some of the data contains non-English characters. We want to download a file with these non-English characters removed.
Any suggestions on how we can remove these non-English characters from the file?

The FM you mentioned:
     Replace non-standard characters with standard characters
   Functionality
     SCP_REPLACE_STRANGE_CHARS processes a text so that it only contains
     simple characters. Special characters and national characters are
     replaced in such a way that the text remains reasonably legible.
     The character set 1146 is used by default. In this case the following
     replacements are made, for example:
      Æ ==> AE        (AE)
      Â ==> A         (Acircumflex)
      Ä ==> Ae        (Adieresis)
      £ ==> L         (sterling)
     Note that the new text can be longer than the old.
So I don't think it will be useful for eliminating the special characters.
You have to check each and every character against the 26 standard letters.
Thanks & Regards
vinsee
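
Outside of SAP, the kind of replacement described in that excerpt can be approximated in a few lines. The sketch below is not SCP_REPLACE_STRANGE_CHARS and does not reproduce character set 1146; the function name simplify and the small replacement table are assumptions made for illustration, seeded with the examples from the excerpt.

```python
import unicodedata

# Hand-written replacements taken from the examples in the excerpt.
EXPLICIT = str.maketrans({"Æ": "AE", "Ä": "Ae", "£": "L"})


def simplify(text: str) -> str:
    """Replace national characters with plain ASCII approximations."""
    text = text.translate(EXPLICIT)
    # Split accented letters into base letter + combining mark, drop the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII is removed.
    return stripped.encode("ascii", errors="ignore").decode("ascii")


if __name__ == "__main__":
    print(simplify("Æ Â Ä £"))
```

Letters without a decomposition, such as ø, are simply dropped here unless they are added to the table, and as the excerpt notes the result can be longer than the input.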

Remove LF characters from file names

I have a folder full of files whose names contain the LF character (ASCII code 10). I want to use Automator's "Replace Text" function to remove these non-printing characters from the file names. Is there a way to do it?
If automator is not able to do this task, I will take a bash script or applescript solution as well...

Take a look at: http://stackoverflow.com/questions/4417588/sed-command-to-fix-filenames-in-a-directory
(I changed tr -d "\r\n" to tr -d "\n", but try both)
for f in ~/Desktop/*
do
    new="$(printf %s "$f" | tr -d "\n")"
    if [ "$f" != "$new" ]; then
        mv "$f" "$new"
    fi
done
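
If a scripting language other than bash is an option, the same loop can be written in Python. This is a sketch with a hypothetical function name. Unlike the shell loop, it skips a rename when the cleaned name already exists, which is a design choice added here.

```python
import os
import tempfile


def remove_newlines_from_names(directory: str) -> list:
    """Rename entries in `directory` whose names contain LF characters.

    Returns the list of (old, new) name pairs that were renamed.
    """
    renamed = []
    for old in os.listdir(directory):
        new = old.replace("\n", "")
        if new == old or not new:
            continue  # nothing to strip, or nothing would be left
        target = os.path.join(directory, new)
        if os.path.exists(target):
            continue  # do not overwrite an existing file
        os.rename(os.path.join(directory, old), target)
        renamed.append((old, new))
    return renamed


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than on real files.
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "re\nport.txt"), "w").close()
        print(remove_newlines_from_names(tmp))
        print(os.listdir(tmp))
```

To apply it to the folder from the question, the function would be called with that folder's path, for example os.path.expanduser("~/Desktop").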

Adobe 8.x Pro Removing Part of File Name

As SOP, I name files by what they are, preceded by the date. For example, a letter from John today would be "2008 0519 Ltr from John". When combining files, Adobe is dropping the numbers and booking the file as "Ltr from John". Of course, that is messing up the chronological order of my files. Is there a way to grab the entire file name?
Many thanks.

Spaces and other non-standard characters in file names are one of the results of the long file names that MS introduced. The names can be useful at times, but special characters and spaces should be avoided. As Geo indicated, spaces are commonly represented by an underscore. It is interesting that many web sites have also fallen prey to the MS names with spaces. Other browsers (not IE) try to replace such spaces with %20 for use. MS tends to ignore industry standards and causes a lot of problems as a result.
This is all likely part of your problem. Good luck in resolving it. Geo's advice is probably about as sound as you will get.

Attachment File non-English Name DISAPPEARS problem in BizTalk 2010 SMTP Adapter

Hello ,
I'm using BizTalk 2010 SMTP Adapter for sending mail with attachments by setting them via property SMTP.Attachments
//Attachment
msgEmail(SMTP.Attachments)= AttachmentList;
I have files with names in several languages (for example, in English and partially in Russian).
My attachment list looks like this:
"C:\Temp\Files\EnglishNameFile.xml | C:\Temp\Files\RussianFileName_РусскоеИмя.xml";
After the mail is sent with these attachments, the second file (whose name is partially in Russian) is received without that part of the name; the non-English part disappears, like this:
RussianFileName_.xml (should be RussianFileName_РусскоеИмя.xml)
And if a file name contains no Latin letters at all, the BizTalk SMTP Adapter changes the name to a default one like ATT41233.xml.
I found that this behaviour occurs with other non-English languages as well.
Unfortunately I have not found any information about this.
Any help would be very much appreciated.
Vadim

Refer to this link -
http://social.msdn.microsoft.com/Forums/en-US/163a47cf-db31-49a5-9ee3-ce9272ba24ff/setting-contenttransferencoding-in-dynamic-smtp-port?forum=biztalkgeneral
There is an option of the Multipart message that controls the filename and the charset used to control how the attachment is treated, including content-transfer encoding.
Regards.
