Filenames with unicode characters

Hello, I have a question regarding filenames with Unicode characters on an Arabic Windows XP.
I have a string which the user entered, and I want to create a file with this string as its filename. So my question:
Which Unicode characters are allowed in a filename? I know that on a German Windows the ASCII characters " \ / : * ? < > | are not allowed, but which Unicode characters are allowed? Is this language-dependent on Windows? Maybe there is a method that checks a string for the allowed characters.
Thanks for your help

AFAIK the illegal characters are always the same (you listed them already) and, as long as the filesystem supports it (read: you use NTFS and not FAT), you may use any other Unicode character.
You might have trouble displaying those characters, though, if you don't happen to have the correct fonts installed, but that would only be a cosmetic issue.
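As a rough sketch of such a check in Java (not from the thread): the reserved set below is the standard Windows one (backslash, slash, colon, star, question mark, double quote, angle brackets, pipe, plus the control characters), and the method name is just for illustration.
import java.io.*;
public class FileNameCheck {
    // Characters Windows reserves in filenames; control characters 0x00-0x1F are also disallowed.
    private static final String RESERVED = "\\/:*?\"<>|";
    // Illustrative helper: returns true if every character is acceptable in an NTFS filename.
    public static boolean isValidWindowsFileName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c < 0x20 || RESERVED.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }
    public static void main(String[] args) {
        System.out.println(isValidWindowsFileName("مرحبا.txt"));  // true: Arabic letters are fine on NTFS
        System.out.println(isValidWindowsFileName("a*b.txt"));    // false: '*' is reserved
    }
}
Note that Windows additionally reserves whole names such as CON, PRN, AUX, NUL and COM1-COM9/LPT1-LPT9, and trailing dots or spaces are stripped, so a complete check needs more than a per-character filter.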

Similar Messages

  • Does photoshop cs5 sdk support file names with unicode characters?

    Hi,
    For the export module, does the Windows SDK have support for filenames with Unicode characters?
    The ExportRecord struct in PIExport.h has an attribute filename declared as a char array.
    So I am not sure whether filenames with Unicode characters will be supported for the export module.
    Thanks in advance,
    Senthil.

    SPPlatformFileSpecificationW is not available in ReadImageDocumentDesc.
    It has SPPlatformFileSpecification_t, which is coming back as null for the export module.
    I am using the Photoshop CS5 SDK for Windows.
    Thanks,
    Senthil.

  • Getting PDF filename with unicode chars

    Hello,
    I'm trying to write a plugin that gets the file path of the current active document. The code looks like this:
    AVDoc avDoc = AVAppGetActiveDoc();
    PDDoc pdDoc = AVDocGetPDDoc(avDoc);
    ASFile file = PDDocGetFile (pdDoc);
    ASPathName filePath = ASFileAcquirePathName (file);
    This works fine for most documents, but for documents with Unicode characters in the name, each Unicode character is replaced with '.' in filePath. For example, if the document is "测试中文关键词搜索!@#$%^&().pdf", then filePath becomes ".........!@#$%^&().pdf". Am I missing something required to get Unicode filenames?
    Thanks.

    You were right, the plugin was getting a char* from ASFileSysDisplayStringFromPath. I removed that and added this which seems to have fixed my problem:
    ASText pathText = ASTextNew();
    ASFileSysDisplayASTextFromPath(ASGetDefaultFileSys(), filePath, pathText);
    wchar_t *pathString = (wchar_t*)ASTextGetUnicode(pathText);
    Thank you!

  • CRVS2010 Beta - Cannot export report to PDF with unicode characters

    My report has some Unicode data (Chinese); it can be previewed properly in the Windows Forms report viewer. However, if I export the report document to a PDF file, the Unicode characters in the exported file are all displayed as squares.
    Crystal Reports 2008 R2 can export the Chinese characters to PDF when I select a Chinese font in the report, but the VS2010 beta cannot export the Chinese characters even when a Chinese font is selected.

    Barry, what is the specific font you are using?
    The below is a reformatted response from Program Management:
    With a non-Chinese font the issue is reproducible: when the Arial font is used for the field containing the Unicode (Chinese) characters, the export fails. After changing that field's font to SimSun (a Chinese font named 宋体 in the report), the problem is solved in both Cortez and CR.
    Ludek

  • Performance of JEditorPane with unicode characters

    Hi,
    I'm using a JEditorPane to edit rather large (> 15000 words) but simple HTML files. Everything is fine until I add even a single Unicode character with a code higher than 255 to the text, like a Greek omega (\u03A9). With the Unicode character present, the control starts to take an incredibly long time to redraw (sometimes minutes) when you resize it, for instance. The strangest thing is that removing the character again does not restore performance. Can anyone explain why this is happening?
    import javax.swing.*;
    import javax.swing.text.html.HTMLEditorKit;
    public class EditorPaneTest {
        public static void main(String[] args) {
            StringBuffer html = new StringBuffer();
            html.append("<html><body>");
            // Uncomment the next line, run and resize the frame to see the problem
            // html.append("<p>\u03A9</p>");
            for (int i = 0; i < 2000; i++) {
                html.append("<p>Testing, testing, testing...</p>");
            }
            html.append("</body></html>");
            JFrame jFrame = new JFrame("Test");
            jFrame.setSize(300, 300);
            JEditorPane jEditorPane = new JEditorPane();
            jEditorPane.setEditorKit(new HTMLEditorKit());
            jFrame.add(new JScrollPane(jEditorPane));
            jFrame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            jFrame.setVisible(true);
            jEditorPane.setText(html.toString());
        }
    }
    Any help would be much appreciated.
    Thanks,
    Rasmus

    In the meantime, I had to solve my problem one way or another, and the only thing that came to my mind was to use the JavaMail API.
    It is not quite what I was hoping for, because it doesn't provide opening of the default e-mail client on the local machine, but at least it can send e-mail with Unicode characters in the subject line, recipient addresses, etc.
    Make a new message using JavaMail and then set its properties in a fairly simple manner, like this:
    message.setSubject( MimeUtility.encodeText("+ ... some Unicode text with Cyrillic symbols ... +", "UTF-8", "B") );
    I'd still like to see if there are any suggestions on how to do a similar thing with java.awt.Desktop.
    Regards,
    PS
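    For context, a minimal self-contained sketch of that JavaMail approach; the SMTP host, addresses and subject text below are placeholders, not values from the thread.
    import java.util.Properties;
    import javax.mail.Message;
    import javax.mail.Session;
    import javax.mail.Transport;
    import javax.mail.internet.InternetAddress;
    import javax.mail.internet.MimeMessage;
    import javax.mail.internet.MimeUtility;
    public class UnicodeMailSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("mail.smtp.host", "smtp.example.com");  // placeholder SMTP host
            Session session = Session.getInstance(props);
            MimeMessage message = new MimeMessage(session);
            message.setFrom(new InternetAddress("sender@example.com"));
            message.setRecipient(Message.RecipientType.TO, new InternetAddress("recipient@example.com"));
            // Encode the subject so Cyrillic (or any non-ASCII) text survives transport.
            message.setSubject(MimeUtility.encodeText("Тема с кириллицей", "UTF-8", "B"));
            message.setText("Body text", "UTF-8");
            Transport.send(message);
        }
    }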

  • Problems with Filenames with Chinese Characters

    I seem to have problems with filenames with 2-byte characters like Chinese, Japanese and Korean in various apps. The problems occur when I download Chinese music and import it into iTunes:
    1. Torrent (and all other) files with 2-byte characters in the filename become junk characters in Finder after download from Safari.
    2. Music files created/downloaded by uTorrent display 2-byte characters correctly in Finder.
    3. These downloaded music files, when imported into iTunes, become junk characters.
    4. Let's say I wish to have a list of all filenames from (2). I do an ls -R > abc.txt in Terminal. abc.txt displays the filenames correctly in vi. However, they become junk characters again when viewed using TextEdit.
    I've selected Chinese, Japanese and Korean in the International settings. This is not about input methods; I just want to get the filenames correct in Finder and iTunes. Any advice is appreciated.

    3. These downloaded music files, when imported into iTunes, become junk characters.
    The ID3 tags need to be in Unicode.
    http://homepage.mac.com/thgewecke/mlingos9.html#itunes
    4. They become junk characters again when viewed using TextEdit.
    TextEdit needs to be set to the correct encoding; it's not automatic.

  • Bash script to trim all filenames with special characters recursively?

    Hi,
    I have a 30 GB directory full of data I recovered from a friend's laptop after her Windows XP crashed. I'd like to burn that data, but I can't, because many of the filenames contain weird characters (spaces, accents, things even worse that my XTerm displays as inverted question marks). So mkisofs exits with an error.
    I'd like to clean that mess up, but it would take months to do manually. Well, I only know a very little Bash, and I think this problem is already too heavy for my modest knowledge. Here's the problem:
    - check the contents of directory ~/backup recursively
    - find files whose filenames contain characters other than [A-Za-z0-9], plus perhaps "-", "_" and "."
    - replace these characters with an "_" or just erase them
    Now how would I translate that into a little Bash script?
    Cheers...

    Heyyyyy... nice idea
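    The thread never got as far as an actual script. Purely as an illustration of the idea, here is a rough sketch in Java rather than Bash, assuming the ~/backup directory from the question and replacing every character outside [A-Za-z0-9._-] with an underscore; collision handling and a dry-run mode are left out.
    import java.io.File;
    public class SanitizeNames {
        // Replace every character outside A-Z, a-z, 0-9, '.', '_' and '-' with '_'.
        static String sanitize(String name) {
            return name.replaceAll("[^A-Za-z0-9._-]", "_");
        }
        static void walk(File dir) {
            File[] entries = dir.listFiles();
            if (entries == null) {
                return;
            }
            for (File entry : entries) {
                if (entry.isDirectory()) {
                    walk(entry);  // recurse first, then rename the directory itself
                }
                String clean = sanitize(entry.getName());
                if (!clean.equals(entry.getName())) {
                    File target = new File(entry.getParentFile(), clean);
                    System.out.println(entry + " -> " + target);
                    entry.renameTo(target);  // returns false on failure (e.g. existing target); ignored in this sketch
                }
            }
        }
        public static void main(String[] args) {
            walk(new File(System.getProperty("user.home"), "backup"));
        }
    }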

  • Copying filenames with invalid characters in OSX

    I'd like to copy some folders for backup from the internal hard disk on my G4 to an external hard disk. The problem is that some of the folders contain files that were created in OS 9, and their filenames contain "/" symbols, which apparently are not acceptable in OS X (10.3.9), which I am now running. After trying to drag the folders containing the files to the external hard disk, I get an error message saying that they can't be copied because some of the filenames contain invalid characters. The problem files are scattered among files in many folders all over my G4's internal hard disk, and the task of finding and manually changing each one's filename is daunting. Is there a way to find and replace "/" with "."? Or some other more automated fix for the invalid characters? I'd prefer to keep the original files in their current folders on the G4. Thanks.
    G4   Mac OS X (10.3.9)  

    Does any of this software do what you want?

  • Filenames with strange characters

    (With apologies if this is a previously answered question.)
    One of my Java programs copies users' home areas around our network. I'm getting FileNotFoundExceptions with some of the users' files, specifically those which use extended character sets (we have many international students who use special programs to create files with e.g. Chinese characters). I'm at a loss to know how to diagnose this problem and would appreciate any advice. The filenames are retrieved as the result of a listFiles() and the copy method uses BufferedInputStream/BufferedOutputStream.
    David R.

    Your problem must be with the names of the files, rather than with the content. I did a few experiments and found that Java (I'm using SDK 1.3 on Windows 2000) can't open files with Cyrillic characters in the file name, either for input or for output. It throws a FileNotFoundException, as you said. There shouldn't be any problem with Chinese content if you're using input and output streams.
    So the problem is diagnosed. There are several bug reports that sound like this. I don't know if it has been fixed in SDK 1.4.
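    A minimal sketch of the kind of experiment described; the Cyrillic file name here is just an example, and on an affected JDK the open attempt throws FileNotFoundException even though the file exists on disk.
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.IOException;
    public class CyrillicNameTest {
        public static void main(String[] args) throws IOException {
            File f = new File("файл.txt");  // example file with a Cyrillic name
            System.out.println("exists() reports: " + f.exists());
            try {
                FileInputStream in = new FileInputStream(f);
                System.out.println("Opened for reading");
                in.close();
            } catch (FileNotFoundException e) {
                System.out.println("Could not open: " + e.getMessage());
            }
        }
    }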

  • Problem figuring out the encoding for filenames with special characters

    I'm not sure if this is the right forum, but this does seem like an OS issue.
    I brought a lot of mp3 and m3u files over from a Windows machine to my new Mac. Some of the mp3 files have accented characters in their names, and these names appear in the m3u files. But if I add the m3u file to iTunes, it fails to recognize these names, and so I lose all the mp3s with special characters in their names.
    I tried to fix this by grabbing the file's name in Python, but that didn't work either!
    Here's an example: the file's name is "Voilà l'été.mp3"
    The m3u file says "Voil\xe0 l'\xe9t\xe9.mp3" -- this doesn't work.
    From os.listdir(), I get "Voila\xcc\x80 l'e\xcc\x81te\xcc\x81.mp3", but sticking that in an m3u file doesn't work either. (Note that here the characters are encoded as an unaccented letter plus a two-byte code for the accent.)
    When I try these strings from Python, e.g. doing os.stat(), they both work; but iTunes doesn't understand either of them!
    I'd appreciate any hints on how to enter these names in the m3u file so that iTunes can read it. Thanks!

    I know nothing about "m3u" files and how iTunes interprets the file names in them, but if it is not a relative/absolute path problem, then how about just putting the raw file names (not the ones with backslash escapes) in the m3u file? For example, just put
    Voilà l'été.mp3
    in m3u?
    As for the Unicode encoding, the HFS+ file system uses the "decomposed form" for accented characters. This means, as you write, that à is hex "61 cc 80" in UTF-8, i.e. "a" + COMBINING GRAVE ACCENT. The pre-composed form is hex "c3 a0". But my experience is that in most cases both the pre-composed and decomposed forms work at the user level (not at the lowest file system level).
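    To see the two forms side by side, a quick sketch using java.text.Normalizer (Java 6+); this is illustration only, not something from the thread.
    import java.text.Normalizer;
    public class FormsDemo {
        public static void main(String[] args) {
            String composed = "Voil\u00e0 l'\u00e9t\u00e9.mp3";  // pre-composed: à is U+00E0
            // NFD splits each accented letter into base letter + combining accent,
            // which is what HFS+ stores; NFC puts them back together.
            String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);
            String recomposed = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
            System.out.println(composed.length());             // 15
            System.out.println(decomposed.length());           // 18 (three extra combining accents)
            System.out.println(recomposed.equals(composed));   // true
        }
    }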

  • Initialising strings with unicode characters

    This works
    System.out.println("Hello World");
    but this will not compile
    System.out.println("&#20320;&#22909;");
    How do I get unicode characters into my Java source?
    I am running Windows XP and editing my files using notepad.
    If I save my source as ASCII it compiles, but I do not get the foreign characters.
    If I save my file as utf-8 or unicode the source will not compile.

    I have got it!
    On Windows XP using Notepad, the Java source file can be "saved as" Unicode.
    The source can then be compiled using:
    javac HelloWorld.java -encoding unicode
    The code compiles and executes.
    It is even possible to give variables names that are Chinese characters, which is really what you would expect to be able to do.
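    An alternative that sidesteps the file-encoding question entirely is to keep the source file in plain ASCII and write the characters as \u escapes, e.g.:
    public class HelloWorld {
        public static void main(String[] args) {
            // The two escapes below encode a Chinese greeting; the source file itself stays pure ASCII.
            System.out.println("\u4f60\u597d");
        }
    }
    The compiler translates \u escapes before anything else, so this compiles with the default -encoding and still prints the Chinese characters (provided the console font can display them).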

  • Problem crawling filenames with national characters

    Hi
    I have a big problem with filenames containing national (Danish) characters.
    The documents get an entry in wk$url but have error code 404 (Not found).
    I'm running Oracle RDBMS 9.2.0.1 on Red Hat Advanced Server 2.1. The
    filesystem is mounted on the Oracle server using NFS.
    I configured Ultrasearch to crawl the specific directory containing
    several files, two of which contain national characters in their
    filenames. (ls -l)
    <..>
    -rw-rw-r-- 1 user group 13 Oct 4 13:36 crawlertest_linux_2_æøåÆØÅ.txt
    -rw-rw-r-- 1 user group 19968 Oct 4 13:36 crawlertest_windows_æøåÆØÅ.doc
    <..>
    (Since the preview function is not working in my Mozilla browser, I'm
    unable to tell whether or not the national characters will display
    properly in this post. But they represent the lower- and upper-case
    versions of the three special Danish characters.)
    In the crawler log the following entries are added:
    <..>
    file://localhost/<DIR_PATH>/crawlertest_linux_2_B|C?C%C?C?.txt
    file://localhost/<DIR_PATH>/crawlertest_linux_2_B|C?C%C?C?.txt
    Processing file://localhost/<DIR_PATH>/crawlertest_linux_2_%e6%f8%e5%c6%d8%c5.txt
    WKG-30008: file://localhost/<DIR_PATH>/crawlertest_linux_2_%e6%f8%e5%c6%d8%c5.txt: Not found
    <..>
    file://localhost/<DIR_PATH>/crawlertest_windows_B|C?C%C?C?.doc
    file://localhost/<DIR_PATH>/crawlertest_windows_B|C?C%C?C?.doc
    Processing file://localhost/<DIR_PATH>/crawlertest_windows_%e6%f8%e5%c6%d8%c5.doc
    WKG-30008:
    file://localhost/<DIR_PATH>/crawlertest_windows_%e6%f8%e5%c6%d8%c5.doc:
    Not found
    <..>
    The 'file://' entries look somewhat UTF-encoded to me (some chars are
    missing because they are not printable) and the others look URL-encoded.
    All other files in the directory seem to process just fine.
    In the wk$url table the following entries are added:
    (select status, url from wk$url where url like '%crawlertest%';)
    404 file://localhost/<DIR_PATH>/crawlertest_linux_2_%e6%f8%e5%c6%d8%c5.txt
    404 file://localhost/<DIR_PATH>/crawlertest_windows_%e6%f8%e5%c6%d8%c5.doc
    Just for testing purposes, a
    SELECT utl_url.unescape('%e6%f8%e5%c6%d8%c5') FROM dual;
    actually produces the expected result: æøåÆØÅ
    To me this indicates that the filesystem-scanning part of the
    crawler can see the files, but the processing part of the crawler can
    not open the files for reading, and it therefore fails with error 404.
    Since the crawler (to my knowledge) is written in Java, I did some
    experiments with the following Java program.
    import java.io.*;
    class filetest {
        public static void main(String args[]) throws Exception {
            try {
                String dirname = "<DIR_PATH>";
                File dir = new File(dirname);
                File[] fs = dir.listFiles();
                for (int idx = 0; idx < fs.length; idx++) {
                    if (fs[idx].canRead()) {
                        System.out.print("Can Read: ");
                    } else {
                        System.out.print("Can NOT Read: ");
                    }
                    System.out.println(fs[idx]);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    The behaviour of this program depends heavily on the language
    settings of the current shell (under Linux). If LC_ALL is set to "C"
    (which is a common default) the program can only read files with
    filenames NOT containing national characters (just like the Ultrasearch
    crawler). If LC_ALL is set to e.g. "en_US", then it is capable of
    reading all the files.
    I therefore tried to set the LC_ALL environment for the oracle user on
    my Oracle server (using locale_config and .bash_profile), but that did
    not seem to fix the problem at hand.
    So (finally) my question is: is this a bug in the Ultrasearch crawler,
    or simply a misconfiguration of my execution environment? If the
    latter, how do I configure my system correctly?
    Yours sincerely
    Martin Dahl Pedersen, Visanti ( mdp at visanti dot com )

    I posted my problems as a TAR on METALINK a little over a week ago.
    And it turns out to be a new bug in UltraSearch.
    It is now filed under BUG:2673282
    -- mdp

  • QRcode printing with unicode characters

    Hi,
    We're trying to print out QR codes in Smart Forms by referencing the link below, with output device ZBWIPP and device type ZBWIPPQR.
    http://www.rjruss.info/2010/09/how-to-printpdf-qr-codes-in-standard.html
    Meanwhile, we need to print out traditional Chinese in the same Smart Form.
    It seems that the QR code and traditional Chinese are in different character sets.
    Would there be any device type which supports both QR codes and Unicode?
    Or does the ZBWIPPQR device type require any revision?
    Please help.
    Thank you.

    Hello Nieves,
    I just became aware of your question and see it's been a while, so you may have moved on.
    However, I created the post you refer to.
    I also posted the following SCN blog post, which covers using a Japanese device type to allow Kanji together with the QR code.
    The SAP note referenced in the blog also has Chinese device types, so it could be adapted. The device types do come with some limitations, so they may not be suitable in all cases.
    If you still have the requirement, have a read of my SCN blog, and I'm happy to help further if required.
    Cheers
    Robert
    Barcodes in SAP with the Barcode Writer in Pure Postscript update

  • Filenames with non-latin characters aren't found by the filesystem [S]

    This might be a bug, but I'm hoping it's just a config file problem.
    I have a few files here and there on my NTFS drive that have Japanese characters in their filenames.  Sometime recently (I don't have an exact date when they disappeared), they stopped showing up at all.  If I browse to a folder that used to contain filenames with Japanese characters, it just appears empty in Gnome.  Using ls from a terminal also says the directory is empty.  They used to work just fine, but a recent upgrade must have broken them.
    Does anyone have any ideas what I can do to get my files to appear again?  Is there some way to enable unicode support for filenames or something?
    Many thanks!
    Edit: Rebooting the system fixed it, though I still think that was a pretty strange problem.  Any ideas what was up?
    Last edited by ColdPie (2007-11-11 02:07:11)

    The funny thing is that the bold font [when a message is unread in the message list] shows OK, i.e. in Greek, but when I click on an unread message it is assumed to have been read, so it changes over to medium [non-bold] and the encoding changes as well into one that is not Greek and thus unreadable. In ~/.sylpheed/sylpheedrc the fonts are:
    widget_font=
    message_font=-microsoft-sylfaenarm-medium-r-normal-*-*-160-*-*-p-*-iso8859-7
    normal_font=-monotype-arial-medium-r-normal-*-12-*-*-*-*-*-iso8859-7
    bold_font=-monotype-arial-bold-r-normal-*-12-*-*-*-*-*-iso8859-7
    small_font=-monotype-arial-medium-r-normal-*-12-*-*-*-*-*-iso8859-7
    In /etc/gtk, for gtk1.2 apps the file referring to the Greek encoding [el] seems to be fine [exactly the same as in Slackware 9.1].

  • [SOLVED] mpd refuses to play songs with unicode in the filenames

    Hi,
    I've been using MPD+Sonata for a while now (two great pieces of software); unfortunately I just tried to play some songs stored in files with Unicode characters (I think they're Unicode, the ones with accents, like 'è') and they refuse to play.
    I modified my mpd.conf file to set filesystem_charset to both ISO-8859-1 and UTF-8 (restarting mpd and rebuilding the DB each time), and then it simply refused to add the songs to the database. Furthermore, when I uncommented the line again, MPD still won't add them to the database.
    Any ideas what's going on?
    Thanks in advance!
    Last edited by capnfabs (2009-01-17 07:01:14)

    ah... ok! It seems that the problem is deeper than that. I have my music stored on an NTFS drive and it would appear that these files have disappeared, as far as Linux is concerned. I'll move the thread to somewhere more appropriate.
    And that was fixed by a remount. Sorry for wasting everyone's time! :-/
    Last edited by capnfabs (2009-01-17 07:00:59)
