Filenames with strange characters

(With apologies if this is a previously answered question.)
One of my Java programs copies users' home areas around our network. I'm getting FileNotFoundExceptions for some of the users' files, specifically those which use extended character sets (we have many international students who use special programs to create files with e.g. Chinese characters). I'm at a loss as to how to diagnose this problem and would appreciate any advice. The filenames are retrieved from a listFiles() call, and the copy method uses BufferedInputStream/BufferedOutputStream.
David R.

Your problem must be with the names of the files, rather than with the content. I did a few experiments and found that Java (I'm using SDK 1.3 on Windows 2000) can't open files with Cyrillic characters in the file name, either for input or for output. Throws a FileNotFoundException, as you said. There shouldn't be any problem with Chinese content if you're using input and output streams.
So, the problem is diagnosed. There are several bug reports that sound like this. Don't know if it has been fixed in SDK 1.4.
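For anyone hitting the same thing, a useful first step is to log each name returned by listFiles() as escaped code units next to the result of exists() and canRead(), so you can see exactly which characters the JVM received even when the console cannot display them. This is only a minimal sketch: the class name FileNameProbe and the helper escapeNonAscii are made up for illustration, and the directory to scan is taken from the command line.

```java
import java.io.File;

class FileNameProbe {
    /** Renders every char outside printable ASCII as a backslash-u hex escape. */
    static String escapeNonAscii(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c >= 0x20 && c < 0x7f) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        File[] entries = dir.listFiles();
        if (entries == null) {
            System.err.println("Not a readable directory: " + dir);
            return;
        }
        // The default encoding often explains why a listed name cannot be reopened.
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        for (File f : entries) {
            System.out.println(escapeNonAscii(f.getName())
                    + "  exists=" + f.exists()
                    + "  canRead=" + f.canRead());
        }
    }
}
```

A name that is listed but reports exists=false is a strong hint that the name was damaged on its way between the file system and the JVM, rather than that the file content is at fault.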

Similar Messages

  • Filenames with unicode characters

    Hello, I have a question regarding filenames with Unicode characters on an Arabic Windows XP.
    I have a string that the user entered, and I want to create a file with this string as its filename. So my question:
    Which Unicode characters are allowed in a filename? I know that on a German Windows " / \ * ? < > : | are not allowed from the ASCII set, but which Unicode chars are allowed? Is this language-dependent on Windows? Maybe there is a method that checks a string for these allowed characters.
    Thanks for your help

    AFAIK the illegal characters are always the same (you listed them already), and as long as the filesystem supports it (read: you use NTFS and not FAT) you may use any other Unicode character.
    You might have trouble displaying those characters, though, if you don't happen to have the correct fonts installed, but that would only be a cosmetic issue.
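    As far as I know the JDK has no ready-made method for this check, so it is usually written by hand. The sketch below is one way to do it in Java. The class and method names are invented for the example, and the reserved set is the one Windows documents for file names (< > : " / \ | ? *) plus control characters. It does not cover reserved device names such as CON, or trailing dots and spaces.

    ```java
    class WindowsFileNameCheck {
        // Characters Windows reserves in file names: < > : " / \ | ? *
        private static final String RESERVED = "<>:\"/\\|?*";

        /** Returns true if the name contains a reserved or control character. */
        static boolean hasReservedChar(String name) {
            for (int i = 0; i < name.length(); i++) {
                char c = name.charAt(i);
                // Control characters (below 0x20) are rejected as well.
                if (c < 0x20 || RESERVED.indexOf(c) >= 0) {
                    return true;
                }
            }
            return false;
        }

        public static void main(String[] args) {
            // Arabic letters are fine; the question mark is not.
            System.out.println(hasReservedChar("\u0645\u0644\u0641.txt"));
            System.out.println(hasReservedChar("report?.txt"));
        }
    }
    ```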

  • Report previews and prints with strange characters

    I have a custom XI application which uses CCrystalReportViewer11 (C++ wrapper around ActiveX control) to preview reports. This works for hundreds of installations, but for one the text is displayed and printed with strange characters. If the customer exports the report the characters are displayed as normal. What could be wrong?

    More than likely, it's a printer driver issue. It may even be that there is no default printer driver installed. Also, see if setting a different default printer driver helps. You can also set the "No Printer" option in the report by opening it in the designer and going to the File menu and selecting "Page Setup".
    Ludek

  • Bash script to trim all filenames with special characters recursively?

    Hi,
    I have a 30 GB directory full of data I recovered from a friend's laptop after her Windows XP crashed. I'd like to burn that data, but I can't, because many of the filenames contain weird characters (spaces, accents, things even worse that my XTerm displays as inverted question marks). So, mkisofs exits with an error.
    I'd like to clean that mess up, but it would take months to do that manually. Well, I only know a very little Bash, but I think this problem is already too heavy for my modest knowledge. Here's the problem:
    - check the contents of directory ~/backup recursively
    - find files whose filenames contain characters other than [A-Za-z0-9] and then maybe "-" and "_" and ".".
    - replace these characters either by an "_" or just erase them
    Now how would I translate that into a little Bash script?
    Cheers...

    Heyyyyy... nice idea
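    The thread asks for Bash, but the three steps in the question can be sketched in Java, the language used elsewhere on this page. This is a dry run only: it prints the renames it would make and touches nothing. The class name FileNameSanitizer and the default directory "backup" are assumptions for the example.

    ```java
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.stream.Stream;

    class FileNameSanitizer {
        /** Replaces every character outside A-Z, a-z, 0-9, '.', '_' and '-' with '_'. */
        static String sanitize(String name) {
            return name.replaceAll("[^A-Za-z0-9._-]", "_");
        }

        public static void main(String[] args) throws IOException {
            Path root = Paths.get(args.length > 0 ? args[0] : "backup");
            try (Stream<Path> walk = Files.walk(root)) {
                walk.filter(p -> !p.equals(root)).forEach(p -> {
                    String oldName = p.getFileName().toString();
                    String newName = sanitize(oldName);
                    if (!newName.equals(oldName)) {
                        // Dry run: report the rename instead of performing it.
                        System.out.println(p + " -> " + p.resolveSibling(newName));
                    }
                });
            }
        }
    }
    ```

    A real rename pass would need to process the deepest paths first (so a directory is renamed after its contents) and to handle two different names collapsing to the same sanitized name.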

  • Copying filenames with invalid characters in OSX

    I'd like to copy some folders for backup from the internal hard disk on my G4 to an external hard disk. The problem is that some of the folders contain files that were created in OS 9, and their filenames contain "/" symbols, which apparently are not acceptable in OS X (10.3.9), which I am now running. When I try to drag the folders containing the files to the external hard disk, I get an error message saying that they can't be copied because some of the filenames contain invalid characters. The problem files are scattered among many folders all over my G4's internal hard disk, and the task of finding and manually changing each one's filename is daunting. Is there a way to find and replace "/" with "."? Or some other more automated fix for the invalid characters? I'd prefer to keep the original files in their current folders on the G4. Thanks

    Does any of this software do what you want?

  • Problems with Filenames with Chinese Characters

    I seem to have problems with filenames containing 2-byte characters (Chinese, Japanese, Korean) in various apps. The problems occur when I download Chinese music and import it into iTunes:
    1. Torrent (and all other) files with 2-byte characters in their filenames show junk characters in Finder after being downloaded with Safari.
    2. Music files created/downloaded by uTorrent display 2-byte characters correctly in Finder.
    3. These downloaded music files show junk characters once imported into iTunes.
    4. Let's say I want a list of all filenames from (2). I do an ls -R > abc.txt in Terminal. abc.txt displays the filenames correctly in vi. However, they become junk characters again when viewed in TextEdit.
    I've selected Chinese, Japanese and Korean in the International settings. This is not about input methods; I just want to get the filenames correct in Finder and iTunes. Any advice is appreciated.

    3. These downloaded music files when imported into iTunes becomes junk characters.
    The ID3 tags need to be in Unicode.
    http://homepage.mac.com/thgewecke/mlingos9.html#itunes
    4. they become junk characters again when view using TextEdit.
    TextEdit needs to be set to the correct encoding, it's not automatic.

  • Issue with strange characters

    Hi All,
    A strange scenario ..
    In source system A, a text appears to contain a hyphen (-).
    When my BW QA environment pulls the data into the PSA, the same hyphen appears as '#',
    but when my Dev environment pulls the same record, the character shows correctly as a hyphen.
    What could be the possible reason for this issue? Could you guide me on where I should look?
    And the next part of the question:
    When the same record is sent via OHS, the hyphen actually shows as a hyphen in the generated file (at least when I view the file via AL11). When the file is sent via FTP, the hyphen gets converted to some strange characters (â€œ).
    Could someone please guide me ..
    Thanks,
    Debajyoti

    Hi Debajyoti,
    Check the RSKC t-code to see whether the hyphen character is permitted there, or enter "ALL_CAPITAL" to treat all characters as valid.
    Sometimes characters are rejected or converted during the load, and you can see this in the load's error message.
    The message sometimes contains a hexadecimal number (I don't know if that is your case) that you can look up at the following link.
    http://www.utf8-chartable.de/unicode-utf8-table.pl?number=512
    Some characters are allowed in the source system (i.e. R/3) but not in BW, because of the system language configured in R/3 and not in BW, or because Unicode is configured in R/3 and not in BW.
    Also, there are some control characters that SAP doesn't allow to be loaded (you can see them in the list).
    You can validate each character that comes from the PSA with a function module in the mapping (defined as a routine) for the particular InfoObject, and convert or replace it where necessary.
    Regards, Federico
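    The sequence â, U+20AC, U+201C is what the three UTF-8 bytes of an en dash (U+2013) look like when they are decoded as Windows-1252. That suggests, as an assumption the thread does not confirm, that the source text holds an en dash rather than an ASCII hyphen, and that something after the FTP transfer reads the UTF-8 file with a Windows code page. A minimal Java sketch of the effect (class and method names are illustrative only):

    ```java
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    class DashMojibake {
        /** Encodes text as UTF-8, then wrongly decodes those bytes as Windows-1252. */
        static String misread(String text) {
            byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
            return new String(utf8, Charset.forName("windows-1252"));
        }

        public static void main(String[] args) {
            String garbled = misread("\u2013"); // en dash
            // Print code points rather than glyphs so the console encoding does not matter.
            for (char c : garbled.toCharArray()) {
                System.out.printf("U+%04X ", (int) c);
            }
            System.out.println();
        }
    }
    ```

    If that is what is happening, the fix lies in agreeing on one encoding at both ends of the transfer rather than in the character itself.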

  • Problem figuring out the encoding for filenames with special characters

    I'm not sure if this is the right forum, but this does seem like an OS issue.
    I brought in a lot of mp3 and m3u files from a Windows machine to my new Mac. Some of the mp3 files have accented characters in their names, and these names appear in the m3u files. But if I add the m3u file to iTunes, it fails to recognize these names and so I lose all the mp3's with special characters in their names.
    I tried to fix this by grabbing the file names in Python, but that didn't work either!
    Here's an example: the file's name is "Voilà l'été.mp3"
    The m3u file says "Voil\xe0 l'\xe9t\xe9.mp3" -- this doesn't work.
    From os.listdir(), I get "Voila\xcc\x80 l'e\xcc\x81te\xcc\x81.mp3", but sticking it in an m3u file doesn't work either. (Note that here the characters are encoded as an unaccented letter + a two-byte code for the accent.)
    When I try these strings from python, e.g. doing os.stat(), they both work; but iTunes doesn't understand any of them!
    I'd appreciate any hints on how to enter these names in the m3u file so that iTunes can read it. Thanks!

    I know nothing about "m3u" files and how iTunes interprets the file names in them, but if it is not a relative/absolute path problem, then how about just putting the raw file names (not the ones with backslash escape) in m3u file? For example, just put
    Voilà l'été.mp3
    in m3u?
    As for Unicode encoding, the HFS+ file system uses the "decomposed form" for accented characters. This means, as you write, à is hex "61 cc 80" in UTF-8, i.e., "a + COMBINING GRAVE ACCENT". The pre-composed form is hex "c3 a0". But my experience is that in most cases both the pre-composed and decomposed forms work at the user level (not at the lowest file system level).
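    The two byte sequences quoted above can be reproduced with java.text.Normalizer. This is a small sketch, with made-up class and helper names. It uses the standard Unicode NFC and NFD forms, which may differ from what HFS+ does for some characters.

    ```java
    import java.nio.charset.StandardCharsets;
    import java.text.Normalizer;

    class NormalizationDemo {
        static String toNfd(String s) {
            return Normalizer.normalize(s, Normalizer.Form.NFD);
        }

        static String toNfc(String s) {
            return Normalizer.normalize(s, Normalizer.Form.NFC);
        }

        /** Returns the UTF-8 bytes of s as space-separated lowercase hex. */
        static String utf8Hex(String s) {
            StringBuilder sb = new StringBuilder();
            for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        }

        public static void main(String[] args) {
            String composed = "\u00e0";            // a with grave, pre-composed
            String decomposed = toNfd(composed);   // a + COMBINING GRAVE ACCENT
            System.out.println(utf8Hex(composed));    // c3 a0
            System.out.println(utf8Hex(decomposed));  // 61 cc 80
            // The two strings look identical on screen but are not equal as strings.
            System.out.println(composed.equals(decomposed));         // false
            System.out.println(toNfc(decomposed).equals(composed));  // true
        }
    }
    ```

    Normalizing both sides to the same form before comparing names is the usual way to match a name from a playlist against a name from a directory listing.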

  • Words HD with strange characters in friend's IDs

    I went into settings on Words and changed my username from the strange Polish looking id to my actual Facebook ID and the password to mine and then saved that.  Now only one of my fellow players is identified with his proper Facebook ID.  All the others are identified to me with strange unrecognizable IDs (zyngawf_33060480 e.g. and one as kyrstal1277).  My games with them remained intact, but now I can't tell who I'm playing with; I have no idea who zyngawf_3306480 is.  They all seem to know who I am.  How can I fix this?

    I deleted the game and installed it again and then logged in with Facebook and it's OK now.   I now see my friends' real names again.  And I didn't lose any of my current games.  Weird.

  • Problem crawling filenames with national characters

    Hi
    I have a big problem with filenames containing national (Danish) characters.
    The documents get an entry in wk$url but have error code 404 (Not found).
    I'm running Oracle RDBMS 9.2.0.1 on Redhat Advanced Server 2.1. The
    filesystem is mounted on the Oracle server using NFS.
    I configured Ultrasearch to crawl the specific directory containing
    several files, two of which contain national characters in their
    filenames. (ls -l)
    <..>
    -rw-rw-r-- 1 user group 13 Oct 4 13:36 crawlertest_linux_2_æøåÆØÅ.txt
    -rw-rw-r-- 1 user group 19968 Oct 4 13:36 crawlertest_windows_æøåÆØÅ.doc
    <..>
    (Since the preview function is not working in my Mozilla browser, I'm
    unable to tell whether or not the national characters will display
    properly in this post. But they represent lower and upper cases of the
    three special danish characters.)
    In the crawler log the following entries are added:
    <..>
    file://localhost/<DIR_PATH>/crawlertest_linux_2_B|C?C%C?C?.txt
    file://localhost/<DIR_PATH>/crawlertest_linux_2_B|C?C%C?C?.txt
    Processing file://localhost/<DIR_PATH>/crawlertest_linux_2_%e6%f8%e5%c6%d8%c5.txt
    WKG-30008: file://localhost/<DIR_PATH>/crawlertest_linux_2_%e6%f8%e5%c6%d8%c5.txt: Not found
    <..>
    file://localhost/<DIR_PATH>/crawlertest_windows_B|C?C%C?C?.doc
    file://localhost/<DIR_PATH>/crawlertest_windows_B|C?C%C?C?.doc
    Processing file://localhost/<DIR_PATH>/crawlertest_windows_%e6%f8%e5%c6%d8%c5.doc
    WKG-30008:
    file://localhost/<DIR_PATH>/crawlertest_windows_%e6%f8%e5%c6%d8%c5.doc:
    Not found
    <..>
    The 'file://' entries look somewhat UTF encoded to me (some chars are
    missing because they are not printable) and the others look URL
    encoded.
    All other files in the directory seem to process just fine!
    In the wk$url table the following entries are added:
    (select status, url from wk$url where url like '%crawlertest%'; )
    404 file://localhost/<DIR_PATH>/crawlertest_linux_2_%e6%f8%e5%c6%d8%c5.txt
    404 file://localhost/<DIR_PATH>/crawlertest_windows_%e6%f8%e5%c6%d8%c5.doc
    Just for testing purposes, a
    SELECT utl_url.unescape('%e6%f8%e5%c6%d8%c5') from dual;
    actually produces the expected result: æøåÆØÅ
    To me this indicates that the filesystem-scanning part of the
    crawler can see the files, but the processing part of the crawler
    cannot open the files for reading and therefore fails with error 404.
    Since the crawler (to my knowledge) is written in Java, I did some
    experiments with the following Java program.
    import java.io.*;
    class filetest {
        public static void main(String args[]) throws Exception {
            try {
                String dirname = "<DIR_PATH>";
                File dir = new File(dirname);
                File[] fs = dir.listFiles();
                // Report, for each directory entry, whether the JVM can read it.
                for (int idx = 0; idx < fs.length; idx++) {
                    if (fs[idx].canRead()) {
                        System.out.print("Can Read: ");
                    } else {
                        System.out.print("Can NOT Read: ");
                    }
                    System.out.println(fs[idx]);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    The behaviour of this program depends heavily on the language
    settings of the current shell (under Linux). If LC_ALL is set to "C"
    (which is a common default), the program can only read files whose
    filenames do NOT contain national characters (just like the Ultrasearch
    crawler). If LC_ALL is set to e.g. "en_US", then it is capable of
    reading all the files.
    I therefore tried to set LC_ALL in the environment of the oracle user on
    my Oracle server (using locale_config and .bash_profile), but that did
    not seem to fix the problem at hand.
    So (finally) my question is: is this a bug in the Ultrasearch crawler
    or simply a misconfiguration of my execution environment? If the
    latter, how do I configure my system correctly?
    Yours sincerely
    Martin Dahl Pedersen, Visanti ( mdp at visanti dot com )

    I posted my problem as a TAR on METALINK about a week ago.
    It turns out to be a new bug in UltraSearch.
    It is now filed under BUG:2673282
    -- mdp
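    For readers puzzling over the percent-escapes in the crawler log above: each %xx is a single byte, and the six bytes decode to the six Danish letters only when they are read as ISO-8859-1. The Java sketch below shows that decoding step. The class and method names are invented for the example, and it says nothing about what the crawler does internally, which the thread does not reveal.

    ```java
    import java.io.UnsupportedEncodingException;
    import java.net.URLDecoder;

    class CrawlerUrlDecode {
        /** Decodes percent-escapes, treating each escaped byte as ISO-8859-1. */
        static String decodeLatin1(String escaped) {
            try {
                return URLDecoder.decode(escaped, "ISO-8859-1");
            } catch (UnsupportedEncodingException e) {
                // ISO-8859-1 is a charset every Java platform must support.
                throw new IllegalStateException(e);
            }
        }

        public static void main(String[] args) {
            String name = decodeLatin1("%e6%f8%e5%c6%d8%c5");
            // Print code points so the result is readable on any console.
            for (char c : name.toCharArray()) {
                System.out.printf("U+%04X ", (int) c);
            }
            System.out.println();
        }
    }
    ```

    The same six bytes are not valid UTF-8, so a component that decodes them with a different charset will build a different name and will not find the file.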

  • Numbers is exporting with strange characters

    I created a huge spreadsheet in Numbers 2.3, then upgraded to Numbers 3.0. When I try to export to a CSV I get this strange character "Â" where there is supposed to be a space.  I have tried exporting different CSV versions.  The problem is not uniform through the document: it recognizes some of the spaces, but for the most part, anywhere there is a space the "Â" is substituted for it.  I can reopen the CSV in Numbers and the problem does not appear, but when I try to upload it to a website it is there, and I have also opened it in Excel and it still has the problem. I have spent countless hours building this spreadsheet, and if I cannot convert it correctly it is pretty much useless to me.  Please help!

    Manny,
    You can continue to use the previous version of Numbers which is located in the folder "/Applications/iWork '09"
    You can post a bug report or feedback to Apple using the menu item "Numbers > Provide Numbers Feedback".
    There seem to be several issues around CSV files.

  • Filenames with non-latin characters aren't found by the filesystem [S]

    This might be a bug, but I'm hoping it's just a config file problem.
    I have a few files here and there on my NTFS drive that have Japanese characters in their filenames.  Sometime recently (I don't have an exact date when they disappeared), they stopped showing up at all.  If I browse to a folder that used to contain filenames with Japanese characters, it just appears empty in Gnome.  Using ls from a terminal also says the directory is empty.  They used to work just fine, but a recent upgrade must have broken them.
    Does anyone have any ideas what I can do to get my files to appear again?  Is there some way to enable unicode support for filenames or something?
    Many thanks!
    Edit: Rebooting the system fixed it, though I still think that was a pretty strange problem.  Any ideas what was up?
    Last edited by ColdPie (2007-11-11 02:07:11)

    The funny thing is that bold font [when message unread in message list] shows OK, ie in greek, but when i click on unread message, it is assumed to have been read, so it changes over to medium [non bold] and the encoding changes as well into the one that is not greek and thus unreadable.  In ~/.sylpheed/sylpheedrc the fonts are:
    widget_font=
    message_font=-microsoft-sylfaenarm-medium-r-normal-*-*-160-*-*-p-*-iso8859-7
    normal_font=-monotype-arial-medium-r-normal-*-12-*-*-*-*-*-iso8859-7
    bold_font=-monotype-arial-bold-r-normal-*-12-*-*-*-*-*-iso8859-7
    small_font=-monotype-arial-medium-r-normal-*-12-*-*-*-*-*-iso8859-7
    In /etc/gtk, for gtk1.2 apps the file referring to greek encoding [el] seems to be fine [exactly the same as in slackware 9.1].

  • Pages document turned to gibberish-strange characters in rectangles. Can't read underlying text (still there)

    I use a Pages document (password protected) as a frequent reference. It is 22 pages of text. Suddenly most text was replaced with strange characters, each encased in a little rectangle, so document was unreadable. Selecting a section of text, I could right-click and see that underlying text was still there, just not readable onscreen. Style was "Normal*." I don't know what the " * " means. I selected all and changed style to "Heading 9" on the list. All text reappeared, so my data is ok - for now (though not formatted as I want it). I can't trust Pages, however, and don't know what caused the problem or how to avoid it.
    Note that the style now shows "Heading 9*". I don't know why the " * ". I tried selecting all and making "Normal" the global style, but the strange characters wouldn't budge. Switching back to Normal brings back gibberish.
    Some text is Verdana; I just noticed that changing font also got rid of gibberish. Changing back to Verdana brought back gibberish.
    Note that not every line of text was changed to gibberish; some were just fine.
    Other Pages documents are fine - right now. I notice that there are asterisks in Style in other text documents.
    I'm an experienced computer user, new to Pages in the fall when it came with my MacBook Air. In this document, format isn't all that important, but I can imagine where this would be a huge disaster for other types of documents.
    Any help in figuring out what's going on most appreciated.

    Tom -
    "That sounds like Apple's Last Resort font, which gets used when for some reason the app forgets about the font being used.  Use Fontbook to make sure you don't have duplicate Verdanas on your machine.  Also try cleaning your font caches with Onyx or by doing a safe boot"
    Wow! Who would have guessed! Thank you for the advice; Fontbook revealed a number of duplicate fonts - not Verdana, interestingly enough -  which I have now resolved. I then switched all text to Trebuchet MS, and everything reverted to normal. I can live with Trebuchet.
    This problem appeared on the same day (before or after I wasn't sure) I installed MS Office. Could this have caused the problem?
    Thanks for your prompt reply, explanation and solutions.

  • Strange characters in pdf

    Hello,
    I have a strange problem with "strange characters" (such as "à", "ö" etc). I have to edit PDF files produced with Adobe Acrobat by other people. In some cases, when I try to copy and paste text from such a file into a "note" (within the same file), the strange characters appear scrambled, such as "Ÿ" instead of "ü", or "Ž" instead of "é".
    I noticed the following:
    1) On the screen, the original text looks correct, but after copying and pasting it into the note tool (within Acrobat), it looks scrambled. If I paste it into another application (TextEdit or Word), it stays scrambled.
    2) This does not happen for all files, only for some of them. Note that all files come from the same source...
    3) The same string of text looks and behaves absolutely fine when I open the PDF file with another application such as Preview.
    4) In Acrobat, I can edit the scrambled notes to turn an "Š" into an "é" - but that is a time-consuming way of doing things.
    I think it has something to do with font availability, or Unicode, or so, but I can't find the right thing to do. And I really need a solution, since editing within Acrobat is part of my money-earning work ...
    Who has a useful hint ???
    Thanks in advance,
    thomas

    Guessing here, but possibly the others and you aren't set to the same language? - the font codes for the diacriticals may vary between languages?

  • Sometimes the address bar / url field is filled up with weird characters. How can I stop this?

    The address bar in Firefox sometimes fills completely with strange characters. There are no letters or numbers, only symbols / icons. Is this a bug or something that I can change? Here you can see how this looks.
    https://drive.google.com/file/d/0ByVixQ4OnwKfWGJBZTRxRGJZWVE/edit?usp=sharing

    Try disabling the RSS Icon 1.0.6 extension. A number of users have reported that extension as causing that type of display.
