Unicode characters in file name

Hi,
I am trying to open a file (using UTL_FILE) whose name contains Polish characters (e.g. 'test-ś.txt').
In return, I get this error message:
ORA-29283: invalid file operation
ORA-06512: at "SYS.UTL_FILE", line 633
ORA-29283: invalid file operation
The error is not due to missing rights on the file/directory, because when I replace the Polish character with a Latin one, the file is opened successfully.
I also tried to rename a file (using UTL_FILE.FRENAME) from Latin to Polish characters (e.g. 'test-s.txt' -> 'test-ś.txt').
The file is renamed, but the Polish characters are lost (the final result is something like 'test-Å›.txt').
What's wrong with my environment or code?
Thanks in advance for your help,
Arnaud
Here's my environment description, PL/SQL code and results.
Environment:
OS Windows in Polish for client box
* code page ACP=1250
* NLS_LANG=POLISH_POLAND.EE8MSWIN1250
OS Windows in US/English for database server
* code page ACP=1252
* NLS_LANG=AMERICAN_AMERICA.AL32UTF8
Oracle 10.2.0.5
* NLS_CHARACTERSET=AL32UTF8
* NLS_NCHAR_CHARACTERSET=AL16UTF16
Tests are executed from SQL Developer on client box.
The file I'm trying to open is located on database server.
So, Oracle Directory path used in FOPEN procedure is something like '\\server\directory'.
PL/SQL code:
SET SERVEROUTPUT ON;
declare
  Message varchar2(1000);
  Filename varchar2(1000); -- nvarchar2(1000);
  FileHandler UTL_FILE.FILE_TYPE;
  OraDir varchar2(30) := 'SGINSURANCE_DIR_SOURCE';
begin
  dbms_output.enable(10000);
  --Filename := 'test-s.txt';
  Filename := 'test-ś.txt';
  Message := 'Opening file ['||Filename||']';
  dbms_output.put_line(Message);
  --FileHandler := UTL_FILE.FOPEN_NCHAR(OraDir, Filename, 'r');
  FileHandler := UTL_FILE.FOPEN(OraDir, Filename, 'r');
  Message := 'Closing file';
  dbms_output.put_line(Message);
  UTL_FILE.FCLOSE(FileHandler);
exception
  when others then
    Message := 'Error: '||SQLERRM;
    dbms_output.put_line(Message);
    if UTL_FILE.IS_OPEN(FileHandler) then
      Message := 'Closing file ['||Filename||']';
      dbms_output.put_line(Message);
      UTL_FILE.FCLOSE(FileHandler);
    end if;
end;
Results:
Test with Polish characters -> error ORA-29283: invalid file operation
anonymous block completed
Opening file [test-ś.txt]
Error: ORA-29283: invalid file operation
ORA-06512: at "SYS.UTL_FILE", line 536
ORA-29283: invalid file operation
Test without Polish characters -> no error
anonymous block completed
Opening file [test-s.txt]
Closing file
-----------------------------------------------------------

Hello,
I tested this issue on Oracle-10-XE on Windows-XP with different Language settings.
It seems to me that UTL_FILE doesn't use the wide-character Windows API functions such as _wfopen,
but simply the old fopen, which works on 8-bit character strings.
It looks like UTL_FILE.FOPEN does not do any character conversion
on the filename, but passes it "as is" directly to the operating system.
For example, for the string "teść" with Polish characters, the following byte codes are passed:
SELECT dump( 'teść', 16 ) from dual;
DUMP('TEŚĆ',16)
Typ=96 Len=6: 74,65,c5,9b,c4,87
ś - is : c5 9b
ć - is : c4 87
In Windows, API functions based on 8-bit char * strings interpret them as being in the system code page
- look at this thread -> [ http://stackoverflow.com/questions/480849/windows-codepage-interactions-with-standard-c-c-filenames]
So if your code page is a Windows ANSI 1252, these characters are treated as:
ś -> c5 is "Å" , 9b is "›" --> Å›
ć -> c4 -> Ä, 87 -> ‡ --> Ä‡
so instead of 'teść', Windows ends up with 'teÅ›Ä‡' ;)
Here is a table of the CP 1252 codes -> [http://en.wikipedia.org/wiki/Windows-1252]
CP 1252 doesn't support Polish characters; the Windows ANSI code page normally used for Polish is CP 1250
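The byte-level mix-up described above can be reproduced outside Oracle. The following Python sketch is only an illustration, not what UTL_FILE does internally: it assumes the name reaches the operating system as raw UTF-8 bytes and is then read as CP 1252. The variable names are made up for the example.

```python
# Illustration only: assumes the file name is handed over as raw UTF-8 bytes
# (as the DUMP output suggests) and then interpreted in code page 1252.
name = "teść"

utf8_bytes = name.encode("utf-8")
print(utf8_bytes.hex(" "))  # the same six bytes that DUMP reported

# Reading those bytes as CP 1252 yields the mangled name 'teÅ›Ä‡'.
mojibake = utf8_bytes.decode("cp1252")

# CP 1250 has single-byte codes for the Polish letters, so the name
# survives a round trip through it (four characters, four bytes).
cp1250_bytes = name.encode("cp1250")
round_trip_ok = cp1250_bytes.decode("cp1250") == name
print(len(cp1250_bytes), round_trip_ok)
```

Encoding the name to CP 1252 instead would raise an error in Python, which matches the point that this code page has no slot for ś or ć.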
I've changed the system code page to 1250 on the server side, and this worked fine:
declare
fh UTL_FILE.FILE_TYPE;
strbuffer NVARCHAR2(1000);
begin
fh := UTL_FILE.FOPEN_NCHAR( 'DIR_USER_FILES', CONVERT('teść.txt', 'EE8MSWIN1250' ), 'w' );
utl_file.put_line_nchar( fh, 'chrząszcz brzmi w trzcinie');
utl_file.put_line_nchar( fh, 'teść żócał mięśńęm');
utl_file.fclose( fh );
fh := UTL_FILE.FOPEN_NCHAR( 'DIR_USER_FILES', CONVERT('teść.txt', 'EE8MSWIN1250' ), 'r' );
LOOP
    BEGIN
      utl_file.get_line_nchar( fh, strbuffer );
      dbms_output.put_line( strbuffer );
    EXCEPTION
      WHEN OTHERS THEN
        EXIT;
    END;
END LOOP;
utl_file.fclose( fh );
END;
/
I've left the user locale untouched as "English (United States)"; only *the system locale* has been changed to "Polish"
- these are two different locales, look at this thread for details [http://mihai-nita.net/2005/06/11/setting-the-user-and-system-locales/]
If you change the server's system locale, this will affect all other non-Unicode programs running on this server,
so something else may stop working properly.

Similar Messages

Sqlldr does not understand unicode characters in file names

Hello,
I am trying to call sqlldr from a .net application on Windows to bulk load some data. The parameter, control, data, log files used by sqlldr, are all located in the C:\Configuración directory (note the unicode character in the directory name).
Here is my parfile:
control='C:\Configuración\SystemResource.ctl'
direct=true
errors=0
log='C:\Configuración\SystemResource.log'
userid=scott/tiger@orasrv
When I make a call as
sqlldr -parfile='C:\Configuración\SystemResource.par'
I am getting
SQL*Loader-100: Syntax error on command-line
If I run it as
sqlldr -parfile='C:\Config~1\SystemResource.par'
I am getting
SQL*Loader-522: lfiopn failed for file (C:\Configuraci├│n\SystemResource.log)
If I remove the log= parameter from the parameter file, I am getting
SQL*Loader-500: Unable to open file (C:\Configuraci├│n\SystemResource.ctl)
SQL*Loader-553: file not found
SQL*Loader-509: System error: The system cannot find the file specified.
Can anyone suggest a way to handle unicode/extended ASCII characters in file names?
Thanks,
Alex.
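
The '├│' in the error messages above is consistent with the UTF-8 bytes of 'ó' being shown in an OEM console code page. That is an assumption about what happened here, not something the sqlldr output confirms. The Python sketch below only shows that the bytes line up, taking CP 437 as an example console code page (CP 850 maps these two bytes to the same symbols).

```python
# Assumption: the path was stored as UTF-8 and echoed through an OEM
# console code page (CP 437 is used here purely as an example).
utf8_bytes = "ó".encode("utf-8")      # two bytes: c3 b3
as_oem = utf8_bytes.decode("cp437")   # two box-drawing characters

print(utf8_bytes.hex(" "))
print(len(as_oem))
```

If that is what happened, the parameter file was probably saved as UTF-8 while the tool read it in a single-byte code page, so saving it in the code page the tool expects would be one thing to try.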

Werner, thank you for replying to my post.
In my real application, I actually store the files in %TEMP%, which on Spanish and Portuguese Windows has "special" characters (e.g. '...\Administrador\Configuración local\Temp\'). In addition, you can have a user with the "special" characters in the name which will become part of %TEMP%.
Another problem is that 8.3 name creation may be disabled on NTFS partitions.
Problem #3 is that the short file names that have "special" characters are not converted correctly by GetShortPathName windows API, e.g. "Configuración" will be converted to "Config~1", but for "C:\ración.txt" the api will return the same "C:\ración.txt", even though dir /x displays "RACIN~1.TXT". Since I am creating the parameter and control files programmatically from a .net application, I have to PInvoke GetShortPathName.
Any other ideas?
Thanks,
Alex.

Java.io.File and non-unicode characters in file name

Unix filesystem object names are byte sequences. These byte sequences are not required to correspond to any character sequence in the current or any locale. How do I open a file if its name contains bytes that do not correspond to a valid encoding of Unicode characters in the current locale? Unless I am missing something, if I do a list on a parent directory that has some file names like this, those file names do not get added to the list. Hmmm....
R.

OK, create.c is a program that will create a file whose name is not a character in the 'ja' locale.
Lister.java defines a class that lists files in the current directory. For each file, it spits out the 'toString()' version of the file, the char array of the name as hex, and the 'getBytes' byte array of the name.
So, what you can do is compile and run create.c, which will create a file whose name is a single byte whose hex value is 99. Then compile and run Lister.java, which will give you the following output (shown for two different locales):
$ export LANG=
$ java Lister
name:?; chars:99,; bytes:99,
$ export LANG=ja
$ java Lister
name:?; chars:fffd,; bytes:3f,
---------------------------------------------
Note that when running in the JA locale, there is no character corresponding to byte value 0x99. So Java uses the replacement character 0xFFFD in the char array, and the '?' character 0x3F in the byte array, as replacements.
The point is that there are files which Java cannot uniquely represent as a straight String. I suppose we could get the filename via JNI, do the conversion ourselves, and then use the private-use area of Unicode to encode all our strings, but ugh.
//create.c
#include <stdio.h>
int main()
{
   const char* name = "\x99";
   FILE* file = fopen( name, "w" );
   if( file == NULL )
   {
      printf( "could not open file %s\n", name );
      return 1;
   }
   fclose( file );
   return 0;
}
// Lister.java
import java.io.*;
public class Lister
{
    public static void main( String[] args )
    {
        new Lister().run();
    }
    public void run()
    {
        try
        {
            doRun();
        }
        catch( Exception e )
        {
            System.out.println( "Encountered exception: " + e );
        }
    }
    private void doRun() throws Exception
    {
        File cwd = new File( "." );
        String[] children = cwd.list();
        for( int i = 0; i < children.length; ++i )
        {
            printName( children[ i ] );
        }
    }
    private void printName( String s )
    {
        System.out.print( "name:" );
        System.out.print( s );
        System.out.print( "; chars:" );
        printCharsAsHex( s );
        System.out.print( "; bytes:" );
        printBytesAsHex( s );
        System.out.println();
    }
    private void printCharsAsHex( String s )
    {
        for( int i = 0; i < s.length(); ++i )
        {
            char ch = s.charAt( i );
            System.out.print( Integer.toHexString( ch ) + "," );
        }
    }
    private void printBytesAsHex( String s )
    {
        byte[] bytes = s.getBytes();
        for( int i = 0; i < bytes.length; ++i )
        {
            byte b = bytes[ i ];
            System.out.print( Integer.toHexString( unsignedExtension( b ) ) + "," );
        }
    }
    private int unsignedExtension( byte b )
    {
        return (int)b & 0xFF;
    }
}
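
For comparison, some runtimes keep such names intact by carrying the undecodable bytes through the string type. The sketch below uses Python, not Java, and its standard "surrogateescape" error handler. It is only an illustration of the idea, under the assumption that the name is decoded as UTF-8; it is not a fix for java.io.File.

```python
raw_name = b"\x99"   # the one-byte file name created by create.c

# Lossy: the undecodable byte collapses into U+FFFD and cannot be recovered.
lossy = raw_name.decode("utf-8", errors="replace")

# Lossless: surrogateescape maps byte 0x99 to the lone surrogate U+DC99,
# which encodes back to the original byte.
escaped = raw_name.decode("utf-8", errors="surrogateescape")
restored = escaped.encode("utf-8", errors="surrogateescape")

print(hex(ord(lossy)), hex(ord(escaped)), restored == raw_name)
```

On POSIX systems Python's os.fsdecode and os.fsencode apply the same handler, which is why a directory listing there can round-trip names that are not valid text in the current locale.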

BIG PROBLEM: CSS files not loading because of international characters in file name

I have Muse Beta 7 in Spanish.
The program creates a css file called master_a-página-maestra.css in css folder. It is referenced in the resulting HTML code here:

<link rel="stylesheet" type="text/css" href="css/site_global.css?3951792836"/>
<link rel="stylesheet" type="text/css" href="css/master_a-p%c3%a1gina-maestra.css?fileVersionPlaceholder"/>
<link rel="stylesheet" type="text/css" href="css/index.css?3948175564"/>
When you work locally on your Windows-formatted hard drive, everything looks fine, but when you upload your files to a server, everything is screwed up. The server doesn't recognize the URL and returns an error page, resulting in style errors across the entire site.
This can also happen with HTML files if the title of the page contains international characters, or with images if the image file name on your hard drive contains international characters.
I already pointed out a long time ago that Muse was generating file names with international characters like á, ñ, etc., but nobody cared about it. Too bad. I had to activate file name customization, but I think Muse should replace these characters automatically, in the same way that it replaces other problematic characters like commas or ampersands.
This is not a fault of the FTP client. Accented characters are not web safe regardless of the FTP client you are using. It is a server-side issue: some servers support them, some others don't. I don't mind if it works on the Adobe Catalyst server, because the final website is going to be on another server, and maybe next year, when the paid hosting ends, the client may move it to yet another server.
It makes more sense to replace accented characters in file names with their unaccented equivalents ('a' instead of 'á', 'N' instead of 'Ñ', etc.) and avoid the whole problem.
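
The replacement suggested above can be sketched in a few lines. This is a Python illustration of the idea, not how Muse works; strip_accents is a hypothetical helper name, and the approach only handles letters that Unicode can decompose into a base letter plus a combining mark.

```python
import unicodedata

def strip_accents(name: str) -> str:
    """Replace accented letters with their unaccented base letters."""
    decomposed = unicodedata.normalize("NFD", name)
    # Drop the combining marks (category Mn) left behind by decomposition.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(strip_accents("master_a-página-maestra.css").isascii())
```

Letters without such a decomposition (for example 'ł' or 'ø') pass through unchanged, so a real tool would need an extra mapping for them.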

Zak, it is funny you mention it, because the site I am talking about is hosted at 1and1. Try this: http://www.artofwalls.com/rosannawalls/biografía.html
As you can see, the offending "í" in the file name causes the 1and1 server to throw a "page not found" error. And this has happened with many other servers I have tried since.
Muse boasts of generating code fully compatible with all major web browsers, but by using international characters in file names it generates code suitable for only a few web servers. International characters have always been a no-no in internet URLs. The internet was designed by people who didn't care about character codes beyond 127, so using international characters in HTML file names is just asking for problems.
"to work with your hosting provider to determine how to enable Unicode encoded (UTF-8) file names and HTML files on their servers" is not a viable solution most of the time unless you are a Very Important Client of your hosting provider. If not, making changes means money for them, and if you are the only one who complains, they are just going to tell you not to use international characters in your names.

Mail.app hang if mail attachment have special characters in file name

Hello,
I have upgraded to 10.5.1 from last version of Tiger.
After the upgrade it is not possible to send a message with an attachment that has Czech special characters in the file name (like á, í, é, ý, ú, ů, ě, š, č, ř, ž) with Mail.app. If I try, the CPU is overloaded and I must kill the Mail.app process.
My friends who are running a clean installation of Leopard 10.5.1 have the same problem.
Thanks for correction.

I have the same problem with Hungarian characters. I tried out Tomas's suggestion and it really worked. It is so disappointing to have this kind of bug in a system like this. They never tested against this situation.
I hope somebody at Apple reads this discussion and that this will somehow be fixed in the near future.

How to Make Itunes Recognize Foreign Characters in file names?

How to Make Itunes Recognize Foreign Characters in file names?
Any Body, please
DELL Windows XP Pro

That's not how it's supposed to work according to this: http://www.griffintechnology.com/support/italkpro/
By default, a playlist will be created in iTunes called "Voice Memos" and those files will be transferred there automatically. The files themselves can be found on your computer in your iTunes Music folder in Unknown Artist > Unknown Album.
It may be worth working through any trouble shooting articles on that site.
Regards,
Colin R.

French Characters in File Names

Hello,
Although we do not use French and/or other special
characters in file names, we have inherited a project (approx. 2500
topics) with French characters in the file names. When we generate
the project and post it on an Apache server, the topics do not
display, as the server replaces the characters with symbols.
RX5 replaced the French characters with underscores during the
generation process; RX 7.0.1 keeps the characters within
the file name. In RX 7.0.1, is it possible to replace the characters
with underscores during the generation process?
Regards,

If this is the other thread, then you did get a response, several in fact. It may not have been the answer you wanted but that's another matter.
http://forums.adobe.com/message/129875#129875
See www.grainge.org for RoboHelp and Authoring tips
@petergrainge

Maximum characters in file name of links?

Is there a maximum number of characters in file names of links that InDesign can handle?
If so, what is the character count?
What about Mac OS X?
Thanks in advance.

One thing to bear in mind on the Mac is that the method used to connect to a server also influences how many characters a file name can have before being truncated. This may not be relevant if you work with files on your local drive but we used to connect to our (Windows) server via AFP and occasionally we'd receive Quark or ID docs and links (images) with long file names. Either you couldn't copy the files to the server as the name was too long, or if it had been copied to the server on another computer the doc would have broken links as it expects the link to have full names but the names of the files on the server (or how they appeared when connected by AFP) were truncated, so different. For those instances we'd connect via SMB which allows longer names.
Now we connect via SMB across the board.
Iain

Remove LF characters from file names

I have a folder full of files with filenames that contain an LF character (ASCII code 10). I want to use Automator's "Replace Text" function to remove these non-printing characters from the file names. Is there a way to do it?
If Automator is not able to do this task, I will take a bash script or AppleScript solution as well...

Take a look at: http://stackoverflow.com/questions/4417588/sed-command-to-fix-filenames-in-a-directory
(I changed tr -d "\r\n" to tr -d "\n", but try both)
for f in ~/Desktop/*
do
    new="$(printf %s "$f" | tr -d "\n")"
    if [ "$f" != "$new" ]; then
        mv "$f" "$new"
    fi
done

Language Characters in File Name

Hi All,
I need your help to solve a problem which I face in my current interface. I am currently working on an SFTP-to-File scenario where we use the Advantco SFTP Adapter. The problem is that the SFTP sender adapter doesn't process files which have language characters in the file name. No errors are thrown, and no processing of the file shows up in Message Monitoring. One option which I already tried was to use the text encoding, but that works on the content and not on the file name. I just wanted to check if anybody can suggest a solution. Also, even for content, is there a need to change the encoding for language characters, or should they be left as binary type? PI is SAP PI 7.11.
How does the standard SAP PI File Adapter handle a scenario where the file name has language characters? Is it processed correctly, or how do we handle that for correct processing? Maybe the same can be applied to the SFTP adapter.
Best Regards,
Pratik

Hi Pratik,
Under certain operating system platforms, such as Solaris, the
APIs used by the Java Runtime (JRE) are not Unicode-aware.
Consequently, the JRE needs to be configured to correctly
interpret the character set it receives from the operating
system.
This is configured through the "file.encoding" system property as
well as the "LANG" environment variable.
Make sure you set "file.encoding" to a character set (such as
ISO-8859-1) that supports the special characters you would like
to process. This system property can be configured by appending
"-Dfile.encoding=adm user: setenv LANG de.ISO8859-1. For additional details
on 'How to Work with Character Encodings in PI' the following
guide can also be followed:
https://www.sdn.sap.com/irj/servlet/prt/portal/prtroot/com.sap.km.cm.docs/library/uuid/502991a2-45d9-2910-d99f-8aba5d79fb42
I hope this helps.
Best Regards,
Gábor

Special Characters in File Names

Hello!
I'm experiencing some difficulties with the file names
RoboHelp is producing. I have imported my Frame files by reference
and am generating Web Help.
RoboHelp apparently does not like hyphens/em dashes in file
names - when I generate help, I get file names with unusual
characters in them (where the hyphen/em dash is supposed to
go).
For example:
- A Frame heading 1 is this: Agent HTTP tunneling
—proxy server
- The RoboHelp topic generated is this: agent_http_tunneling
%E2%80%94proxy_server.htm
These characters are breaking my nightly builds. Is there a
way to configure how RoboHelp deals with em dashes? If so, where
would I do that?
Thanks very much for the help!
diane
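
The "%E2%80%94" in the generated topic name is the percent-encoded form of the em dash's three UTF-8 bytes. The Python sketch below only demonstrates that correspondence with the standard urllib.parse functions; it says nothing about how RoboHelp itself builds file names.

```python
from urllib.parse import quote, unquote

em_dash = "\u2014"               # the em dash from the Frame heading
encoded = quote(em_dash)         # percent-encoded UTF-8 bytes
decoded = unquote("%E2%80%94")   # back to the single character

print(encoded, decoded == em_dash)
```

A build script that trips over these names could, as one option, replace the em dash with a plain hyphen in the headings before generation.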

Maybe this post will help.

Find and replace characters in file names

I need to transfer much of my user folder (home) to a non-mac computer. My problem is that I have become too used to the generous file name allowances on the Mac. Many of my files have characters such as "*" "!" "?" and "|". I know these are problems because they are often wild cards (except the pipe). Is there a way that I can do a find and replace for these characters?
For example, search for all files with an "*" and replace the "*" in the file name with an "@" or a letter? I don't mind having to use the terminal for this (I suspect it will be easier).
Is this possible? Does anyone have any suggestions?
Thank you in advance for any help you may be able to provide.
Mac OS X (10.4.8)
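
A scripted find-and-replace along the lines asked for above might look like the following Python sketch. The helper names safe_name and rename_tree and the choice of "@" as the default replacement are assumptions for illustration; try it on a copy of the folder first.

```python
import os

# Characters named in the question; extend the string as needed.
PROBLEM_CHARS = "*!?|"

def safe_name(name: str, replacement: str = "@") -> str:
    """Return the name with every problem character replaced."""
    return "".join(replacement if ch in PROBLEM_CHARS else ch for ch in name)

def rename_tree(root: str) -> None:
    """Rename files and folders under root that contain problem characters."""
    # Walk bottom-up so entries are renamed before their parent folders.
    for folder, dirs, files in os.walk(root, topdown=False):
        for entry in files + dirs:
            fixed = safe_name(entry)
            target = os.path.join(folder, fixed)
            # Skip the rename if it would collide with an existing entry.
            if fixed != entry and not os.path.exists(target):
                os.rename(os.path.join(folder, entry), target)

print(safe_name("draft*final?.txt"))
```

Calling rename_tree on the copied home folder would then apply safe_name to every entry. Names that would collide after replacement are left alone and need manual attention.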

Yep.
"A Better Finder Rename" is great for batch file renaming.
http://www.versiontracker.com/dyn/moreinfo/macosx/11366
Renamer4mac may be all you need.
Best check out VersionTracker. In fact everybody should have this site bookmarked and visited daily.
http://www.versiontracker.com/macosx/

Problem w/ combination of non-Western and accented characters in file names

*Description of problem*
- iTunes U .m4v files preview and play perfectly from iTunes U server, but once downloaded locally on a Mac (via Get Tracks or Subscribe), they no longer play in iTunes software, or stand-alone in QuickTime player or Pro.
- If you drag these files out of iTunes, they appear to not copy (actually, they do copy but the files are made invisible).
*What we know:*
- problem only on Macs (same behavior on OS X 10.4.x and OS X 10.5.4)
- files sync and play fine on iPod Classic 80gb
- files will not sync to iPhone 1st gen.
- same behavior on both iTunes 7 and 8
- same behavior on QuickTime 7.5.0 and 7.5.5
- not happening on any Windows machines
- appears to be caused by the file name, particularly the combination of non-Western, 2-byte characters (Japanese, Chinese) plus accented roman letters (e.g. 大/dà/big; large).
Either of these alone (not combined) in the title work fine.
- if file is renamed with no character or accented letter or given a name at least 26 characters in length, the file suddenly will play (the files can be renamed and fixed this way either in the iTunes software or at the file level itself, within Music/iTunes/Movies/, and also at the Upload/Manage interface of iTunes U).

Portuguese Accent characters in file name

Hi,
I have created a web role with a web application containing HTML files. The HTML file names contain Portuguese accented characters. When I publish from Visual Studio to Azure, the accented characters change to some different symbol. How do I make Visual Studio
publish the file names with the accented characters intact?
Thank You

hi,
Did you check the accented file names on the instances? I suggest you log in over RDP and check whether the file name is the same as on your local machine. You can find the file this way:
login on VM==>>open the local Disk==>>siteroots>>0
Also, you could add the Portuguese language to your instance's language list, and change your VM page name to match your local page name. Then try to visit the page using the
accented file name. Please try it.
Also, you could try to install the language pack for Azure (http://social.technet.microsoft.com/Forums/windowsazure/en-US/75d5c1c8-31bb-4431-97ae-a8b80098a9c6/net-framework-45-language-pack-for-azure-web-role-os-family-3-windows-server-2012?forum=windowsazuretroubleshooting ).
If you have any update, please feel free to let me know.
Regards,
Will

Replace not latin characters from file name.

Dear friends,
I tried a lot but even after reading the documentation about all the available CF functions I could not find a way to accomplish this:
As you already know from my previous post, I'm trying to finish an application which manages multiple images. I had reached a really good point and was ready to implement the solutions you suggested in that post when the client did something marvellous. He uploaded some images named "Εικόνα 123.jpg" (Image 123.jpg in English), which actually broke the Flex application that retrieves the images, because the firewall does not allow high-bit characters to go through. I now need to add a function that will evaluate every character in the file name individually and remove or replace (I do not care which) all the characters that are not Latin letters or numbers (spaces, Greek characters, special characters, etc.). As I've already said, no known function is able to do that (as far as I know, of course), and I guess that the solution could be hiding in regular expressions, which are not my strong point. So I need your help here.
Thank you in advance,
John

Thanks Ian,
I cannot figure out how to use the asc() function for this. I will have to run a test on every single character and replace the invalid ones, but I will never know the actual string length (how many characters each image name will have) and I will possibly end up destroying the extensions (.jpg) as well. To make it a little more complicated, let me tell you that I will have to run this twice: once for the full image path (i.e. d:\company\aptown\images\Εικόνα 123.jpg), in which I will have to change only the ...Εικόνα 123... part and nothing else, and once for a comma-separated list of the image names (i.e. Εικόνα 123.jpg, Εικόνα 124.jpg, Εικόνα 125.jpg, Εικόνα 126.jpg). And don't be misled by the pattern. The customer may upload an image named in a completely different way, still using invalid characters, for example "Αντίγραφο της Εικόνας 123.jpg" (Copy of the image 123.jpg in English).
It seems to be impossible,
I hope it is not.
Yannis
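
The two cases described above (a full path and a comma-separated list) can be handled with one regular expression applied to the base name only. The sketch below is in Python rather than ColdFusion, purely to illustrate the approach; the function names and the "image" fallback for names that end up empty are hypothetical. The character class [^A-Za-z0-9] should carry over to ColdFusion's regular-expression functions, but that is an assumption to verify.

```python
import ntpath
import re

def clean_image_name(filename: str) -> str:
    """Keep only ASCII letters and digits in the base name; keep the extension."""
    stem, ext = ntpath.splitext(filename)
    cleaned = re.sub(r"[^A-Za-z0-9]", "", stem)
    # Hypothetical fallback so a fully non-Latin name does not become empty.
    return (cleaned or "image") + ext

def clean_path(path: str) -> str:
    """Clean only the file name part of a Windows-style path."""
    folder, filename = ntpath.split(path)
    return ntpath.join(folder, clean_image_name(filename))

def clean_list(names: str) -> str:
    """Clean every entry of a comma-separated list of file names."""
    return ",".join(clean_image_name(n.strip()) for n in names.split(","))

print(clean_path("d:\\company\\aptown\\images\\Εικόνα 123.jpg"))
```

Note that "Εικόνα 123.jpg" and "Αντίγραφο της Εικόνας 123.jpg" both reduce to the same name under this scheme, so a real implementation would also need to detect collisions, for example by appending a counter.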
