How to Determine Text File Encoding is UNICODE

Hi Gurus,
How can I determine whether a file is in a Unicode format or not?
The file is stored in a BLOB column in a table.
Thanks,
Sombit

That's a rather hard problem. You would, realistically, either have to make a bunch of simplifying assumptions based on the data or you would want to buy a commercial tool that does character set detection.
There are a number of different ways to encode Unicode (UTF-8, UTF-16, UTF-32, UCS-2, etc.) and a number of different versions of the Unicode standard. UTF-8 is one of the more common ways to encode Unicode, and it is popular precisely because the first 128 characters (code points 0 through 127, which cover the majority of what you'd find in English text) are encoded identically to 7-bit ASCII. Depending on the size and contents of the document, it may not be possible to determine whether the data is encoded in 7-bit ASCII, UTF-8, or one of the various single-byte character sets that are built on top of 7-bit ASCII (ISO 8859-15, Windows-1252, ISO 8859-1, etc.).
Depending on how many different character sets you are trying to distinguish between, you'd have to look for binary values that are valid in one character set and not in another.
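The checks described above can be sketched in Java. This is a heuristic under stated assumptions, not a commercial-grade detector: the BLOB has already been read into a byte array (for example with java.sql.Blob.getBytes(1, (int) blob.length())), a byte order mark is trusted when present, NUL bytes are taken as a hint of UTF-16/UTF-32 without a BOM, and data that decodes cleanly as strict UTF-8 is labelled "probably UTF-8". As noted above, it cannot tell 7-bit ASCII apart from UTF-8 or ISO 8859-x. EncodingSniffer and guess are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingSniffer {
    // Best-effort label for a byte array; a heuristic, not a definitive detector.
    static String guess(byte[] data) {
        // 1. Byte order marks. UTF-32LE (FF FE 00 00) must be tested before
        //    UTF-16LE (FF FE) because it starts with the same two bytes.
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE (BOM)";
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE (BOM)";
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8 (BOM)";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE (BOM)";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE (BOM)";

        // 2. NUL bytes are rare in 8-bit text but common in UTF-16/UTF-32.
        boolean allAscii = true;
        for (byte b : data) {
            if (b == 0) return "possibly UTF-16 or UTF-32 without BOM";
            if ((b & 0x80) != 0) allAscii = false;
        }

        // 3. Only 7-bit bytes: ASCII, UTF-8 and ISO 8859-x read the same.
        if (allAscii) return "7-bit ASCII (also valid UTF-8 and ISO 8859-x)";

        // 4. High bytes present: try a strict UTF-8 decode.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return "probably UTF-8";
        } catch (CharacterCodingException e) {
            return "not UTF-8; possibly a single-byte set such as Windows-1252";
        }
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = "caf\u00E9".getBytes(StandardCharsets.UTF_8);
        System.out.println(guess(sample)); // probably UTF-8
    }
}
```

The order of the checks is the main design choice: cheap, unambiguous signals (BOMs) first, then the weaker statistical hints.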
Justin

Similar Messages

Convert text file encoding to a particular format (Unicode)

Hi Expert,
I have a requirement to transfer a text file in a particular encoding to the application server. By default, the SAP system generates the file in ANSI. Is it possible to convert it to a Unicode format such as UTF-8? If so, how can I generate the text file in Unicode?
Thanks,
Regards

Check
Note 752835 - Usage of the file interfaces in Unicode systems
Markus

OPEN DATASET file FOR OUTPUT IN TEXT MODE ENCODING NON-UNICODE

Hi There,
I have a similar issue. I am able to write data containing Chinese characters to the application server using OPEN DATASET datei FOR OUTPUT IN TEXT MODE ENCODING DEFAULT or OPEN DATASET datei FOR OUTPUT IN TEXT MODE ENCODING UTF-8. But when I manually save that file to my presentation server, all the Chinese characters show up as junk.
When I use OPEN DATASET datei FOR OUTPUT IN TEXT MODE ENCODING NON-UNICODE, I get a runtime error, and when I use OPEN DATASET datei FOR OUTPUT IN TEXT MODE ENCODING NON-UNICODE IGNORING CONVERSION ERRORS, there is no error, but the output on the application server itself shows junk characters.
Could you please tell me what you did?
Regards,
Chaitanya A

Hi,
Use this
OPEN DATASET File_path FOR OUTPUT IN TEXT MODE ENCODING NON-UNICODE
WITH SMART LINEFEED.
it will definitely work.
Regards,
Manesh. R
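The behaviour described in the question (a runtime error with NON-UNICODE, junk output when conversion errors are ignored) looks like the general trade-off whenever text contains characters that a single-byte code page cannot represent, such as Chinese ideographs in a Western code page: the converter must either report an error or substitute a replacement. The following is a Java analogy, not ABAP; ISO-8859-1 stands in for whatever non-Unicode code page the system assigns, and ConversionPolicies, encodeStrict and encodeLenient are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class ConversionPolicies {
    // Strict: throw on any character the target code page cannot represent
    // (comparable to the runtime error without IGNORING CONVERSION ERRORS).
    static byte[] encodeStrict(String text, Charset cs) throws CharacterCodingException {
        CharsetEncoder enc = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = enc.encode(CharBuffer.wrap(text));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    // Lenient: String.getBytes(Charset) silently substitutes the charset's
    // replacement bytes ('?' for ISO-8859-1), i.e. the "junk" case.
    static byte[] encodeLenient(String text, Charset cs) {
        return text.getBytes(cs);
    }

    public static void main(String[] args) {
        String chinese = "\u4E2D\u6587"; // two Chinese characters
        try {
            encodeStrict(chinese, StandardCharsets.ISO_8859_1);
            System.out.println("encoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("strict conversion failed: " + e);
        }
        byte[] lenient = encodeLenient(chinese, StandardCharsets.ISO_8859_1);
        System.out.println(new String(lenient, StandardCharsets.ISO_8859_1)); // ??
        byte[] utf8 = encodeLenient(chinese, StandardCharsets.UTF_8);
        System.out.println(utf8.length + " bytes in UTF-8"); // 6 bytes in UTF-8
    }
}
```

Neither policy recovers the original characters; only a target encoding that can represent them (such as UTF-8, or a Chinese code page) does.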

Difference between IN LEGACY TEXT MODE & TEXT MODE ENCODING NON-UNICODE

Hi,
We're upgrading to ECC5, and the OPEN DATASET command needs amending if the program is flagged as Unicode-enabled (which usually occurs in user exits and function module exits). Therefore, in ECC5 this command is no longer valid:
"open dataset DSN in text mode"
We currently interface with systems that may not have Unicode enabled, and we have not yet enabled Unicode in our own system either.
So we think these two commands are the most appropriate replacements for the 'old' open dataset command:
"open dataset DSN for input in TEXT MODE encoding NON-UNICODE"
"open dataset DSN in LEGACY TEXT MODE for input"
However, we're not really sure what the difference between these two commands is.
Has anyone worked with these commands?
Could you offer some help as to their differences and when each should be used?
Many thanks!

Hi Robert,
Here is an excerpt from the SAP documentation.
... TEXT MODE ENCODING {DEFAULT|UTF-8|NON-UNICODE}
Effect:
The addition IN TEXT MODE opens the file as a text file. The addition ENCODING defines how the characters are represented in the text file. When writing in a text file, the content of a data object is converted to the representation entered after ENCODING, and transferred to the file. If the data type is character-type and flat, trailing blanks are cut off. In the data type string, trailing blanks are not cut off. The end-of-line marking of the relevant platform is applied to the transferred data by default. When reading from a text file, the content of the file is read until the next end-of-line marking, converted from the format specified after ENCODING into the current character format, and transferred to a data object.
The end-of-line marking depends on the operating system of the application server. In MS Windows operating systems, the markings "CRLF" and "LF" are possible, while under Unix only "LF" is used. If, under Windows, an existing file is opened without the TYPE addition (see os_addition), the first end-of-line marking found is used for the whole file. If a new file is created without the TYPE addition, the content of the profile parameter abap/NTfmode is used; if the profile parameter is not set, "CRLF" is used. If a file is opened with the TYPE addition and a valid value is contained in attr, this value is used.
In Unicode programs, only the content of character-type data objects can be transferred to text files and read from text files. The addition ENCODING must be specified in Unicode programs, and can only be omitted in non-Unicode programs.
The additions after ENCODING determine in which character representation the content of the file is handled.
DEFAULT
The designation DEFAULT corresponds to UTF-8 in a Unicode system and to NON-UNICODE in a non-Unicode system.
UTF-8
The characters in the file are handled according to the Unicode character representation UTF-8.
NON-UNICODE
In a non-Unicode system, the data is read or written without being converted. In a Unicode system, the characters in the file are handled according to the non-Unicode code page that would be assigned to the current text environment according to the database table TCP0C at the time of reading or writing in a non-Unicode system.
If the addition ENCODING is not specified in non-Unicode programs, the addition NON-UNICODE is used implicitly.
... LEGACY TEXT MODE [{BIG|LITTLE} ENDIAN] [CODE PAGE cp]
Effect:
Opening a legacy text file. The addition IN LEGACY TEXT MODE opens the file as a legacy text file. As with legacy binary files, the byte order and the code page with which the content of the file is handled can also be specified. The syntax and meaning of {BIG|LITTLE} ENDIAN and CODE PAGE cp are the same as for legacy binary files.
In contrast to legacy binary files, trailing blanks are cut off when character-type flat data objects are written to a legacy text file. As with a text file, an end-of-line marking is also applied to the transferred data. In contrast to text files opened with the addition IN TEXT MODE, Unicode programs do not check whether the data objects used for reading or writing are character-type. Furthermore, the LENGTH additions of the statements READ DATASET and TRANSFER count in bytes for legacy text files, and in characters (the units in which a character is represented in memory) for text files.
Note:
As with legacy binary files, text files that have been written in a non-Unicode system can be accessed in Unicode systems as legacy text files, and the content is converted accordingly.
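The byte-versus-character distinction in the LENGTH additions only matters once non-ASCII characters appear, because the byte count then depends on the encoding. A small Java illustration of that point (not ABAP; LengthUnits, charLength and byteLength are hypothetical names):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LengthUnits {
    // Length in characters, as String.length() counts them (UTF-16 code units).
    static int charLength(String s) {
        return s.length();
    }

    // Length in bytes once the text is encoded with the given charset.
    static int byteLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String word = "Gr\u00FC\u00DFe"; // "Grüße"
        System.out.println(charLength(word));                              // 5
        System.out.println(byteLength(word, StandardCharsets.UTF_8));      // 7
        System.out.println(byteLength(word, StandardCharsets.ISO_8859_1)); // 5
    }
}
```

For pure ASCII text all three numbers coincide, which is why a byte-counting LENGTH can appear to work until the first umlaut arrives.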
Example
A file test.dat is created as a text file, filled with data, changed, and read back. Because every TRANSFER statement applies an end-of-line marking to the written content, the file has two lines after the change. The first line contains "12ABCD". The second line contains "890". The character "7" has been overwritten by the end-of-line marking of the first line.
DATA: file   TYPE string VALUE `test.dat`,
      result TYPE string.
OPEN DATASET file FOR OUTPUT IN TEXT MODE ENCODING DEFAULT.
TRANSFER `1234567890` TO file.
CLOSE DATASET file.
OPEN DATASET file FOR UPDATE IN TEXT MODE ENCODING DEFAULT
                             AT POSITION 2.
TRANSFER `ABCD` TO file.
CLOSE DATASET file.
OPEN DATASET file FOR INPUT IN TEXT MODE ENCODING DEFAULT.
DO.
  READ DATASET file INTO result.
  IF sy-subrc <> 0.
    EXIT.  " stop at end of file so no extra line is written
  ENDIF.
  WRITE / result.
ENDDO.
CLOSE DATASET file.
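The effect in this example can be reproduced outside ABAP as a sanity check. A Java sketch, assuming Unix-style LF line endings (on Windows the marking could be CRLF, which shifts the result) and treating AT POSITION 2 as the 0-based byte offset 2, which is what the example's result implies; OverwriteDemo and run are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class OverwriteDemo {
    // Writes "1234567890" plus LF, overwrites from byte offset 2 with "ABCD" plus LF,
    // then returns the lines of the resulting file.
    static List<String> run(Path file) throws IOException {
        Files.write(file, "1234567890\n".getBytes(StandardCharsets.US_ASCII));
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.seek(2); // like AT POSITION 2
            raf.write("ABCD\n".getBytes(StandardCharsets.US_ASCII)); // the LF replaces '7'
        }
        return Files.readAllLines(file, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("test", ".dat");
        try {
            for (String line : run(tmp)) {
                System.out.println(line); // 12ABCD, then 890
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```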
Regards,
Ravi

How to read text file line by line...?

How can I read a text file line by line when the line separator is defined by the user, and return a list of strings in which each line of the file is one item?
please help me.
Thanks very much

Brynjar wrote:
In Groovy, you would do something like:
linefeed = "\n" //or "\r\n" if the user chose so
lines = new File('pathtofile').text.split("${linefeed}")
This is one of the things that has always annoyed me about Sun's SDK, i.e. the lack of easy ways to do things like that. You always end up making your own utilities or using something like Apache's commons.io. The same goes for JDBC and XML; I'll wait for appropriate topics to show how easy that is in Groovy :)

I generally agree, but what I really don't like about the Groovy text-file handling niceties is that they don't care about encoding and always use the default encoding. As soon as you want to specify the encoding, it gets a lot more complex (granted, it's still easier than in Java).
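The original question can also be answered in plain Java while addressing the encoding concern: read the bytes with an explicit charset, then split on the user's separator. A minimal sketch, assuming the file fits in memory and its encoding is known (UTF-8 in the usage below is an assumption). Pattern.quote makes the separator literal; String.split, which Groovy's split also uses, takes a regular expression, so a separator such as "|" would otherwise split between every character. CustomLines, splitLines and readLines are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class CustomLines {
    // Splits text on a user-chosen separator, treated literally (not as a regex).
    // Trailing empty strings are dropped, so a final separator adds no empty line.
    static List<String> splitLines(String text, String separator) {
        if (text.isEmpty()) return new ArrayList<>();
        return new ArrayList<>(Arrays.asList(text.split(Pattern.quote(separator))));
    }

    // Reads the whole file with an explicit encoding instead of the platform default.
    static List<String> readLines(Path file, Charset encoding, String separator) throws IOException {
        String text = new String(Files.readAllBytes(file), encoding);
        return splitLines(text, separator);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println(splitLines("one|two|three", "|")); // [one, two, three]
            return;
        }
        // args[0] = path to the file, args[1] = separator chosen by the user
        System.out.println(readLines(Paths.get(args[0]), StandardCharsets.UTF_8, args[1]));
    }
}
```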

What determines the file encoding for ${C:file.txt} = 'abc' ?

What determines the file encoding for
${C:file.txt} = 'abc'
I'm always getting ASCII as the encoding for file.txt after executing that assignment.

Thanks so much. I'll keep looking for the MSFT doc on this. I scanned Bruce Payette's book and did not find anything there.
It turns out to be one of those "by rote" things you have to learn about PowerShell.
My concern about the lack of documentation is that MSFT might change the underlying code in the future to use Unicode, and that might break some existing code. If there were MSFT-provided documentation declaring ASCII as the intended encoding, they would presumably give plenty of warning before switching encodings.
I also note that if you try to write characters outside the ASCII set (see the example below), character substitution picks an ASCII character to use in place of the one outside the ASCII set. In the example below, a 'v' is substituted for the '√' character:
${C:xo.txt} = '√'

Text File Encoding used by TextEdit/OS X

Hi all folks,
Does anyone know which code page is used for the text file encoding "Western (EBCDIC US)",
available from "Customize Encodings List" in TextEdit's "Plain Text File Encoding" preferences?
The text file encoding "Western (EBCDIC Latin 1)" works well, but "EBCDIC US" does not;
its character set is very limited.
Thanks for any help,
Lutz

Yeah unfortunately they're all listed as 0kb files. I guess that means the faulty hard drive didn't transfer them properly, even though the Mac did the copy confirmation sound.
Hundreds of folio files... all gone. ;___;

How to determine the file system on Solaris

Friends,
How can I determine which file system, UFS or ZFS, is installed on Solaris?
Thanks

Other methods would include looking at the /etc/vfstab if it's in there or fstyp(1M):
System Administration Commands fstyp(1M)
NAME
fstyp - determine file system type
SYNOPSIS
fstyp [-a | -v] special [:logical-drive]

How to write text file in Shockwave?

Does anybody know how to write a text file to the user's disk from Shockwave?
Thanks in advance.

Those Xtras can wreak too much havoc when used with the wrong intent.
What you can do is write with setpref and store a list of saves and the saves themselves separately. Then you'd have to build your own save/open dialog to let the user:
* pick a previously saved file to load or overwrite
* type the name of a new file to save.
The only thing that remains is that the user cannot decide where the files are saved.
Manno
SiuLinda wrote:
> Thanks a lot for your reply.
> Yes, cookies are good, but I have to write a program that saves the text file where the user wants, so the user can open these files later if they like, as with Filextra and Fileio, but I found that none of these Xtras seem to be supported in Shockwave.
>
Manno Bult

How to load text file data to Oracle Database table?

By using Oracle Forms, how to load text file data to Oracle Database table?

Metalink note 33247.1 explains how to use text_io as suggested by Robin to read the file into a Multi-Row block. However, that article was written for forms 4.5 and uses CREATE_RECORD in a loop. There was another article, 91513.1 describing the more elegant method of 'querying' the file into the block by transactional triggers. Unfortunately this more recent article has disappeared without trace and Oracle deny its existence. I know it existed as I have a printed copy in front of me, and very useful it is too.

How to load text files in GUI

Please tell me how to load and compare two text files using file pop-ups. I have attached an example file.
Attachments:
testW_FF.txt (2 KB)

I don't understand whether your question is about how to load text files, how to show them on a panel, or how to compare them... or all of these together!
The first operation (loading the file) can be accomplished with functions included in the Formatting and I/O Library like OpenFile, ReadFile and so on; with a file like yours even FileToArray could be an option.
How to show the data on screen is heavily dependent on what you intend to do with them: data can be shown in textboxes, listboxes, tables or graphs so... what do you want to do?
The same applies to comparison: without additional details it is difficult to give you the proper hint.
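Once both files are loaded, the comparison itself can be as simple as finding the first line that differs. A Java sketch of that idea, offered only as an illustration since the thread concerns LabWindows/CVI; it assumes UTF-8 files small enough to read fully into memory, and TextCompare and firstDifference are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

class TextCompare {
    // Returns the 0-based index of the first differing line, or -1 if identical.
    // If one file is a prefix of the other, the index is the shorter file's length.
    static int firstDifference(List<String> a, List<String> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            if (!a.get(i).equals(b.get(i))) return i;
        }
        return a.size() == b.size() ? -1 : n;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java TextCompare <file1> <file2>");
            return;
        }
        // Assumes both files are UTF-8; change the charset if they are not.
        List<String> a = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> b = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
        int diff = firstDifference(a, b);
        System.out.println(diff < 0 ? "files are identical" : "first difference at line " + (diff + 1));
    }
}
```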

Step by step: how a JSP reads a text file

Hi ,
Does anyone know, or have a good site that shows, step by step how a JSP reads a text file?
Thank you.

There is no difference between reading a text file from JSP and reading a text file from Java.
Just follow the same steps for JSP also.
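A minimal Java sketch of those steps: open a reader with an explicit charset, read until readLine() returns null, and let try-with-resources close the reader. UTF-8 and the placeholder path test.txt are assumptions. In a JSP this logic is usually better placed in a servlet or helper class than in a scriptlet, and a path inside the web application is often obtained from ServletContext.getRealPath, which can return null when the application runs from an unexpanded WAR. ReadTextFile and readAll are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class ReadTextFile {
    // Reads a text file line by line using the given encoding.
    static List<String> readAll(Path file, Charset cs) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, cs)) {
            String line;
            while ((line = reader.readLine()) != null) { // null means end of file
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "test.txt"); // placeholder path
        for (String line : readAll(file, StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
    }
}
```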

How to call text file using Script in Data Integrator

Dear All,
Can anyone assist me with how to call a text file using a script in Data Integrator?
And one more question:
I have 32 CSV files and want to combine them into one table with Data Integrator. Can anyone assist me?

mary,
since you knew the file name ,when clicked in name send to server,read the file and write to servlet outputstream.
I think this would help you.
If anything wrong in mycode ..forums will help you further
// requires: import java.io.*; (File, FileInputStream, BufferedInputStream, BufferedOutputStream, IOException)
// "response" is the HttpServletResponse passed to doGet/doPost
File f = new File("test.txt");
// try-with-resources closes both streams, even if an exception occurs
try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(f));
     BufferedOutputStream bos = new BufferedOutputStream(response.getOutputStream())) {
     byte[] buff = new byte[1024];
     int bytesRead;
     // copy the file to the servlet output stream in 1 KB chunks
     while ((bytesRead = bis.read(buff, 0, buff.length)) != -1) {
          bos.write(buff, 0, bytesRead);
     }
} catch (IOException e) {
     // error handling
}

How to set the file.encoding in jvm?

I have a problem displaying Chinese characters from a servlet.
Someone told me that I can change file.encoding in the JVM to zh_CN. How can I do that?

Add the Java argument to your servlet engine's JVM startup options,
e.g.
java -Dfile.encoding=ISO8859-1
garycafe
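Two points may help here, stated as facts about Java rather than about any particular servlet engine: zh_CN is a locale name, not a charset name (a Chinese-capable encoding would be, for example, GBK or UTF-8), and ISO8859-1 in the example above only illustrates the syntax, because ISO-8859-1 contains no Chinese characters. Since JDK 18 the default charset is UTF-8 on all platforms (JEP 400). Rather than relying on the JVM-wide default, it is generally safer to pass a charset explicitly wherever bytes and text are converted. A small sketch (DefaultEncoding and roundTrip are hypothetical names):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultEncoding {
    // Encodes with an explicit charset, then decodes with the same charset.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // What the JVM uses when no charset is given explicitly
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        String chinese = "\u4E2D\u6587"; // two Chinese characters
        // UTF-8 can represent the text, so it survives the round trip
        System.out.println(roundTrip(chinese, StandardCharsets.UTF_8).equals(chinese));      // true
        // ISO-8859-1 cannot, so each character is replaced by '?'
        System.out.println(roundTrip(chinese, StandardCharsets.ISO_8859_1).equals(chinese)); // false
    }
}
```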

How to read text file content in portal application?

Hi,
How do we read text file content in portal application?
Can anyone forward the code to do so?
Regards,
Anagha

Check the code below. It shows how to read the text file content line by line; you can then display it as you require.
IUser user = WPUMFactory.getServiceUserFactory().getServiceUser("cmadmin_service");
IResourceContext resourceContext = new ResourceContext(user);
String filePath = "/documents/....";
RID rid = RID.getRID(filePath);
IResource resource = ResourceFactory.getInstance().getResource(rid,resourceContext);
InputStream inputStream = resource.getContent().getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
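StringBuffer content = new StringBuffer();
String line;
// read every line, including the first, until readLine() returns null at end of stream
while ((line = reader.readLine()) != null) {
    // append each line to get the whole file content as one string
    content.append(line).append('\n');
}
reader.close();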
Regards,
Yoga
