Write Japanese characters to CSV

In my JSP I am doing the following:
<head>
      <script type="text/javascript">
      var newWindow;
      function checkForCSVExport() {
            <%
                  String results = (String) session.getAttribute("EXCEL_FB_DATA");
                  String URL = (String) Form.makeUrl(request, "/custom/jsp/search/load_csv.jsp");
                  if (null != results) {
                        out.println(
                                    "newWindow = window.open(\""
                                    + URL
                                    + "\", \"export\", \"toolbar=no,menubar=no,width=200,height=200,resizable=no\");");
                  }
            %>
      }
</script>
</head>

The String results contains comma-separated values with Japanese characters in them. We need to write these Japanese characters to a CSV file. For this we have a JSP page called load_csv.jsp. Here are its contents:
<%@ page contentType="application/csv; charset=UTF-8" %>
<%@ page import="com.components.search.SearchResultsFeedback"%>
<%
      String strFBResults = (String)session.getAttribute("EXCEL_FB_DATA");
      String strFBHdr = (String)session.getAttribute("EXCEL_FB_HEADER");
      response.setHeader("Content-Disposition", "attachment;filename=searchresult.csv");
%>
<%=strFBHdr%>
<%=strFBResults%>

This works fine for English characters, but Japanese characters are displayed as junk in the CSV. Please help.

Thanks for replying. I am using UTF-8 throughout.
Also, there is an interesting thing that I observed. If I right-click on the CSV file and select "Open with --> Word", Word asks me to "Select the encoding that makes your document readable". If I select "Unicode (UTF-8)", the Japanese characters appear perfectly fine. But why doesn't this happen in Excel? Excel is also UTF-8 capable, since I can type Japanese characters into the same Excel file.

Similar Messages

  • How to write Japanese characters (kanji) with OS X 10.5.8

    How can I write text in Japanese characters (kanji), switching from English or Dutch and back, in OS X 10.5.8?

    walter Van Geyt wrote:
    Maybe there is a way to input Hiragana, after which a choice of Kanji is offered, from which to choose the right one, since many Kanji are pronounced the same way but have a different meaning (and Hiragana and Katakana are phonetic).
    Of course, that is exactly what everyone does with Kotoeri. Perhaps you did not see Page 2 of the site?
    http://redcocoon.org/cab/j4mactyping.html
    You hit the space bar to bring up the Kanji choices.

  • Cannot write in Japanese characters on Firefox 7.0 using Mac OSX 10.5.8

    Ever since I upgraded to Firefox 7.0, I cannot input Japanese characters into Firefox. I use a Mac OSX 10.5.8. I have my international settings such that I can write in this and other languages, but none come up other than English.
    I can write in Japanese on Microsoft Word, Safari, and a slew of other applications, so it's definitely a Firefox issue.

    The following worked for me:
    http://support.mozilla.com/en-US/questions/684867
    "Problem: I had the same problem but with mac osx 10.5.8, Firefox 3.6.6 sometimes prints and sometimes does not, but the preview was always blank and I could not save to pdf file.
    Solution: I downloaded Firefox 3.5.10, installed it on my desktop and opened it, tried print preview and it worked, quit Firefox 3.5.10 and ran 3.6.6 and voila! print preview is working fine and I can save to pdf fine.
    I guess running 3.5.10 fixes the profile in a certain way. I have no idea how that happened but it worked."

  • Oracle Report Server Issue with Japanese Characters

    We are trying to set up an Oracle Report Server to print Japanese characters in PDF format.
    We have separate Oracle Report Servers for printing English, Chinese, and Vietnamese characters in PDF format using Oracle Reports in production, running on Unix AIX version 5.3. Now we have a requirement to print Japanese characters, so we set up a new server configured the same way as the Chinese/Vietnamese report servers. But we are not able to print Japanese characters.
    I am providing the details of how we configured this new server.
    1. We modified reports.sh to map the proper NLS_LANG (JAPANESE_AMERICA.UTF8) and made the other Admin folder settings.
    2. We configured the new report server via OPMN admin.
    3. We copied arialuni.ttf to the Printers folder and converted this same .ttf file to AFM format. This AFM file was copied to the $ORACLE_HOME/guicommon/gk/JP_Admin/AFM folder.
    4. We modified the uifont.ali file (JP_Admin folder) for font subsetting.
    5. We put an entry in JP_Admin/PPD/datap462.ppd as *Font ArialUnicodeMS: Standard "(Version 1.01)" Standard ROM
    6. We modified the Tk2Motif.rgb file (JP_Admin folder) for character set mapping (Tk2Motif*fontMapCs: iso8859-1=UTF8), as we have enabled this one for the other report servers as well.
    Environment Details:-
    Unix AIX version : 5300-07-05-0831
    Oracle Version : 10.1.0.4.2
    NLS_LANG : JAPANESE_AMERICA.UTF8
    Font Mapping : Font Sub Setting in uifont.ali
    Font Used for Printing : arialuni.ttf (Font Name : Arial Unicode MS)
    The error thrown in the rwEng trace file (rwEng-0.trc) is as below:
    [2011/9/7 8:11:4:488] Error 50103 (C Engine): 20:11:04 ERR REP-3000: Internal error starting Oracle Toolkit.
    The error thrown when trying to execute the reports is:
    REP-0177: Error while running in remote server
    Engine rwEng-0 crashed, job Id: 67
    Our investigations and findings:
    1. We disabled the entry Tk2Motif*fontMapCs: iso8859-1=UTF8 in Tk2Motif.rgb and then started the server. No error is thrown in the rwEng trace file, and we are able to print the report in PDF format (please see the attached japarial.pdf for your verification), but we see only junk characters. We verified the document settings in the PDF file to check the font subset; font subsetting is indeed being used.
    2. If we enable the above entry, the rwEng trace throws the Oracle Toolkit error above and the reports engine crashes.
    Any assistance in resolving this issue would be a great help.

  • Specify File Encoding(Japanese Characters) for UTL_FILE in Oracle 10g

    Hi All,
    I am creating a text file using the UTL_FILE package. The database is Oracle 10G and the charset of DB is UTF-8.
    The file is created on the DB Server machine itself which is a Windows 2003 machine with Japanese OS. Further, some tables contain Japanese characters which I need to write to the file.
    When these Japanese characters are written to the text file they occupy 3 bytes instead of 1, which distorts the fixed-width format of the file that I need to stick to.
    Can somebody suggest whether there is a way to write the Japanese characters in fewer bytes, or to change the encoding of the file to something else, e.g. Shift-JIS?
    Thanking in advance,
    Regards,
    Tushar

    Are you using the UTL_FILE.FOPEN_NCHAR function to open the files?
    Cheers, APC
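
    For context, the byte-width difference is the crux here: in UTF-8 every Japanese character takes 3 bytes, while Shift-JIS uses 1 byte for halfwidth katakana and 2 for kanji. A quick sketch (plain Java, purely for illustration) makes the difference visible:

    import java.io.UnsupportedEncodingException;

    // The same string occupies a different number of bytes per encoding,
    // which is what breaks a fixed-width file format.
    public class ByteWidth {
        public static void main(String[] args) throws UnsupportedEncodingException {
            String s = "ｱｲｳ漢字";
            // Halfwidth katakana: 3 bytes each in UTF-8, 1 byte each in Shift-JIS.
            // Kanji: 3 bytes each in UTF-8, 2 bytes each in Shift-JIS.
            System.out.println("UTF-8:     " + s.getBytes("UTF-8").length + " bytes");     // 15
            System.out.println("Shift_JIS: " + s.getBytes("Shift_JIS").length + " bytes"); // 7
        }
    }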

  • Saving a file with a file name containing Japanese Characters

    Hi,
    I hope some genius out there comes up with the solution to this problem
    Here it is :
    I am trying to save some files using a Java program from the command console, and the file names contain Japanese characters. The file names are available to me as string values from an InputStream. When I try to create a File object containing the Japanese characters, I get something like ?????.txt. I found out that I am able to save the files using the Unicode values of the Java characters.
    So I realize that the trick is to convert the incoming Japanese characters, character by character, into their respective Unicode values and then create a File object with that name. The problem is: I can't find any standard method to convert these characters into their Unicode values. Does anyone have a better solution? Remember, it's not writing Japanese characters to a file, but creating a file with Japanese characters in the file name!
    Regards
    Chandu

    Retrieve a byte array out of the InputStream and build the String using the constructor
    String(byte[] bytes, String enc)
    where the encoding would be Shift_JIS for Japanese, I guess.
    To understand this concept: all Strings are Unicode internally; however, when you pass in a byte array, String has no way to know the encoding of that byte array, so if no encoding is specified it uses the system default, which is mostly iso-8859-1. This is what leads to the ? being displayed.
    However, if you know the encoding of the array, specifying it in the constructor is a real help.
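
    A minimal sketch of that advice (the helper class and the Shift_JIS choice are my assumptions, not from the post):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Collect the raw bytes of a file name from a stream, then decode them
    // with an explicit charset instead of the platform default.
    public class NameDecoder {
        public static String decodeName(InputStream in, String enc) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                bos.write(chunk, 0, n);
            }
            return new String(bos.toByteArray(), enc);  // e.g. enc = "Shift_JIS"
        }
    }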

  • Issue with Japanese characters in files/filenames in terminal.

    I recently downloaded a zip file with Japanese characters in the archive and in the files within the archive. The name of the archive is "【批量下载】パノプティコン労働歌 第一等.zip"
    The characters are properly displayed in firefox, chrome, and other applications, but in my terminal some of the characters appear corrupted. Screenshot: https://i.imgur.com/4R22m0D.png
    Additionally, this leads to corruption of the files in the archive. When I try to extract the files, this is what happens:
    % unzip 【批量下载】パノプティコン労働歌 第一等.zip
    Archive: 【批量下载】パノプティコン労働歌 第一等.zip
    extracting: +ii/flac/Let's -+-ʦ1,000,000-.flac bad CRC 5f603d51 (should be debde980)
    extracting: +ii/flac/+ѦѾP++ -instrumental-.flac bad CRC 78b93a2d (should be 3501d555)
    extracting: +ii/flac/----.flac bad CRC ddeb1d3e (should be c05ae84f)
    extracting: +ii/flac/+ѦѾP++.flac bad CRC 0ccf2725 (should be be2b58f1)
    extracting: +ii/flac/Let's -+-ʦ1,000,000--instrumental-.flac bad CRC 67a39f8e (should be ece37917)
    extracting: +ii/flac/.flac bad CRC f90f3aa0 (should be 41756c2c)
    extracting: +ii/flac/ -instrumental-.flac bad CRC 3be03344 (should be 0b7a9cea)
    extracting: +ii/flac/---- -instrumental-.flac bad CRC 569b6194 (should be adb5d5fe)
    I'm not sure what could be the cause of this. I'm using uxterm with terminus as my main font and IPA gothic (a Japanese font) as my secondary font. I have a Japanese locale set up and have tried setting LANG=ja_JP.utf8 before, but the results never change.
    Also, this issue isn't limited to this file. It happens with nearly all archives that have Japanese characters associated with them.
    Has anyone encountered this issue before or knows what might be wrong?
    Last edited by Sanbanyo (2015-05-21 03:12:56)

    Maybe 7zip or another tool has workarounds for broken file names; you could try that.
    Or you could go over the files in the zip archive one by one and write them out as out-1, out-2, ..., out-$n without concerning yourself with the file names. You could recover the file extensions via the MIME type.
    This program might work:
    #include <stdio.h>
    #include <zip.h>

    /* Output name pattern: out-0000.bin, out-0001.bin, ... */
    static const char *template = "./out-%04d.bin";

    int main(int argc, char **argv)
    {
        int err = 0;
        zip_t *arc = zip_open(argv[1], ZIP_RDONLY, &err);
        if (arc == NULL) {
            printf("Failed to open ZIP, error %d\n", err);
            return -1;
        }
        zip_int64_t n = zip_get_num_entries(arc, 0);
        printf("%s: # of packed files: %lld\n", argv[1], (long long)n);
        for (zip_int64_t i = 0; i < n; i++) {
            zip_stat_t stat;
            zip_stat_index(arc, (zip_uint64_t)i, ZIP_FL_UNCHANGED, &stat);
            char buf[stat.size];          /* VLA holding the whole entry */
            char oname[32];               /* plenty of room for the name pattern */
            zip_file_t *f = zip_fopen_index(arc, (zip_uint64_t)i, ZIP_FL_UNCHANGED);
            zip_fread(f, buf, stat.size);
            snprintf(oname, sizeof(oname), template, (int)i);
            FILE *of = fopen(oname, "wb");
            fwrite(buf, stat.size, 1, of);
            printf("%s: %s => %llu bytes\n", argv[1], oname, (unsigned long long)stat.size);
            zip_fclose(f);
            fclose(of);
        }
        zip_close(arc);
        return 0;
    }
    Compile with
    gcc -std=gnu99 -O3 -o unzip unzip.c -lzip
    and run as
    ./unzip $funnyzipfile
    You should get template-named, numbered output files in the current directory.
    Last edited by 2ion (2015-05-21 23:09:29)

  • [Bug Report] CR4E V2: Exported PDF displays Japanese characters incorrectly

    We are planning to port a legacy application from VB to Java with Crystal Reports for Eclipse. It is required to export reports as PDF files, but the resulting PDFs display Japanese characters incorrectly in fields using some of the most common Japanese fonts (MS Gothic & Mincho).
    Here is our sample Crystal Reports project:   [download related resources here|http://sites.google.com/site/cr4eexportpdf/example-of-cr4e-export-pdf]
    1. PDFExportSample.rpt located under ..\src contains fields with different Japanese fonts.
    2. Run SampleViewerFrameClient#main(..) to open a Java Report Viewer:
        a) At zoom rate 100%, everything is ok.
        b) Change zoom rate to 200% or 50%, some fields in Japanese font collapse.
        c) Export to PDF file,
             * Fonts "MS Gothic & Mincho": both ASCII & Japanese characters failed.
             * Fonts "Meiryo & HGKyokashotai": everything works well.
             * Open PDF properties, you will see all fonts are embedded with built-in encoding.
             * Interestingly, if you copy the collapsed Japanese characters from Acrobat Reader and
               paste them into a Notepad window, Notepad shows the correct Japanese characters.
               It seems the CR4E PDF export picks the wrong typefaces for Japanese characters
               from some TTF file.
    3. Open PDFExportSample.rpt in Crystal Report 2008 Designer (trial version), and export it as PDF.
        The result PDF displays both ASCII & Japanese characters without any problem.
    Test environment as below:
    * Windows XP Professional SP3 (Japanese) with MS Office which including extra fonts (i.e. HGKyokashotai)
    * Font version: MS Gothic, Mincho, Meiryo, all in Version 5.0
        You can download MS Meiryo from Microsoft's Site:
        http://www.microsoft.com/downloads/details.aspx?familyid=F7D758D2-46FF-4C55-92F2-69AE834AC928&displaylang=en)
    * Eclipse 3.5.2
    * Crystal Reports for Eclipse, V2, 12.2.207.r916
    Can this problem be fixed? If yes, how long will it take to release a patch?
    We are really looking forward to a solution before abandoning CR4E.
    Thanks for any reply.

    I have created a [simple PDF file|http://sites.google.com/site/cr4eexportpdf/inside-the-pdf/simple.pdf?attredirects=0&d=1] exported from CR4E. It is expected to display "漢字" (in Unicode, "\u6F22\u5B57"), but it is instead rendered as the different characters "殱塸" (in Unicode, "\u6BB1\u5878").
    Looking inside this simple PDF file (you can open it with your favorite text editor), here is its page content:
    8 0 obj
    <</Filter [ /FlateDecode ] /Length 120>>
    stream ... endstream
    endobj
    Decoding this stream, we get:
    /DeviceRGB cs
    /DeviceRGB CS
    q
    1 0 0 1 0 841.7 cm
    13 -13 569.2 -815.7  re W n
    BT
    1 0 0 1 25.75 -105.6 Tm     <-- text position
    0 Tr
    /ttf0 10 Tf                 <-- apply font
    0 0 0 sc
    ( !)Tj                      <-- show glyphs [20, 21], which index is to embedded TrueType font subset
    ET
    Q
    The only embedded font subset is defined as:
    9 0 obj /ttf0 endobj
    10 0 obj /AAAAAA+MSGothic endobj
    11 0 obj
    << /BaseFont /AAAAAA+MSGothic
    /FirstChar 32
    /FontDescriptor 13 0 R
    /LastChar 33
    /Subtype /TrueType
    /ToUnicode 18 0 R                            <-- point to a CMap object
    /Type /Font
    /Widths 17 0 R >>
    endobj
    12 0 obj [ 0 -140 1000 859 ] endobj
    13 0 obj
    << /Ascent 860
    /CapHeight 1001
    /Descent -141
    /Flags 4
    /FontBBox 12 0 R
    /FontFile2 14 0 R                            <-- point to an embedded TrueType font subset
    /FontName /AAAAAA+MSGothic
    /ItalicAngle 0
    /MissingWidth 1000
    /StemV 0
    /Type /FontDescriptor >>
    endobj
    The CMap object, after decoding, is:
    18 0 obj
    /CIDInit /ProcSet findresource begin 12 dict begin begincmap /CIDSystemInfo <<
    /Registry (AAAAAB+MSGothic) /Ordering (UCS) /Supplement 0 >> def
    /CMapName /AAAAAB+MSGothic def
    1 begincodespacerange <20> <21> endcodespacerange
    2 beginbfrange
    <20> <20> <6f22>                         <-- "u6F22"
    <21> <21> <5b57>                         <-- "u5B57"
    endbfrange
    endcmap CMapName currentdict /CMap defineresource pop end end
    endobj
    I can write out the embedded TrueType font subset (= "14 0 obj") to a file named "[embedded.ttc|http://sites.google.com/site/cr4eexportpdf/inside-the-pdf/embedded.ttf?attredirects=0&d=1]", which is really a tiny TrueType font file containing only the wrong typefaces for "漢" & "字". Everything seems OK except that CR4E fails to choose the right typefaces from the TrueType file (msgothic.ttc).
    Is this of any help? I am looking forward to any solution.

  • Create HTML file that can display unicode (japanese) characters

    Hi,
    Product:           Java Web Application
    Operating system:     Windows NT/2000 server, Linux, FreeBSD
    Web Server:          IIS, Apache etc
    Application server:     Tomcat 3.2.4, JRun, WebLogic etc
    Database server:     MySQL 3.23.49, MS-SQL, Oracle etc
    Java Architecture:     JSP (presentation) + Java Bean (Business logic)
    Language:          English, Japanese, Chinese, Italian, Arabic etc
    Through our Java application we need to create HTML files that have to display Unicode text. Our present code works well with English and most European character sets. But when we tried to create HTML files that display Unicode text, say Japanese, only ???? is displayed. Following is the code we have used. The out.println to the browser displays the Japanese characters correctly, but the created file shows only ??? in place of the Japanese characters. Can anybody tell us how to do this?
    <%
    String s = request.getParameter( "txt1" );
    out.println("Orignial Text " + s);
    //for html output
    String f_str_content="";
    f_str_content = f_str_content +"<HTML><HEAD>";
    f_str_content = f_str_content +"<META content=\"text/html; charset=utf-8\" http-equiv=Content-Type></HEAD>";
    f_str_content = f_str_content +"<BODY> ";
    f_str_content = f_str_content +s;
    f_str_content = f_str_content +"</BODY></HTML>";
    f_str_content = new String(f_str_content.getBytes("8859_9"),"Shift_JIS");
    out.println("file = " + f_str_content);
              byte f_arr_c_buffer1[] = new byte[f_str_content.length()];
    f_str_content.getBytes(0,f_str_content.length(),f_arr_c_buffer1,0);
              f_arr_c_buffer1 = f_str_content.getBytes();
    FileOutputStream l_obj_fout; //file object
    //file object for html file
    File l_obj_f5 = new File("jap127.html");
    if(l_obj_f5.exists()) //for dir check
    l_obj_f5.delete();
    l_obj_f5.createNewFile();
    l_obj_fout = new FileOutputStream(l_obj_f5); //file output stream for writing
    for(int i = 0; i < f_arr_c_buffer1.length; i++) //for writing
    l_obj_fout.write(f_arr_c_buffer1[i]);
    l_obj_fout.close();
    %>
    thanx.

    Try changing the charset attribute within the META tag from 'utf-8' to 'SHIFT_JIS' or 'utf-16'. One of those two ought to do the trick for you.
    Hope that helps,
    Martin Hughes
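
    For what it's worth, the encoding round-trips through getBytes() are a more likely culprit than the META tag alone. The usual approach is to let an OutputStreamWriter do the encoding once, at the file boundary. A minimal sketch, assuming UTF-8 end to end (the class name is mine; the file name is the one from the post):

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.io.Writer;

    // Write the HTML with an explicit UTF-8 encoder instead of converting
    // the String through getBytes() with mismatched charsets.
    public class HtmlWriter {
        public static void writeHtml(String body) throws IOException {
            String html = "<HTML><HEAD>"
                + "<META content=\"text/html; charset=utf-8\" http-equiv=Content-Type></HEAD>"
                + "<BODY>" + body + "</BODY></HTML>";
            Writer out = new OutputStreamWriter(new FileOutputStream("jap127.html"), "UTF-8");
            out.write(html);
            out.close();
        }
    }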

  • How to upload file containing Japanese characters in AL11

    Hi All,
    I'm trying to modify a program that extracts a text file from local drive then upload the file to sap directory. The input text file contains Japanese characters. When I view the file created, it looks like the one below:
    #º#®#~#^#ì#q  ¼ÓÔ¼·Ï·º
    #ì#ç#ß#q  ÐÅÐÁÂÞº
    The code that I am fixing is below:
    open dataset pv_name for output in text mode encoding non-unicode
      ignoring conversion errors.
    open dataset pv_name for output in legacy text mode code page '8000'
      ignoring conversion errors.
    *OPEN DATASET pv_name FOR OUTPUT IN TEXT MODE ENCODING UTF-8 WITH BYTE-ORDER MARK.
    if sy-subrc = 0.
      LOOP AT pt_input.
        TRANSFER pt_input TO pv_name.
        IF SY-SUBRC NE 0.
          WRITE:/ 'Error writing file'(011), pv_name.
          STOP.
        ENDIF.
      ENDLOOP.
    endif.
    * Close dataset
    CLOSE DATASET pv_name.
    Any suggestions on how to resolve this one?
    Thanks a lot in advance.

    I didn't say that this would resolve your errors. But using IGNORING CONVERSION ERRORS is the same as ignoring compiler errors...
    Since you didn't say anything about the code pages you're using, no help is possible; you didn't mention whether your SAP system is Unicode, non-Unicode, or even an MDMP system.
    You need to figure out which code page the file has on the presentation server and which code page your SAP system is using. It may be that no conversion is possible, because the two systems do not have any characters in common.
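
    As a rough way to act on that advice, you can check which strict decoders accept the file without errors. A short Java sketch (Java purely for illustration; the candidate list and class name are my assumptions):

    import java.nio.ByteBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.Charset;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    // Crude code-page narrowing: try strict decoders and see which accept the
    // bytes. Not authoritative -- some legacy code pages accept almost anything.
    public class GuessEncoding {
        public static void main(String[] args) throws Exception {
            byte[] data = Files.readAllBytes(Paths.get(args[0]));
            for (String name : new String[] { "UTF-8", "Shift_JIS", "EUC-JP", "ISO-2022-JP" }) {
                try {
                    Charset.forName(name).newDecoder().decode(ByteBuffer.wrap(data));
                    System.out.println(name + ": decodes cleanly");
                } catch (CharacterCodingException e) {
                    System.out.println(name + ": fails");
                }
            }
        }
    }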

  • Japanese characters retrieved from UTF8 d/b from excel

    Hi All,
    I am generating a CSV (comma-separated) file through a query from an Oracle 9i database. One field is Japanese. Our database is UTF-8 enabled; when the CSV file is opened in Notepad/Textpad the Japanese characters show properly, but when we open the file in Excel (which is our requirement) the data does not come through properly.
    I am copying the data below directly from the Excel sheet. CLIENT_NAME_LOCAL (an NVARCHAR2 field) is the field which captures Japanese. You can see that the data for FUND_CODE=811018 comes through correctly, but for 809985 the CLIENT_NAME_LOCAL and FUND_CODE columns get concatenated with a ・ sign in the middle, so the FROM_DATE value lands in the FUND_CODE column, even though the ',' delimiter is visible between the two fields when I open the file in Notepad. Note that I also tried the CONVERT function in my query to change the CLIENT_NAME_LOCAL column to the 'JA16SJIS' character set, but nothing changed.
    N.B. I have copied and pasted the data from Excel, so in the HTML format it looks as if the FUND_CODE and FROM_DATE values are on the same vertical line, but they are not.
    ==========================================================
    TYPE CLIENT_NAME_LOCAL FUND_CODE FROM_DATE
    AN 譚ｱ驍ｦ逑ｦ譁ｯ譬ｪ蠑丈ｼ夂､ｾ 811018 01/09/2005
    AN 譚ｱ驍ｦ逑ｦ譁ｯ譬ｪ蠑丈ｼ夂､ｾ 811018 01/09/2005
    AN 譚ｱ驍ｦ逑ｦ譁ｯ譬ｪ蠑丈ｼ夂､ｾ 811018 01/09/2005
    AN 譚ｱ驍ｦ逑ｦ譁ｯ譬ｪ蠑丈ｼ夂､ｾ 811018 01/09/2005
    AN 譚ｱ驍ｦ逑ｦ譁ｯ譬ｪ蠑丈ｼ夂､ｾ 811018 01/09/2005
    AN 譚ｱ驍ｦ逑ｦ譁ｯ譬ｪ蠑丈ｼ夂､ｾ 811018 01/09/2005
    AN 譚ｱ驍ｦ逑ｦ譁ｯ譬ｪ蠑丈ｼ夂､ｾ 811018 01/09/2005
    AN 譚ｱ驍ｦ逑ｦ譁ｯ譬ｪ蠑丈ｼ夂､ｾ 811018 01/09/2005
    AN 譚ｱ驍ｦ逑ｦ譁ｯ譬ｪ蠑丈ｼ夂､ｾ 811018 01/09/2005
    AN 譬ｪ蠑丈ｼ夂､ｾ縲€蝠・飴荳我ｺ・809985 01/09/2005
    AN 譬ｪ蠑丈ｼ夂､ｾ縲€蝠・飴荳我ｺ・809985 01/09/2005
    AN 譬ｪ蠑丈ｼ夂､ｾ縲€蝠・飴荳我ｺ・809985 01/09/2005
    AN 譬ｪ蠑丈ｼ夂､ｾ縲€蝠・飴荳我ｺ・809985 01/09/2005
    AN 譬ｪ蠑丈ｼ夂､ｾ縲€蝠・飴荳我ｺ・809985 01/09/2005
    AN 譬ｪ蠑丈ｼ夂､ｾ縲€蝠・飴荳我ｺ・809985 01/09/2005
    AN 譬ｪ蠑丈ｼ夂､ｾ縲€蝠・飴荳我ｺ・809985 01/09/2005
    AN 譬ｪ蠑丈ｼ夂､ｾ縲€蝠・飴荳我ｺ・809985 01/09/2005
    AN 譬ｪ蠑丈ｼ夂､ｾ縲€蝠・飴荳我ｺ・809985 01/09/2005
    Data in Notepad
    ===========================================================
    TYPE,CLIENT_NAME_LOCAL,FUND_CODE,FROM_DATE,
    AN,東邦瓦斯株式会社,811018,01/09/2005,
    AN,東邦瓦斯株式会社,811018,01/09/2005,
    AN,東邦瓦斯株式会社,811018,01/09/2005,
    AN,東邦瓦斯株式会社,811018,01/09/2005,
    AN,東邦瓦斯株式会社,811018,01/09/2005,
    AN,東邦瓦斯株式会社,811018,01/09/2005,
    AN,東邦瓦斯株式会社,811018,01/09/2005,
    AN,東邦瓦斯株式会社,811018,01/09/2005,
    AN,東邦瓦斯株式会社,811018,01/09/2005,
    AN,株式会社　商船三井,809985,01/09/2005,
    AN,株式会社　商船三井,809985,01/09/2005,
    AN,株式会社　商船三井,809985,01/09/2005,
    AN,株式会社　商船三井,809985,01/09/2005,
    AN,株式会社　商船三井,809985,01/09/2005,
    AN,株式会社　商船三井,809985,01/09/2005,
    AN,株式会社　商船三井,809985,01/09/2005,
    AN,株式会社　商船三井,809985,01/09/2005,
    AN,株式会社　商船三井,809985,01/09/2005,
    Thanks & Regards,
    Sudipta

    You can open UTF-8 files in Excel:
    1. Change the file extension to .txt
    2. In Excel: File -> Open -> point to your file
    3. Excel opens the file-import dialog; in the "File origin" field choose "65001 : Unicode (UTF-8)"
    4. Proceed with the other settings - you've got it!
    This procedure works for sure in Excel 2003.
    Regards
    Pawel

  • Collation issue with Japanese characters in Oracle8i

    Hi,
    I have japanese data in a varchar2 column in an Oracle8i instance which contains both single byte and multibyte Japanese characters. The encoding type of the instance is UTF-8.
    I want to sort them in such a way that equivalent single-byte and multibyte Japanese characters are treated as the same. Also, when selecting, if I specify single-byte characters in the WHERE condition it should select both single-byte and double-byte characters, and vice versa.
    The functionality I'm looking for is similar to what can be achieved with the Collator class in Java, using FULL_DECOMPOSITION as the decomposition mode.
    Could anyone please let me know how can I do it ?
    Thanks in advance.
    Best Regards,
    Sourav
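
    For reference, the Java behavior being described looks roughly like the sketch below: with FULL_DECOMPOSITION, compatibility variants such as halfwidth katakana decompose to their fullwidth equivalents before comparison (the class name and sample strings are mine):

    import java.text.Collator;
    import java.util.Arrays;
    import java.util.Locale;

    // Demonstrates the Collator behavior the poster wants to reproduce in Oracle.
    public class CollatorDemo {
        public static void main(String[] args) {
            Collator c = Collator.getInstance(Locale.JAPANESE);
            c.setDecomposition(Collator.FULL_DECOMPOSITION);
            c.setStrength(Collator.SECONDARY);
            // Halfwidth ｶﾅ vs fullwidth カナ: expected 0 (equal) after decomposition.
            System.out.println(c.compare("ｶﾅ", "カナ"));
            String[] words = { "カナ", "ｱｲ", "アイ" };
            Arrays.sort(words, c);  // equivalent forms sort together
            System.out.println(Arrays.toString(words));
        }
    }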

  • While loading through External Tables, Japanese characters load incorrectly

    Hi all,
    I am loading a text file through an external table. During the load, the Japanese characters come in as junk characters, although the characters display correctly in the text file itself.
    My spool script:
    SET ECHO OFF
    SET VERIFY OFF
    SET Heading OFF
    SET LINESIZE 600
    SET NEWPAGE NONE
    SET PAGESIZE 100
    SET feed off
    set trimspool on
    spool c:\SYS_LOC_LOGIC.txt
    select CAR_MODEL_CD||',' || MAKER_CODE||',' || CAR_MODEL_NAME_CD||',' || TYPE_SPECIFY_NO||',' ||
         CATEGORY_CLASS_NO||',' || SPECIFICATION||',' || DOOR_NUMBER||',' || RECOGNITION_TYPE||',' ||
         TO_CHAR(SALES_START,'YYYY-MM-DD') ||',' || TO_CHAR(SALES_END,'YYYY-MM-DD') ||',' || LOGIC||',' || LOGIC_DESCRIPTION
    from Table where rownum < 100;
    spool off
    My external table load script:
    CREATE TABLE SYS_LOC_LOGIC
    (
         CAR_MODEL_CD          NUMBER,
         MAKER_CODE            NUMBER,
         CAR_MODEL_NAME_CD     NUMBER,
         TYPE_SPECIFY_NO       NUMBER,
         CATEGORY_CLASS_NO     NUMBER,
         SPECIFICATION         VARCHAR2(300),
         DOOR_NUMBER           NUMBER,
         RECOGNITION_TYPE      VARCHAR2(30),
         SALES_START           DATE,
         SALES_END             DATE,
         LOGIC                 NUMBER,
         LOGIC_DESCRIPTION     VARCHAR2(100)
    )
    ORGANIZATION EXTERNAL
    (
         TYPE ORACLE_LOADER
         DEFAULT DIRECTORY XMLTEST1
         ACCESS PARAMETERS
         (
              RECORDS DELIMITED BY NEWLINE
              FIELDS TERMINATED BY ','
              MISSING FIELD VALUES ARE NULL
              (
                   CAR_MODEL_CD, MAKER_CODE, CAR_MODEL_NAME_CD, TYPE_SPECIFY_NO,
                   CATEGORY_CLASS_NO, SPECIFICATION, DOOR_NUMBER, RECOGNITION_TYPE,
                   SALES_START date 'yyyy-mm-dd', SALES_END date 'yyyy-mm-dd',
                   LOGIC, LOGIC_DESCRIPTION
              )
         )
         LOCATION ('SYS_LOC_LOGIC.txt')
         --location ('products.csv')
    )
    REJECT LIMIT UNLIMITED;
    How can I solve this?
    Thanks in advance,
    Pal

    Just so I'm clear, user1 connects to the database server and runs the spool to generate a flat file from the database. User2 then uses that flat file to load that data back in to the same database? If the data isn't going anywhere, I assume there is a good reason to jump through all these unload and reload hoops rather than just moving the data from one table to another...
    What is the NLS_LANG set in the client's environment when the spool is generated? Note that the NLS_CHARACTERSET is a database setting, not a client setting.
    What character set is the text file? Are you certain that the text file is UTF-8 encoded? And not encoded using the operating system's local code page (assuming the operating system is capable of displaying Japanese text)
    There is a CHARACTERSET parameter for the external table definition, but that should default to the character set of the database.
    Justin

  • Japanese Characters Reading Errors on WinXP - Japanese language mode ...

    I have an application that reads Japanese characters encoded in Shift-JIS from a web page.
    I use the following method:
    BufferedReader dis = new BufferedReader(
         new InputStreamReader( urlConnection.getInputStream(),"SJIS"));
    On WinXP, Win98, and Win2000 - English versions - there are no problems; the characters are read and displayed CORRECTLY, and the program works correctly.
    BUT on WinXP - Japanese version - THE SAME APPLICATION gives the following ERRORS:
    Warning : Default charset MS932 not supported, using ISO-8859-1 instead.
    java.io.UnsupportedEncodingException: SJIS
    Because of these errors (I suppose), the colours of the Swing components are changed (I mean there are a lot of strange colors instead of the colors I set in the program - for example red is changed to green, yellow to blue ...).
    Can you help me ?
    Regards,
    Cata

    I have written a Java program that writes Japanese+English text data to a tab-delimited .xls file, using SJIS as the encoding scheme. I can see the Japanese data in a browser; however, the same data appears as junk characters when I view the .xls file using the Microsoft Excel 2000 application on my Windows 2000 machine.
    What am I missing here ... ?
    What am I missing here ... ?
    Thanks and Regards,
    Kumar.
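
    A note on the two errors above: "SJIS" and "MS932" are aliases that typically resolve only when the JRE's extended charset support is installed (the international JRE ships it in lib/charsets.jar); "Shift_JIS" is the canonical name. A hedged sketch of a more defensive reader (class and method names are mine):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URLConnection;
    import java.nio.charset.Charset;

    // Prefer the canonical charset name, fall back to the Windows variant,
    // and fail with a clear message if the JRE lacks both.
    public class SjisReader {
        public static BufferedReader open(URLConnection conn) throws IOException {
            String name = Charset.isSupported("Shift_JIS") ? "Shift_JIS"
                        : Charset.isSupported("MS932")     ? "MS932"
                        : null;
            if (name == null) {
                throw new IOException("No Shift-JIS support in this JRE; install the international JRE");
            }
            return new BufferedReader(new InputStreamReader(conn.getInputStream(), name));
        }
    }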

  • Japanese characters, outputstreamwriter, unicode to utf-8

    Hello,
    I have a problem with OutputStreamWriter's encoding of Japanese characters into UTF-8... if you have any ideas please let me know! This is what is going on:
    static public String convert2UTF8(String iso2022Str) {
       String utf8Str = "";
       try {          
          //convert string to byte array stream
          ByteArrayInputStream is = new     ByteArrayInputStream(iso2022Str.getBytes());
          ByteArrayOutputStream os = new ByteArrayOutputStream();
          //decode iso2022Str byte stream with iso-2022-jp
          InputStreamReader in = new InputStreamReader(is, "ISO2022JP");
          //reencode to utf-8
          OutputStreamWriter out = new OutputStreamWriter(os, "UTF-8");
          //get each character c from the input stream (will be in unicode) and write to output stream
          int c;
          while((c=in.read())!=-1) out.write(c);
          out.flush();          
         //get the utf-8 encoded output byte stream as string
         utf8Str = os.toString();
          is.close();
          os.close();
          in.close();
          out.close();
       } catch (UnsupportedEncodingException e1) {
          return e1.toString();
       } catch (IOException e2) {
          return e2.toString();
       }
       return utf8Str;
    }

    I am passing a string received from a database query to this function, and the string it returns is saved in an XML file. Opening the XML file in my browser, some Japanese characters are converted but some, particularly hiragana characters, come up as ???. For example:
    屋台骨田家は時間目離れ拠り所那覇市矢田亜希子ナタハアサカラマ楢葉さマヤア
    shows up as this:
    屋�?�骨田家�?�時間目離れ拠り所那覇市矢田亜希�?ナタ�?アサカラマ楢葉�?�マヤア
    (sorry that's absolute nonsense in Japanese but it was just an example)
    To note:
    - i am specifying the utf-8 encoding in my xml header
    - my OS, browser, etc... everything is set to support japanese characters (to the best of my knowledge)
    Also, I ran a test with a string, looking at its characters' hex values at several points and comparing them with iso-2022-jp, unicode, and utf-8 mapping tables. Basically:
    - if I don't use this function at all...write the original iso-2022-jp string to an xml file...it IS iso-2022-jp
    - I also looked at the hex values of "c" being read from the InputStreamReader here:
    while((c=in.read())!=-1) out.write(c);
    and have verified (using a character value mapping table) that in a problem string, all characters are still being properly converted from iso-2022-jp to Unicode
    - I checked another table (http://www.utf8-chartable.de/) for the unicode values received and all of them have valid mappings to a utf-8 value
    So it appears that when characters are written to the OutputStreamWriter, not all characters can be mapped from Unicode to UTF-8, even though their Unicode values are correct and there should be UTF-8 equivalents. Instead they are converted to (hex values) EF BF BD 3F EF BF BD, which from my understanding is UTF-8 for "I don't know what to do with this one".
    The characters that are not working: most hiragana (though not all) and a few kanji characters. I have yet to find a pattern/relationship between the characters that cannot be converted.
    In case I am missing something... or someone has a clue... oh, and I am developing in Eclipse but really don't have a clue about it beyond setting up a project, editing it, and hitting build/run. Is it possible that I have missed some needed configuration?
    Thank you!!

    It's worse than that, Rene; the OP is trying to create a UTF-8 encoded string from a (supposedly) iso-2022 encoded string. The whole method would be just an expensive no-op if it weren't for this line:   utf8Str = os.toString(); That converts the (apparently valid) UTF-8 encoded byte array to a string, using the system default encoding (which seems to be iso-2022-jp, BTW). Result: garbage.
    @meggomyeggo, many people make this kind of mistake when they first start dealing with encodings and charset conversions. Until you gain a good understanding of these matters, a few rules of thumb will help steer you away from frustrating dead ends.
    * Never do charset conversions within your application. Only do them when you're communicating with an external entity like a filesystem, a socket, etc. (i.e., when you create your InputStreamReaders and OutputStreamWriters).
    * Forget that the String/byte[] conversion methods (new String(byte[]), getBytes(), etc.) exist. The same advice applies to the ByteArray[Input/Output]Stream classes.
    * You don't need to know how Java strings are encoded. All you need to know is that they always use the same encoding, so phrases like "iso-2022-jp string" or "UTF-8 string" (or even "UTF-16 string") are meaningless and misleading. Streams and byte arrays have encodings, strings do not.
    You will of course run into situations where one or more of these rules don't apply. Hopefully, by then you'll understand why they don't apply.
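
    Applying those rules to the original problem, the conversion collapses to a single pass with the decode and encode both at stream boundaries. A minimal sketch (the class and method names are mine; "ISO-2022-JP" is the canonical name for the alias "ISO2022JP" used above):

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.io.OutputStreamWriter;
    import java.io.Reader;
    import java.io.Writer;

    // Decode ISO-2022-JP only where the bytes enter the program and encode
    // UTF-8 only where they leave it; no String/byte[] detours in between.
    public class Transcode {
        public static void iso2022jpToUtf8(InputStream in, OutputStream out) throws IOException {
            Reader r = new InputStreamReader(in, "ISO-2022-JP");
            Writer w = new OutputStreamWriter(out, "UTF-8");
            int c;
            while ((c = r.read()) != -1) {
                w.write(c);
            }
            w.flush();
        }
    }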
