(Internationalization) - Unicode and Other ... Encoding Schemes

Hello,
I am developing an application that requires multiple-language support
(Chinese/Japanese/English, French/German).
I plan to use UTF-8 encoding, and not an individual encoding for each language
like Shift_JIS for Japanese, Big5 for Chinese, etc.
This is mostly because I need to display multiple languages on the
same page, and allow users to enter data in any language they choose.
1. So, is the assumption that nothing but UTF-8 can be used here correct?
2. If this is the case, why do people choose Shift_JIS for Japanese or Big5
for Chinese at all? After the advent of Unicode, why can't they just use
UTF-8?
3. I am using WebLogic 6, and my app is composed of JSPs alone at the
moment. It is working fine with UTF-8 encoding, without my setting anything
at all in properties files or anywhere else. I am getting data entered by users
in forms (in Chinese, Japanese, etc.) fine, and I am able to insert it into the
database and get it back too, without any problems.
So why do people talk about parameters that must be set in properties
files to tell the app which encoding is being used?
4. My resource bundles are ASCII text files (.properties) that contain name/value
pairs. The values are written as hex Unicode escapes of the form \uXXXX, and
this works fine.
For example :
UserNameLabel = \u00e3\ufffd\u2039\u00e3
instead of -
UserNameLabel = ãf¦ãf¼ã
If the properties files contain the original characters where the values should
be, my Java code is not able to read the name/value pairs in the Resource
Bundle.
Am I following the right approach?
The problem with the current approach is that after I create the Resource
Bundles, I must run the native2ascii tool to convert the characters into
their equivalent \uXXXX escapes.
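On point 4: with the JDKs of the WebLogic 6 era, native2ascii (or hand-written \uXXXX escapes) was indeed the usual route, because PropertyResourceBundle read its input as ISO-8859-1. Newer Java versions offer a way around it. Below is a minimal sketch, assuming Java 8 or later (PropertyResourceBundle(Reader) exists since Java 6, try-with-resources needs Java 7, UncheckedIOException needs Java 8); the class name Utf8Bundles and the sample key/value are illustrative only. It reads a UTF-8 .properties stream through a Reader so the file can contain the original characters. Separately, since Java 9, ResourceBundle.getBundle reads .properties files as UTF-8 by default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class Utf8Bundles {

    /** Reads a .properties bundle from a stream that is assumed to be UTF-8 encoded. */
    static ResourceBundle loadUtf8(InputStream in) throws IOException {
        // The Reader decodes the bytes as UTF-8, so no Unicode escapes are needed in the file.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            return new PropertyResourceBundle(reader);
        }
    }

    /** Convenience wrapper: parse UTF-8 properties text and look up one key. */
    static String lookup(String propertiesText, String key) {
        byte[] utf8 = propertiesText.getBytes(StandardCharsets.UTF_8);
        try {
            return loadUtf8(new ByteArrayInputStream(utf8)).getString(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the raw contents of a UTF-8 file. The escapes below are Java source
        // escapes only; the bytes handed to the bundle are real UTF-8.
        String fileContents = "UserNameLabel = \u30e6\u30fc\u30b6\u30fc\u540d\n";
        System.out.println(lookup(fileContents, "UserNameLabel"));
    }
}
```

In a real application the InputStream would come from the file or the classpath; the in-memory stream just keeps the sketch self-contained.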
Thanks
JSB
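On point 3: the encoding settings people talk about exist because the server must know which charset the browser used when it turns submitted bytes back into Strings. If it guesses wrong, the text is silently corrupted ("mojibake"). One common reason this can still appear to work is that the same wrong guess is applied on the way back out, which restores the original bytes. In the Servlet API the usual settings are ServletRequest.setCharacterEncoding("UTF-8") before reading parameters (Servlet 2.3 and later) and a JSP page directive such as contentType="text/html; charset=UTF-8"; whether WebLogic 6 needs them depends on its version and configuration. The standalone sketch below (MojibakeDemo is a hypothetical name) only shows the underlying effect with the standard library.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    /** Encodes text as UTF-8, then decodes those bytes with the wrong charset (ISO-8859-1). */
    static String decodeWithWrongCharset(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Encodes and decodes with the same charset, so the text survives unchanged. */
    static String decodeWithMatchingCharset(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String french = "caf\u00e9"; // "cafe" with an e-acute
        // The e-acute becomes two UTF-8 bytes (0xC3 0xA9), which ISO-8859-1 reads as two characters.
        System.out.println(decodeWithWrongCharset(french).length());    // 5
        System.out.println(decodeWithMatchingCharset(french).length()); // 4
    }
}
```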


Similar Messages

  • Issue with Data flow between Unicode and Non Unicode systems

    Hello,
    I have scenario as below,
    We have  a Unicode – ECC 6.0 and a UTF 7 – Legacy system.
    A message flow between Legacy system to ECC 6.0 system and the data is of 700 KB size.
    Will there be any issue in this as one is Unicode and other is non Unicode?
    Kindly let me know.
    Thanks & Regards
    Vivek

    Hi,
    To add to Mike's post...
    You indicate that your legacy system is non-Unicode and the ERP system is Unicode.  You also said that the data flow is only <i>from</i> the legacy system <i>to</i> the ERP system.  In this case, you should have no data issues, since the Unicode system is the receiving system.  There <b>are</b> data issues when the data flow is in the other direction: <i>from</i> a Unicode system <i>to</i> a non-Unicode system.  Here, the non-Unicode system can only process characters that exist on its codepage, and sending systems must take care to send only characters that are on the receiving system's codepage (as Mike says above).
    Best Regards,
    Matt

  • Standard XML schema for Vendor data exchange between SAP and other system

    Is there a standard SAP XML schema for exchanging vendor data between SAP and other systems? Please let me know.
    Thanks.

    See SAP Interface Repository (http://ifr.sap.com).
    My proposal is to leave the old SAP connector stuff and use SAP Exchange Infrastructure. XI 3.0 supports industry XML standards such as xCBL.

  • Convert smart quotes and other high ascii characters to HTML

    I'd like to set up Dreamweaver CS4 Mac to automatically convert smart quotes and other high ASCII characters (m-dashes, accent marks, etc.) pasted from MS Word into HTML code. Dreamweaver 8 used to do this by default, but I can't find a way to set up a similar auto-conversion in CS 4.  Is this possible?  If not, it really should be a preference option. I code a lot of HTML emails and it is very time consuming to convert every curly quote and dash.
    Thanks,
    Robert
    Digital Arts

    I too am having a related problem with Dreamweaver CS5 (running under Windows XP), having just upgraded from CS4 (which works fine for me) this week.
    In my case, I like to convert to typographic quotes etc. in my text editor, where I can use macros I've written to speed the conversion process. So my preferred method is to key in typographic letters & symbols by hand (using ALT + ASCII key codes typed in on the numeric keypad) in my text editor, and then I copy and paste my *plain* ASCII text (no formatting other than line feeds & carriage returns) into DW's DESIGN view. DW displays my high-ASCII characters just fine in DESIGN view, and writes the proper HTML code for the character into the source code (which is where I mostly work in DW).
    I've been doing it this way for years (first with GoLive, and then with DW CS4) and never encountered any problems until this week, when I upgraded to DW CS5.
    But the problem I'm having may be somewhat different than what others have complained of here.
    In my case, some high-ASCII (above 128) characters convert to HTML just fine, while others do not.
    E.g., en and em dashes in my cut-and-paste text show as such in DESIGN mode, and the right entries
        &ndash;
        &mdash;
    turn up in the source code. Same is true for the ampersand
        &amp;
    and the copyright symbol
        &copy;
    and for such foreign letters as the e with acute accent (ALT+0233)
        &eacute;
    What does NOT display or code correctly are the typographic quotes. E.g., when I paste in (or special paste; it doesn't seem to make any difference which I use for this) text with typographic double quotes (ALT+0147 for open quote mark and ALT+0148 for close quote mark), which should appear in source code as
        &ldquo;[...]&rdquo;
    DW strips out the ASCII encoding, displaying the inch marks in DESIGN mode, and putting this
        &quot;[...]&quot;
    in my source code.
    The typographic apostrophe (ALT+0146) is treated differently still. The text I copy & paste into DW should appear as
        [...]&rsquo;[...]
    in the source code, but instead I get the foot mark (both in DESIGN and CODE views):
        '
    I've tried adjusting the various DW settings for "encoding"
        MODIFY > PAGE PROPERTIES > TITLE/ENCODING > Encoding:
    and for fonts
        EDIT > PREFERENCES > FONTS
    but switching from "Unicode (UTF-8)" to "Western European" hasn't solved the problem (probably because in my case many of the higher ASCII characters convert just fine). So I don't think it's the encoding scheme I use that's the problem.
    Whatever the problem is, it's caused me enough headaches and time lost troubleshooting that I'm planning to revert to CS4 as soon as I post this.
    Deborah

  • Unicode and non-unicode

    What is the difference between Unicode and non-Unicode?
    Briefly explain Unicode.
    Thanks in advance

    A character encoding standard allowing characters from Western European, Eastern European, Cyrillic, Greek, Arabic, Hebrew, Chinese, Japanese, Korean, Thai, Urdu, Hindi and all other major world languages, living and dead, to be encoded in a single character set. Unicode was originally designed as a 16-bit code, but its code space now runs from U+0000 to U+10FFFF, so not every character fits in 16 bits. The Unicode specification also includes standard compression schemes and a wide range of typesetting information required for worldwide locale support. Symbian OS fully implements Unicode. UTF-8 is an alternative encoding in which one to four 8-bit bytes represent each Unicode character; UTF-16 uses one or two 16-bit units. The Unicode character repertoire is kept synchronized with ISO/IEC 10646. Like ASCII, Unicode is a code for representing commonly used symbols in digital form, but unlike ASCII it is not limited to 7 or 8 bits, so it can support a wide variety of non-Roman alphabets including Cyrillic, Han Chinese, Japanese, Arabic, Korean, Bengali, and so on. Supporting common non-Roman alphabets is of interest to community networks, which may want to promote multicultural aspects of their systems.
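The point that not every character fits in 16 bits, and that UTF-8 uses a variable number of bytes, can be checked with a short Java sketch (Java is used here only because its String type is UTF-16, like ABAP's character data; the class name BeyondSixteenBits is made up). It reports, for a single code point, how many UTF-16 code units, code points and UTF-8 bytes it takes.

```java
import java.nio.charset.StandardCharsets;

public class BeyondSixteenBits {

    /** Returns {UTF-16 code units, code points, UTF-8 bytes} for a single code point. */
    static int[] sizes(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return new int[] {
            s.length(),                               // Java chars = UTF-16 code units
            s.codePointCount(0, s.length()),          // always 1 here
            s.getBytes(StandardCharsets.UTF_8).length // 1 to 4 bytes in UTF-8
        };
    }

    public static void main(String[] args) {
        int[] examples = {0x0041, 0x00E9, 0x4F60, 0x1F600}; // A, e-acute, a Han character, an emoji
        for (int cp : examples) {
            int[] n = sizes(cp);
            System.out.printf("U+%04X: %d UTF-16 unit(s), %d code point, %d UTF-8 byte(s)%n",
                    cp, n[0], n[1], n[2]);
        }
    }
}
```

U+1F600 lies outside the original 16-bit range, so Java stores it as a surrogate pair (two chars) and UTF-8 needs four bytes for it.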
    ABAP Development under Unicode
    Prior to Unicode the length of a character was exactly one byte, allowing implicit typecasts or memory-layout oriented programming. With Unicode this situation has changed: One character is no longer one byte, so that additional specifications have to be added to define the unit of measure for implicit or explicit references to (the length of) characters.
    Character-like data in ABAP are always represented with the UTF-16 - standard (also used in Java or other development tools like Microsoft's Visual Basic); but this format is not related to the encoding of the underlying database.
    A Unicode-enabled ABAP program (UP) is a program in which all Unicode checks are effective. Such a program returns the same results in a non-Unicode system (NUS) as in a Unicode system (US). In order to perform the relevant syntax checks, you must activate the Unicode flag in the screens of the program and class attributes.
    In a US, you can only execute programs for which the Unicode flag is set. In future, the Unicode flag must be set for all SAP programs to enable them to run on a US. If the Unicode flag is set for a program, the syntax is checked and the program executed according to the rules described in this document, regardless of whether the system is a US or a NUS. From now on, the Unicode flag must be set for all new programs and classes that are created.
    If the Unicode flag is not set, a program can only be executed in an NUS. The syntactical and semantic changes described below do not apply to such programs. However, you can use all language extensions that have been introduced in the process of the conversion to Unicode.
    As a result of the modifications and restrictions associated with the Unicode flag, programs are executed in both Unicode and non-Unicode systems with the same semantics to a large degree. In rare cases, however, differences may occur. Programs that are designed to run on both systems therefore need to be tested on both platforms.
    Refer to the below related threads
    Re: Why the select doesn't run?
    what is unicode
    unicode
    unicode
    Regards,
    Santosh

  • Applet and parameters encoding

    Hi everyone. I decided to post this hoping that someone will explain to me how internationalization works in Java, so that I and other people can benefit from it.
    For the last five days I have done a bit of research on Java internationalization. I've read the tutorials that Sun provides and searched the Java forums for various questions. So far I have gained a lot. Now I have this question and I cannot find the answer.
    I understand that a String is always in Unicode. The problem I have is this:
    Imagine an HTML page that is encoded in Western European (ISO).
    An applet is located on this page, and it tries to read parameters.
    The value of one parameter is in English and another is in French.
    The question is this: if the applet reads those parameters, will the Strings contain the correct Unicode characters? If I try to display these strings in a label, will the French be displayed correctly?
    The second question applies to Japanese: will the applet read Japanese parameter values correctly?
    Basically I want to know how Java handles language in this situation.
    Thanks
    Thanks

    I found the problem: it was related to the "value" of the parameter in the HTML file. I had to set a specific value instead of "" in the HTML tag, like the following:
    <param name="Field" value="1">

  • Unicode and non-unicode string data types Issue with 2008 SSIS Package

    Hi All,
    I am converting a 2005 SSIS package to 2008. I have a task that has SQL Server as the source and Oracle as the destination. I copy the data from a SQL Server view with an nvarchar(10) field to a varchar(10) field of an Oracle table. The package executes fine on my local machine when I use the data transformation task to convert to DT_STR. But when I deploy the dtsx file on the server and try to run it from a SQL Server Agent job, it gives me the "unicode and non-unicode string data types" error for the field. I have checked the registry settings and they are the same on my local machine and the server. I tried both the Data Conversion task and the Derived Column task, but with no luck. Please suggest what changes are required in my package to run it from the SQL Agent job.
    Thanks.

    What is Unicode and non Unicode data formats
    Unicode : 
    A Unicode character takes more bytes to store in the database. Many global industries want to grow their business worldwide by providing services to customers in different languages such as Chinese, Japanese, Korean and Arabic. Many websites these days support international languages to do business and attract more customers, which makes life easier for both parties.
    To store customer data, the database must support a mechanism for storing international characters. Storing these characters is not easy, and many database vendors had to revise their strategies and come up with new mechanisms to store them. Big vendors such as Oracle, Microsoft and IBM started providing international character support so that the data can be stored and retrieved correctly, avoiding hiccups when doing business with international customers.
    The difference in storing character data between Unicode and non-Unicode depends on whether non-Unicode data is stored using double-byte character sets. All non-East Asian languages and the Thai language store non-Unicode characters in single bytes. Therefore, storing these languages as Unicode uses twice the space required with a non-Unicode code page. On the other hand, the non-Unicode code pages of many other Asian languages specify character storage in double-byte character sets (DBCS). Therefore, for these languages, there is almost no difference in storage between non-Unicode and Unicode.
    Encoding Formats: 
    Common encoding formats for Unicode include UCS-2, UTF-8, UTF-16 and UTF-32, and database vendors have made several of them available to their customers. For SQL Server 7.0 and higher versions, Microsoft uses the UCS-2 encoding format to store Unicode data, including data that arrives as UTF-8. Under this mechanism, all Unicode characters are stored using 2 bytes.
    Unicode data can be encoded in many different ways. UCS-2 and UTF-8 are two common ways to store bit patterns that represent Unicode characters. Microsoft Windows NT, SQL Server, Java, COM, and the SQL Server ODBC driver and OLE DB provider all internally represent Unicode data as UCS-2 (Java strings have been UTF-16 since J2SE 5.0, which is identical to UCS-2 for characters in the Basic Multilingual Plane).
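The storage arithmetic above can be checked outside the database with a small Java sketch. This is not how SQL Server measures storage; it simply counts encoded bytes, using ISO-8859-1 as a stand-in for a single-byte code page and UTF-16LE as a stand-in for the 2-byte UCS-2 storage of nchar/nvarchar (the two agree for BMP characters, and UTF-16LE adds no byte-order mark). The class name StorageSize is made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StorageSize {

    /** Number of bytes needed to store the text in the given charset. */
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String english = "Hello";
        String chinese = "\u4f60\u597d"; // two Han characters ("ni hao")
        System.out.println("English, ISO-8859-1: " + byteCount(english, StandardCharsets.ISO_8859_1)); // 5
        System.out.println("English, UTF-16LE:   " + byteCount(english, StandardCharsets.UTF_16LE));   // 10
        System.out.println("English, UTF-8:      " + byteCount(english, StandardCharsets.UTF_8));      // 5
        System.out.println("Chinese, UTF-16LE:   " + byteCount(chinese, StandardCharsets.UTF_16LE));   // 4
        System.out.println("Chinese, UTF-8:      " + byteCount(chinese, StandardCharsets.UTF_8));      // 6
    }
}
```

English text doubles in size under a 2-byte encoding, while for Chinese UTF-8 is actually larger than UCS-2 (3 bytes per character versus 2).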
    The options for using SQL Server 7.0 or SQL Server 2000 as a backend server for an application that sends and receives Unicode data that is encoded as UTF-8 include:
    For example, if your business is using a website supporting ASP pages, then this is what happens:
    If your application uses Active Server Pages (ASP) and you are using Internet Information Server (IIS) 5.0 and Microsoft Windows 2000, you can add "<% Session.Codepage=65001 %>" to your server-side ASP script.
    This instructs IIS to convert all dynamically generated strings (example: Response.Write) from UCS-2 to UTF-8 automatically before sending them to the client.
    If you do not want to enable sessions, you can alternatively use the server-side directive "<%@ CodePage=65001 %>".
    Any UTF-8 data sent from the client to the server via GET or POST is also converted to UCS-2 automatically. The Session.Codepage property is the recommended method to handle UTF-8 data within a web application. This Codepage
    setting is not available on IIS 4.0 and Windows NT 4.0.
    Sorting and other operations :
    The effect of Unicode data on performance is complicated by a variety of factors that include the following:
    1. The difference between Unicode sorting rules and non-Unicode sorting rules 
    2. The difference between sorting double-byte and single-byte characters 
    3. Code page conversion between client and server
    Operations such as >, < and ORDER BY are resource-intensive, and it is difficult to get correct results if code page conversion between client and server is not available.
    Sorting lots of Unicode data can be slower than sorting non-Unicode data, because the data is stored in double bytes. On the other hand, sorting Asian characters in Unicode is faster than sorting Asian DBCS data in a specific code page, because DBCS data is a mixture of single-byte and double-byte widths, while Unicode characters stored as UCS-2 are fixed-width.
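The first factor listed above, Unicode versus non-Unicode sorting rules, boils down to binary ordering versus linguistic collation. As an analogy only (this is not how SQL Server collations are implemented), Java's String.compareTo orders by UTF-16 code unit values, while java.text.Collator applies locale rules. The word list and class name below are made up for illustration.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class SortOrder {

    /** Binary order: String.compareTo compares UTF-16 code units numerically. */
    static List<String> binarySort(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy);
        return copy;
    }

    /** Linguistic order using the locale's collation rules (accents are a secondary difference). */
    static List<String> linguisticSort(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("\u00e9clair", "zebra", "eclair");
        System.out.println(binarySort(words));                     // e-acute (U+00E9) sorts after 'z'
        System.out.println(linguisticSort(words, Locale.ENGLISH)); // e-acute sorts next to plain 'e'
    }
}
```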
    Non-Unicode :
    Non-Unicode data is stored using a single code page. With non-Unicode types it is easy to store languages like ‘English’, but not Asian languages that need more bits per character; otherwise truncation or corruption will occur.
    Now, let’s see some of the advantages of not storing the data in Unicode format:
    1. It takes less space to store the data in the database, which saves a lot of disk space.
    2. Moving database files from one server to another takes less time.
    3. Backups and restores of the database take less time, which is good for DBAs.
    Non-Unicode vs. Unicode Data Types: Comparison Chart
    The primary difference between Unicode and non-Unicode data types is that Unicode types can easily store foreign-language characters, which also requires more storage space.
    Non-Unicode (char, varchar, text) | Unicode (nchar, nvarchar, ntext)
    Stores data in fixed or variable length | Same as non-Unicode
    char: data is padded with blanks to fill the field size. For example, if a char(10) field contains 5 characters, the system pads it with 5 blanks | nchar: same as char
    varchar: stores the actual value and does not pad with blanks | nvarchar: same as varchar
    Requires 1 byte of storage per character | Requires 2 bytes of storage per character
    char and varchar: can store up to 8000 characters | nchar and nvarchar: can store up to 4000 characters
    Non-Unicode types are best suited for US English: "One problem with data types that use 1 byte to encode each character is that the data type can only represent 256 different characters. This forces multiple encoding specifications (or code pages) for different alphabets such as European alphabets, which are relatively small. It is also impossible to handle systems such as the Japanese Kanji or Korean Hangul alphabets that have thousands of characters." [1]
    Unicode types are best suited for systems that need to support at least one foreign language: "The Unicode specification defines a single encoding scheme for most characters widely used in businesses around the world. All computers consistently translate the bit patterns in Unicode data into characters using the single Unicode specification. This ensures that the same bit pattern is always converted to the same character on all computers. Data can be freely transferred from one database or computer to another without concern that the receiving system will translate the bit patterns into characters incorrectly."
    https://irfansworld.wordpress.com/2011/01/25/what-is-unicode-and-non-unicode-data-formats/
    Thanks Shiven :)

  • What is alignment in unicode and what are restrictions

    What is alignment in Unicode, and what are the restrictions? Don't explain Unicode in general; I only want to know about alignment in Unicode.
    Points will be awarded if useful.

    Hi,
    Check the following Threads,
    what is internal and external encoding in unicode
    Unicode
    UNICODE
    Regards,
    Padmam.

  • Unicode and Java

    Hi
    As we all know, Java treats character literals as Unicode characters. I have been studying Unicode and the way it treats characters, and I have a doubt that is not specific to Java code but specific to Unicode.
    Unicode states that each character is assigned a number which is unique; this number is called a code point.
    The relationship between characters and code points is 1:1.
    E.g. the String *"hello"* (which is a sequence of character literals) can be represented by the following code points:
    *\u0065 \u0048 \u006c \u006c \u006f*
    I also read that a certain character's code point must be recognized by a specific encoding, or else a question mark (?) is output in place of the character. Not all code points can be recognized by an encoding.
    So the letter *ל* would not be recognized by all encodings and should be replaced by a question mark (?), right?
    The interesting thing is that, in other encodings, this code point shows up as a different character rather than as a *"?"*. It should print the same character.
    This is the HTML code I used for tests (save it to your hard disk, open it in your browser, then select the following encodings: UTF-16, ISO-8859-1):
    <html>
    <body>
    &#1502;&#1506;&#1497;&#1500; &#1488;&#1495;&#1491; &#1489;&#1490;&#1513;&#1501;, &#1500;&#1497;&#1500;&#1492; &#1513;&#1500; &#1488;&#1508;&#1512;&#1497;&#1500;
    &#1504;&#1508;&#1514;&#1495; &#1499;&#1502;&#1493; &#1506;&#1504;&#1503;, &#1493;&#1512;&#1506;&#1501; &#1488;&#1494; &#1502;&#1488;&#1497;&#1512;
    &#1502;&#1506;&#1497;&#1500; &#1488;&#1495;&#1491; &#1489;&#1490;&#1513;&#1501;, &#1500;&#1497;&#1500;&#1492; &#1495;&#1501; &#1493;&#1511;&#1512;
    &#1504;&#1508;&#1512;&#1505; &#1499;&#1502;&#1493; &#1495;&#1493;&#1508;&#1492;, &#1493;&#1502;&#1514;&#1495;&#1514; &#1488;&#1504;&#1497; &#1513;&#1512;
    &#1502;&#1506;&#1497;&#1500; &#1488;&#1495;&#1491; &#1489;&#1490;&#1513;&#1501;, &#1512;&#1496;&#1493;&#1489;, &#1500;&#1502;&#1497; &#1488;&#1499;&#1508;&#1514;
    &#1488;&#1504;&#1497; &#1500;&#1488; &#1506;&#1500; &#1492;&#1488;&#1512;&#1509;, &#1488;&#1497;&#1514;&#1498; &#1500;&#1502;&#1506;&#1500;&#1492; &#1513;&#1496;
    &#1512;&#1493;&#1495; &#1489;&#1508;&#1504;&#1497;&#1501;, &#1496;&#1497;&#1508;&#1493;&#1514; &#1492;&#1490;&#1513;&#1501; &#1492;&#1488;&#1495;&#1512;&#1493;&#1504;&#1493;&#1514;
    &#1504;&#1493;&#1490;&#1506;&#1493;&#1514; &#1489;&#1500;&#1495;&#1497;&#1497;&#1501;, &#1489;&#1508;&#1504;&#1497;&#1497;&#1498; &#1502;&#1513;&#1495;&#1511;&#1493;&#1514;
    &#1488;&#1502;&#1510;&#1506; &#1492;&#1512;&#1495;&#1493;&#1489;, &#1499;&#1493;&#1500;&#1501; &#1499;&#1489;&#1512; &#1497;&#1513;&#1504;&#1497;&#1501;
    &#1492;&#1497;&#1497;&#1514;&#1492; &#1506;&#1491;&#1492; &#1492;&#1512;&#1493;&#1495; &#1493;&#1506;&#1493;&#1491; &#1513;&#1504;&#1497; &#1499;&#1493;&#1499;&#1489;&#1497;&#1501;
    &#1488;&#1502;&#1510;&#1506; &#1492;&#1512;&#1495;&#1493;&#1489;, &#1499;&#1493;&#1500;&#1501; &#1499;&#1489;&#1512; &#1497;&#1513;&#1504;&#1497;&#1501;,
    &#1492;&#1497;&#1497;&#1514;&#1492; &#1506;&#1491;&#1492; &#1492;&#1512;&#1493;&#1495; &#1493;&#1506;&#1493;&#1491; &#1513;&#1504;&#1497; &#1499;&#1493;&#1499;&#1489;&#1497;&#1501;
    &#1512;&#1488;&#1497;&#1514;&#1497; &#1494;&#1493;&#1490; &#1506;&#1497;&#1504;&#1497;&#1497;&#1501;, &#1502;&#1505;&#1512;&#1489;&#1493;&#1514; &#1500;&#1492;&#1497;&#1508;&#1514;&#1495;
    &#1510;&#1493;&#1500;&#1500;&#1514; &#1488;&#1500; &#1506;&#1510;&#1502;&#1498; &#1506;&#1502;&#1493;&#1511; &#1489;&#1497;&#1501; &#1513;&#1500;&#1498;,
    &#1502;&#1491;&#1497; &#1508;&#1506;&#1501; &#1488;&#1514; &#1506;&#1493;&#1500;&#1492;, &#1500;&#1493;&#1511;&#1495;&#1514; &#1511;&#1510;&#1514; &#1488;&#1493;&#1497;&#1512;
    &#1500;&#1488; &#1512;&#1493;&#1510;&#1492; &#1500;&#1492;&#1497;&#1505;&#1495;&#1507;, &#1502;&#1499;&#1497;&#1512;&#1492; &#1488;&#1514; &#1492;&#1502;&#1495;&#1497;&#1512;
    &#1488;&#1502;&#1510;&#1506; &#1492;&#1512;&#1495;&#1493;&#1489;, &#1499;&#1493;&#1500;&#1501; &#1499;&#1489;&#1512; &#1497;&#1513;&#1504;&#1497;&#1501;...
    </body>
    </html>
    I would appreciate it if you would correct me in case I am wrong!
    Edited by: charllescuba1008 on Mar 31, 2009 2:08 PM

    charllescuba1008 wrote:
    Unicode states that each character is assigned a number which is unique, this number is called code point.
    Right.
    The relationship between characters and code points is 1:1.
    Uhm .... let's assume "yes" for the moment. (Note that the relationship between the Java type char and code points is not 1:1, and there are other exceptions ...)
    Eg: the String *"hello"* (which is sequence of character literals) can be represent by the following Code Points
    *\u0065 \u0048 \u006c \u006c \u006f*
    Those are the Java String Unicode escapes. If you want to talk about Unicode code points, then the correct notation for "Hello" would be
    U+0048 U+0065 U+006C U+006C U+006F
    Note that you swapped the H and e.
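The U+XXXX notation from the reply can be produced directly in Java (Java 8 or later, for String.codePoints()). This is a small sketch with a made-up class name; it iterates code points rather than chars, so a character outside the BMP prints as one U+ value even though it occupies two Java chars.

```java
import java.util.stream.Collectors;

public class CodePointNotation {

    /** Formats each code point of s in the conventional U+XXXX notation. */
    static String toUPlus(String s) {
        return s.codePoints()                                // iterates code points, not chars
                .mapToObj(cp -> String.format("U+%04X", cp)) // at least 4 uppercase hex digits
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(toUPlus("Hello"));  // U+0048 U+0065 U+006C U+006C U+006F
        System.out.println(toUPlus("\u05dc")); // U+05DC (HEBREW LETTER LAMED, decimal 1500)
    }
}
```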
    I also read that a certain character code point must be recognized by a specific encoding or else a question mark (?) is output in place of the character.
    This one is Java-specific. If Java tries to translate some Unicode character to bytes using an encoding that doesn't support that character, then it will output the byte(s) for "?" instead.
    Not all code points can be recognized by an encoding.
    Some encodings (such as UTF-8) can encode all code points; others (such as ISO-8859-*, EBCDIC or UCS-2) cannot.
    So, the letter *ל* would not be recognized by all encodings and should be replaced by a question mark (?) right?
    Only in a very specific case in Java. This is not a general Unicode-level rule.
    (disclaimer: the HTML code presented was using decimal XML entities to represent the Unicode characters).
    What you are seeing is possibly the replacement character that your text rendering system uses to represent characters that it knows, but can't display (possibly because the current font has no glyph for them).
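The Java-specific "?" behaviour described in the reply can be reproduced with the standard library. In this sketch (the class name is made up), CharsetEncoder.canEncode reports whether a charset can represent the text, and String.getBytes silently substitutes the charset's replacement byte, which is '?' for ISO-8859-1, when it cannot.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UnmappableDemo {

    /** True if every character of s has a representation in the charset. */
    static boolean canEncode(String s, Charset charset) {
        return charset.newEncoder().canEncode(s);
    }

    /** String.getBytes silently substitutes the charset's replacement bytes for unmappable characters. */
    static byte[] encode(String s, Charset charset) {
        return s.getBytes(charset);
    }

    public static void main(String[] args) {
        String lamed = "\u05dc"; // HEBREW LETTER LAMED, the &#1500; from the test page
        System.out.println(canEncode(lamed, StandardCharsets.UTF_8));      // true
        System.out.println(canEncode(lamed, StandardCharsets.ISO_8859_1)); // false
        System.out.println(encode(lamed, StandardCharsets.ISO_8859_1)[0]); // 63, the byte for '?'
        System.out.println(encode(lamed, StandardCharsets.UTF_8).length);  // 2 (0xD7 0x9C)
    }
}
```

A browser rendering the HTML test page is a different situation: it decodes entities to code points and then needs a font with a glyph, which is why it shows its own replacement symbol rather than a literal "?".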

  • Url and File Encoding

    I have the following scenario:
    I have a directory whose directory names and filenames are encoded in what I believe is UTF-8 (the content is HTML). The files are derivatives of DMOZ/World, so they are in various languages. I can see accent marks and Cyrillic/Greek, etc. in Windows File Explorer.
    I need to zip this directory up (using java) and then upload it to a server and then unzip it using php. I am uncertain what encoding the server is using.
    Do I UTF-8 Urlencode the file names and file paths and then zip and upload?
    If so then do I need to also urlencode the links within the html?
    thanks

    It's not a bug, it's a lack of a feature.
    The design of the Zip format requires a filename to be stored as bytes in the archive, but doesn't specify what encoding should be used to do that. Back in the days when the format was invented, that didn't matter because you could only use ASCII characters in filenames anyway.
    Then when Unicode started infiltrating file systems, there was nobody powerful enough to fix the format by specifying an encoding, and the big players in the archiving field didn't care because the way they did it was de facto correct anyway as far as they were concerned.
    I don't know how Microsoft and WinZip encode their filenames these days, but at any rate Java's zip classes don't even provide the option to specify an encoding. I am pretty sure that several bug reports have been filed in regard to this missing feature but I don't believe anything has been done yet.
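For what it's worth, this gap was closed in Java 7: java.util.zip.ZipOutputStream and ZipInputStream gained constructors that take a Charset, and UTF-8 became the default for entry names. The in-memory sketch below (class, file and directory names are hypothetical) writes non-ASCII entry names as UTF-8 and reads them back. It does not show how the PHP side will decode the names; that depends on the unzip library used on the server.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipNames {

    /** Builds an in-memory zip whose entry names are written as UTF-8. */
    static byte[] zip(List<String> entryNames) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer, StandardCharsets.UTF_8)) {
            for (String name : entryNames) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write("<html></html>".getBytes(StandardCharsets.UTF_8)); // placeholder content
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray(); // stream already closed, so the archive is complete
    }

    /** Reads the entry names back, decoding them as UTF-8. */
    static List<String> entryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes), StandardCharsets.UTF_8)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac/index.html"); // Greek directory name
        names.add("\u0444\u0430\u0439\u043b.html");                                // Cyrillic file name
        System.out.println(entryNames(zip(names)).equals(names)); // true
    }
}
```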

  • Unicode and converter

    Hi all, I'm Chrno, and I have a question... what I'm trying to do seems impossible to me, lol... Hope you guys can help.
    OK, so this is the question...
    I want to convert a string like this (in Vietnamese) "tá lả", or something like "chúng tôi luôn chào đón bạn", to a string like this: "t&aacute; l&#7843;"
    I created the first string with Unicode characters and don't know how to convert it into the well-formed second string...
    Thanks for reading, and I hope you can help me out :)
    Edited by: ChrnoLove on Apr 24, 2009 9:41 AM
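Assuming the conversion is done in Java (the post doesn't say), a minimal sketch is to walk the string by code point and replace everything outside ASCII with a decimal numeric character reference. Note that it produces &#225; for the a-acute rather than the named entity &aacute;; browsers treat the two the same. It does not escape &, < or >, which would need separate handling if the input can contain them. The class and method names are made up.

```java
public class NumericCharRefs {

    /** Replaces every non-ASCII code point with a decimal reference such as "&#7843;". */
    static String toNumericCharRefs(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);                 // plain ASCII is copied unchanged
            } else {
                out.append("&#").append(cp).append(';'); // decimal code point, e.g. 7843 for U+1EA3
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumericCharRefs("t\u00e1 l\u1ea3")); // t&#225; l&#7843;
    }
}
```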

    yet all their ID tags were edited in iTunes when I had them on my PC
    I think the problem there is that on a PC you can have them a legacy Japanese encoding while the Mac only accepts Unicode, and the Mac and PC also use slightly different forms of Unicode. But you are right, I don't see why, if all were ok in Windows, only some would be ok on the Mac.
    There is a Japanese version of these forums where you might ask if you or a colleague knows Japanese well:
    http://discussions.info.apple.co.jp/

  • Unicode and Chinese

    This is driving me nuts.
    Created a page where there is a mix of English and Chinese,
    used unicode
    and worked fine.
    But then created another page exactly the same and now the
    unicode is
    not being converted..
    First link is fine
    http://www.destinationcdg.com/Bonaparte/BonaparteC.cfm
    But this link is all screwed up.
    http://www.destinationcdg.com/Bonaparte/areaC.cfm
    Any ideas please.
    DW8.02 CFMX7 and Apache2

    Hi guys
    I've just realised that the solution here isn't totally complete. If you are still interested in helping I would be really grateful.
    Quick re-cap:
    The problem was Java was mis-calculating the length of unicode strings.
    e.g. ...
    String nihao = "??";  // Should read 2 Chinese characters; may display here as ??
    System.out.println(nihao.length());
    ... would print 6 or something, but not 2 as it should.
    I was advised to pass a parameter when invoking javac, which fixed this problem:
    javac -encoding UTF-8 ClassName.java
    Now, this solved the problem so far.
    However!!!! What I assumed would work, and didn't test until now, is this:
    System.out.println(nihao);
    But it doesn't work.
    So, in a nutshell: if I have a class that contains Unicode strings outside the usual Latin set, save that source file as UTF-8, and pass -encoding UTF-8 when compiling, Java still prints ?? to the command line.
    Is it my shell or is it Java?
    I'm using the Bash shell.
    If I had a file called ??.txt (should be 2 Chinese chars) and used ls, then ?? (should be 2 Chinese chars) would not display properly. I would get ??.txt.
    To get the file name to display properly I would need to use ls -v. This -v flag makes things work.
    I've tried it with the java command but java doesn't like it.
    This is really doing my head in. If anyone has any ideas please help.
    Thanks.
    Chinese characters don't seem to be uploading to this website so it makes this post difficult. Where you are supposed to see chinese I have said so. It might display as ??. There are places where I wanted to write ??.
    I can't award Duke Dollars to this post as I did it already. I have posted a fresh version of this problem in the Java Programming forum. I have allocated Duke Dollars to that post so best to reply there if you have any ideas :)
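On the printing side javac is no longer involved: System.out encodes strings with an encoding the JVM picks from the platform settings, which need not be UTF-8 even when the terminal is. One commonly used workaround, sketched here under the assumptions that the Bash terminal runs in a UTF-8 locale and that Java 10 or later is available (for the PrintStream(OutputStream, boolean, Charset) constructor), is to write standard output through a PrintStream that always encodes as UTF-8. The class name is hypothetical:

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

class Utf8Console {
    // Wraps a byte stream in a PrintStream that encodes all text as UTF-8,
    // instead of the JVM's platform-dependent default.
    static PrintStream utf8(OutputStream out) {
        return new PrintStream(out, true, StandardCharsets.UTF_8); // Java 10+
    }

    public static void main(String[] args) {
        // Raw standard output, bypassing System.out's own encoder.
        PrintStream out = utf8(new FileOutputStream(FileDescriptor.out));
        out.println("\u4f60\u597d"); // ni hao
        out.flush();
    }
}
```

If the terminal itself is not in a UTF-8 locale, the bytes written are correct but will still be shown as garbage or ?, so checking the locale (for example with the locale command) is worth doing first.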

  • About unicode and non-unicode

    Hi experts,
    can anybody tell me
    what Unicode and non-Unicode mean, from an interview point of view? Just 2 or 3 sentences.
    Thanks in advance

    Unicode provides multilingual capability in an SAP system.
    Apart from that, the important Unicode transaction code is UCCHECK: if you give it a report name, it lists the Unicode error codes. In general we get errors for structure mismatches, obsolete statements, OPEN DATASET, DESCRIBE statements and so on.
    Moreover, you can say that we get rid of all obsolete function modules. Look at the following; it may help you.
    Before the Unicode error
       lt_hansp = lt_0201-endda+0(4) - lt_0002-gbdat+0(4).
    Solution.
    data :abc type i,
          def type i.
    move  lt_0201-endda+0(4) to abc.
    move    lt_0002-gbdat+0(4) to def.
    lt_hansp-endda = abc - def.
    Before the Unicode error:
       WRITE: /1 'CO:',CO(110).
    Solution.
    FIELD-SYMBOLS: <fs_co> type any.
    assign co to <fs_co>.
    WRITE: /1 'CO:',<fs_co>(110).
    DESCRIBE002     In Unicode, DESCRIBE LENGTH can only be used with the IN BYTE MODE or IN CHARACTER MODE addition.
    Before the Unicode error:
        describe field <tab_feld> length len.
    Solution.
    describe field <tab_feld> length len IN character mode.
    Before the Unicode error:
        DESCRIBE FIELD DOWNTABLA LENGTH LONG.
    Solution.
    DESCRIBE FIELD DOWNTABLA LENGTH LONG IN byte MODE.
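The reason DESCRIBE LENGTH now needs IN BYTE MODE or IN CHARACTER MODE is that in a Unicode system a character no longer occupies exactly one byte, so the two lengths of the same field differ. A small Java sketch of that arithmetic, assuming (for illustration only) 2-byte UTF-16 storage for the Unicode case and a single-byte code page such as ISO-8859-1 for the non-Unicode case; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

class ByteVsCharLength {
    // Character length: the analogue of DESCRIBE ... IN CHARACTER MODE.
    static int charLength(String field) {
        return field.length();
    }

    // Byte length: the analogue of DESCRIBE ... IN BYTE MODE, assuming UTF-16
    // (2 bytes per character) on a Unicode system and a single-byte code page
    // (1 byte per character) on a non-Unicode system.
    static int byteLength(String field, boolean unicodeSystem) {
        return field.getBytes(unicodeSystem ? StandardCharsets.UTF_16LE
                                            : StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        String field = "ABCDEFGHIJ";                   // a 10-character field
        System.out.println(charLength(field));        // 10
        System.out.println(byteLength(field, false)); // 10
        System.out.println(byteLength(field, true));  // 20
    }
}
```

UTF_16LE is used rather than UTF_16 so that no byte-order mark is added to the count.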
    DO 002     Could not specify the access range automatically. This means that you need a RANGE addition.
    Before the Unicode error:
    DO 7 TIMES VARYING i FROM aktuell(1) NEXT aktuell+1(1)
    Solution.
      DO 7 TIMES VARYING i FROM aktuell(1) NEXT aktuell+1(1) RANGE aktuell .
    Before the Unicode error:
    DO 3 TIMES VARYING textfeld FROM gtx_line1 NEXT gtx_line2.
    Solution.
    DATA: BEGIN OF text,
            gtx_line1 TYPE rp50m-text1,
            gtx_line2 TYPE rp50m-text2,
            gtx_line3 TYPE rp50m-text3,
          END OF text.
    DO 3 TIMES VARYING textfeld FROM gtx_line1 NEXT gtx_line2 RANGE text.
    Before the Unicode error:
    DO ev_restlen TIMES
        VARYING ev_zeichen FROM ev_hstr(1) NEXT ev_hstr+1(1).
    Solution.
      DO ev_restlen TIMES
         VARYING ev_zeichen FROM ev_hstr(1) NEXT ev_hstr+1(1) range ev_hstr.
    MESSAGEG!2     "IT_TBTCO" and "IT_ALLG" are not mutually convertible. In Unicode programs, "IT_TBTCO" must have the same structure layout as "IT_ALLG", independent of the length of a Unicode character.
    Before the Unicode error:
             IT_TBTCO = IT_ALLG.
    Solution.
    IT_TBTCO-header = IT_ALLG-header.
    MESSAGEG!3     "FIELDCAT_LN-TABNAME" and "WA_DISP" are not mutually convertible in a Unicode program.
    Before the Unicode error:
         IF GEH_TA+15(73) NE RETTER+15(73).
    Solution.
          FIELD-SYMBOLS: <GEHTA> TYPE ANY,
                         <RETTER> TYPE ANY.
          ASSIGN: GEH_TA TO <GEHTA>,
                  RETTER TO <RETTER>.
    IF <GEHTA>+15(73) NE <RETTER>+15(73).
    Before the Unicode error:
           IMP_EP_R3_30 = RECRD_TAB-CNTNT.
    Solution.
        FIELD-SYMBOLS:  <imp_ep_r3_30> TYPE X,
                          <recrd_tab-cntnt> TYPE X.
          ASSIGN IMP_EP_R3_30 TO <imp_ep_r3_30> CASTING.
          ASSIGN RECRD_TAB-CNTNT TO <recrd_tab-cntnt> CASTING.
           <imp_ep_r3_30> = <recrd_tab-cntnt>.
    Before the Unicode error:
                and    pernr  = gt_pernr
    Solution.
                  and    pernr  = gt_pernr-pernr
    MESSAGEG!7     EBC_F0 and "EBC_F0_255(1)" are not comparable in Unicode programs.     
    Before the Unicode error:
       IF CHARACTER NE LINE_FEED.
    Solution.
        IF CHARACTER NE LINE_FEED-X.
    MESSAGEG!A     A line of "IT_ZMM_BINE" and "OUTPUT_LINES" are not mutually convertible. In a  Unicode program "IT_ZMM_BINE" must have the same structure layout as  "OUTPUT_LINES" independent of the length of a Unicode character.     
    Before the Unicode error:
    *data: lw_wpbp  type pc206.
    Solution.
    data: lw_wpbp  type pc205.
    Before the Unicode error:
       LOOP AT seltab INTO  ltx_p0078.
    Solution.
    DATA: WA_SELTAB like line of SELTAB.
    CLEAR WA_SELTAB.
    MOVE-CORRESPONDING ltx_p0078 to wa_seltab.
    move-corresponding wa_seltab to ltx_p0078.
    MESSAGEG?Y     The line type of "DTAB" must be compatible with one of the types "TEXTPOOL".     
    Before the Unicode error:
    DATA:
       BEGIN OF dtab OCCURS 100.
         text(100),
    include structure textpool.
       End of changes
    SET TITLEBAR '001' WITH dtab-text+9.
    Solution.
    Declare DTAB with the TEXTPOOL structure only (no extra TEXT field), so that its line type is compatible with TEXTPOOL, and take the title text from the ENTRY component:
    DATA:
       BEGIN OF dtab OCCURS 100.
    include structure textpool.
    DATA:
       END OF dtab.
      SET TITLEBAR '001' WITH dtab-entry.
    MESSAGEG@1     TFO05_TABLE cannot be converted to a character-type field.     
    Before the Unicode error:
    WRITE: / PA0015, 'Fehler bei MODIFY'.
    Solution.
    WRITE: / PA0015+0, 'Fehler bei MODIFY'.
    MESSAGEG@3     ZL-C1ZNR must be a character-type data object (data type C, N, D, T or STRING).
    Before the Unicode error:
         con_tab  TYPE x VALUE '09',
    Solution.
           con_tab  TYPE c VALUE cl_abap_char_utilities=>horizontal_tab,
    Before the Unicode error:
    data:   g_con_ascii_tab(1)  type x   value '09'.
    Solution.
       data:   g_con_ascii_tab  type c   value cl_abap_char_utilities=>horizontal_tab.
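A trap in the tab examples: with TYPE x, VALUE '09' is the single byte 0x09 (horizontal tab), but with a character type such as C or STRING the same literal is the two characters 0 and 9. In Unicode ABAP the tab character is usually taken from the constant CL_ABAP_CHAR_UTILITIES=>HORIZONTAL_TAB. The difference, sketched in Java purely as an analogy (the class name is hypothetical):

```java
class TabLiteral {
    // Analogy for a character-type field with VALUE '09': two characters, '0' and '9'.
    static final String LITERAL_09 = "09";
    // Analogy for TYPE x VALUE '09': the code 0x09, i.e. the horizontal tab.
    static final char HEX_09 = (char) 0x09;

    public static void main(String[] args) {
        System.out.println(LITERAL_09.length()); // 2
        System.out.println(HEX_09 == '\t');      // true
        System.out.println("[" + HEX_09 + "]");  // a real tab between the brackets
    }
}
```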
    MESSAGEG@E     HELP_ANLN0 must be a character-type field (data type C, N, D, or T).
    Before the Unicode error:
    WRITE SATZ-MONGH TO SATZ-MONGH CURRENCY P0008-WAERS.
    WRITE SATZ-JAH55 TO SATZ-JAH55 CURRENCY P0008-WAERS.
    WRITE SATZ-EFF55 TO SATZ-EFF55 CURRENCY P0008-WAERS.
    WRITE SATZ-SOFE_EREU TO SATZ-SOFE_EREU CURRENCY P0008-WAERS.
    WRITE SATZ-SOFE_ERSF TO SATZ-SOFE_ERSF CURRENCY P0008-WAERS.
    WRITE SATZ-SOFE_ERSP TO SATZ-SOFE_ERSP CURRENCY P0008-WAERS.
    WRITE SATZ-SOFE_EIN TO SATZ-SOFE_EIN CURRENCY P0008-WAERS.
    WRITE SATZ-SOFE_EREU TO SATZ-SOFE_EREU CURRENCY P0008-WAERS.
    WRITE SATZ-ERHO_ERR TO SATZ-ERHO_ERR CURRENCY P0008-WAERS.
    WRITE SATZ-ERHO_EIN TO SATZ-ERHO_EIN CURRENCY P0008-WAERS.
    WRITE SATZ-JAH55_FF TO SATZ-JAH55_FF CURRENCY P0008-WAERS.
    Solution.
      DATA: SATZ1_MONGH(16),
            SATZ_JAH551(16),
            SATZ_EFF551(16),
            SATZ_SOFE_EREU1(16),
            SATZ_SOFE_ERSF1(16),
            SATZ_SOFE_ERSP1(16),
            SATZ_SOFE_EIN1(16),
            SATZ_ERHO_ERR1(16),
            SATZ_ERHO_EIN1(16),
            SATZ_JAH55_FF1(16).
      WRITE SATZ-MONGH TO SATZ1_MONGH CURRENCY P0008-WAERS.
      WRITE SATZ-JAH55 TO SATZ_JAH551 CURRENCY P0008-WAERS.
      WRITE SATZ-EFF55 TO SATZ_EFF551 CURRENCY P0008-WAERS.
      WRITE SATZ-SOFE_EREU TO SATZ_SOFE_EREU1 CURRENCY P0008-WAERS.
      WRITE SATZ-SOFE_ERSF TO SATZ_SOFE_ERSF1 CURRENCY P0008-WAERS.
      WRITE SATZ-SOFE_ERSP TO SATZ_SOFE_ERSP1 CURRENCY P0008-WAERS.
      WRITE SATZ-SOFE_EIN TO SATZ_SOFE_EIN1 CURRENCY P0008-WAERS.
      WRITE SATZ-ERHO_ERR TO SATZ_ERHO_ERR1 CURRENCY P0008-WAERS.
      WRITE SATZ-ERHO_EIN TO SATZ_ERHO_EIN1 CURRENCY P0008-WAERS.
      WRITE SATZ-JAH55_FF TO SATZ_JAH55_FF1 CURRENCY P0008-WAERS.
      SATZ-MONGH = SATZ1_MONGH.
      SATZ-JAH55 = SATZ_JAH551.
      SATZ-EFF55 = SATZ_EFF551.
      SATZ-SOFE_EREU = SATZ_SOFE_EREU1.
      SATZ-SOFE_ERSF = SATZ_SOFE_ERSF1.
      SATZ-SOFE_ERSP = SATZ_SOFE_ERSP1.
      SATZ-SOFE_EIN = SATZ_SOFE_EIN1.
      SATZ-ERHO_ERR = SATZ_ERHO_ERR1.
      SATZ-ERHO_EIN = SATZ_ERHO_EIN1.
      SATZ-JAH55_FF = SATZ_JAH55_FF1.
    MESSAGEG-0     VESVR_EUR must be a character-type data object (data type C, N, D, T or  STRING).     
    Before the Unicode error:
                TRANSLATE vesvr_eur USING '.,'.
                TRANSLATE espec_eur USING '.,'.
                TRANSLATE fijas_eur USING '.,'.
    Solution.
                 data: vesvreur(16),
                       especeur(16),
                       fijaseur(16).
                 vesvreur = vesvr_eur.
                 especeur = espec_eur.
                 fijaseur = fijas_eur.
                 TRANSLATE vesvreur USING '.,'.
                 TRANSLATE especeur USING '.,'.
                 TRANSLATE fijaseur USING '.,'.
                 vesvr_eur = vesvreur.
                 espec_eur = especeur.
                 fijas_eur = fijaseur.
    MESSAGEG-D     "LT_0021" cannot be converted to the line type. The line type must have the same structure layout as "LT_0021" regardless of the length of a Unicode character.
    Before the Unicode error:
    data: lt_0021    like p0021 occurs 0 with header line.
    Solution.
        DATA: LT_0021 LIKE PA0021 OCCURS 0 WITH HEADER LINE.
    Before the Unicode error:
             append sim_data to p0007.
    Solution.
           DATA:wa_p0007 type p0007.
           move-corresponding sim_data to wa_p0007.
              append wa_p0007 to p0007.
    MESSAGEG-F     The structure "CO(110)" does not start with a character-type field. In Unicode programs in such cases, offset/length declarations are not allowed.
    Before the Unicode error:
         TRANSFER COBEZ+8 TO DSN.
    Solution.
                FIELD-SYMBOLS:<fs_cobez> type any.
                ASSIGN COBEZ TO <fs_cobez>.
                TRANSFER <fs_COBEZ>+8 TO DSN.
    Before the Unicode error:
         WRITE: /1 COBEZ+16.
    Solution.
    FIELD-SYMBOLS <F_COBEZ> TYPE ANY.
    ASSIGN COBEZ TO <F_COBEZ>.
          WRITE: /1 <F_COBEZ>+16.
    MESSAGEG-G     The length declaration "171" exceeds the length of the character-type start  (=38) of the structure. This is not allowed in Unicode programs.
    Before the Unicode error:
       write: /1 '-->',
                 pa0201(250).
    Solution.
    field-symbols <fs_pa0201> type any.
    ASSIGN pa0201 TO <fs_pa0201>.
        write: /1 '-->',
                  <fs_pa0201>(250).
    MESSAGEG-H     The offset declaration "160" exceeds the length of the character-type start (=126) of the structure. This is not allowed in Unicode programs.
    Before the Unicode error:
    WRITE:/ SATZ(80),
             / SATZ+80(80),
             / SATZ+160(80),
             / SATZ+240(80),
             / SATZ+320(27).
    Solution.
      FIELD-SYMBOLS <FS_SATZ> TYPE ANY.
      ASSIGN SATZ TO <FS_SATZ>.
      WRITE:/ <FS_SATZ>(80),
              / <FS_SATZ>+80(80),
              / <FS_SATZ>+160(80),
              / <FS_SATZ>+240(80),
              / <FS_SATZ>+320(27).
    MESSAGEG-I     The sum of the offset and length (=504) exceeds the length of the start (=323) of the structure. This is not allowed in Unicode programs.
    Before the Unicode error:
              /5 PARAMS+80(80),
    Solution.
        FIELD-SYMBOLS: <PARAMS> TYPE ANY.
        ASSIGN PARAMS TO <PARAMS>.
               /5 <PARAMS>+80(80),
    MESSAGEGWH     P0041-DAR01 and "DATE_SPEC" are type-incompatible.     
    Before the Unicode error:
       DO 5 TIMES VARYING I0008 FROM P0008-LGA01 NEXT P0008-LGA02.
    Solution.
        DO 5 TIMES VARYING I0008-LGA FROM P0008-LGA01 NEXT P0008-LGA02. "D07K963133
    Before the Unicode error:
       DO VARYING ls_data_aux FROM p0041-dar01 NEXT p0041-dar02.
    Solution.
         DO VARYING ls_data_aux-dar01 FROM p0041-dar01 NEXT p0041-dar02.
    MESSAGEGY/     The type of the database table and work area (or internal table) "P0050" are  not Unicode-convertible      
    Before the Unicode error:
    select * from  pa9705 client specified
            into ls_9705
    Solution.
       select * from  pa9705 client specified
              into corresponding fields of  ls_9705
    Before the Unicode error:
         select        * from  pa0202 client specified
                into ls_0202
    Solution.
           select  * from  pa0202 client specified
                  into corresponding fields of ls_0202
    OPEN   001     One of the additions "FOR INPUT", "FOR OUTPUT", "FOR APPENDING" or "FOR UPDATE" was expected.     
    Before the Unicode error:
    OPEN DATASET FICHERO IN TEXT MODE.
    Solution.
      OPEN DATASET FICHERO IN TEXT MODE FOR INPUT ENCODING NON-UNICODE.
    OPEN   002     IN... MODE was expected.     
    Before the Unicode error:
    OPEN DATASET P_OUT FOR OUTPUT IN TEXT MODE.
    Solution.
        OPEN DATASET P_OUT FOR OUTPUT IN TEXT MODE ENCODING non-unicode.
    OPEN   004     In "TEXT MODE" the "ENCODING" addition must be specified.     
    Before the Unicode error:
       open dataset dat for output in text mode.
    Solution.
         open dataset dat for output in text mode ENCODING NON-UNICODE.
    UPLO     Upload/Ws_Upload and Download/Ws_Download are obsolete, since they are not Unicode-enabled; use the class cl_gui_frontend_services.
    Before the Unicode error:
    move p_filein to disk_datei.
      CALL FUNCTION 'WS_UPLOAD'
           EXPORTING
                filename        = disk_datei
                FILETYPE        = FILETYPE
           TABLES
                DATA_TAB        = DISK_TAB
           EXCEPTIONS
                FILE_OPEN_ERROR = 1
                FILE_READ_ERROR = 2.
    Solution.
    DATA: file_name type string.
    move p_filein to file_name.
    CALL METHOD CL_GUI_FRONTEND_SERVICES=>GUI_UPLOAD
      EXPORTING
        FILENAME                = file_name
        FILETYPE                = 'ASC'
        HAS_FIELD_SEPARATOR     = 'X'
    *   HEADER_LENGTH           = 0
    *   READ_BY_LINE            = 'X'
    *   DAT_MODE                = SPACE
    *   CODEPAGE                = SPACE
    *   IGNORE_CERR             = ABAP_TRUE
    *   REPLACEMENT             = '#'
    *   VIRUS_SCAN_PROFILE      =
    * IMPORTING
    *   FILELENGTH              =
    *   HEADER                  =
      CHANGING
        DATA_TAB                = disk_tab[]
      EXCEPTIONS
        FILE_OPEN_ERROR         = 1
        FILE_READ_ERROR         = 2
        NO_BATCH                = 3
        GUI_REFUSE_FILETRANSFER = 4
        INVALID_TYPE            = 5
        NO_AUTHORITY            = 6
        UNKNOWN_ERROR           = 7
        BAD_DATA_FORMAT         = 8
        HEADER_NOT_ALLOWED      = 9
        SEPARATOR_NOT_ALLOWED   = 10
        HEADER_TOO_LONG         = 11
        UNKNOWN_DP_ERROR        = 12
        ACCESS_DENIED           = 13
        DP_OUT_OF_MEMORY        = 14
        DISK_FULL               = 15
        DP_TIMEOUT              = 16
        NOT_SUPPORTED_BY_GUI    = 17
        ERROR_NO_GUI            = 18
        others                  = 19.
    Before the Unicode error:
       CALL FUNCTION 'WS_DOWNLOAD'
            EXPORTING
                 filename = fich_dat
                 filetype = typ_fich
            TABLES
                 data_tab = t_down.
    Solution.
    data: filename1 type string,
          filetype1(10).
    move fich_dat to filename1.
    move typ_fich to filetype1.
    CALL METHOD CL_GUI_FRONTEND_SERVICES=>GUI_DOWNLOAD
      EXPORTING
        FILENAME                  = filename1
        FILETYPE                  = filetype1
        WRITE_FIELD_SEPARATOR     = 'X'
      CHANGING
        DATA_TAB                  = t_down[].
    Before the Unicode error:
    *CALL FUNCTION 'UPLOAD'
        TABLES
             DATA_TAB                =  datos
       EXCEPTIONS
            CONVERSION_ERROR        = 1
            INVALID_TABLE_WIDTH     = 2
            INVALID_TYPE            = 3
            NO_BATCH                = 4
            UNKNOWN_ERROR           = 5
            GUI_REFUSE_FILETRANSFER = 6
            OTHERS                  = 7
    Solution.
    DATA: file_table type table of file_table,
          filetable type file_table,
          rc type i,
          filename type string.
    CALL METHOD CL_GUI_FRONTEND_SERVICES=>FILE_OPEN_DIALOG
      CHANGING
        FILE_TABLE              = file_table
        RC                      = rc
    EXCEPTIONS
       FILE_OPEN_DIALOG_FAILED = 1
       CNTL_ERROR              = 2
       ERROR_NO_GUI            = 3
       NOT_SUPPORTED_BY_GUI    = 4
       others                  = 5.
    READ table file_table into filetable index 1.
    move filetable to filename.
    CALL METHOD CL_GUI_FRONTEND_SERVICES=>GUI_UPLOAD
      EXPORTING
        FILENAME                = filename
        FILETYPE                = 'ASC'
        HAS_FIELD_SEPARATOR     = 'X'
    *   HEADER_LENGTH           = 0
    *   READ_BY_LINE            = 'X'
    *   DAT_MODE                = SPACE
    *   CODEPAGE                = SPACE
    *   IGNORE_CERR             = ABAP_TRUE
    *   REPLACEMENT             = '#'
    *   VIRUS_SCAN_PROFILE      =
    * IMPORTING
    *   FILELENGTH              =
    *   HEADER                  =
      CHANGING
        DATA_TAB                = datos[]
      EXCEPTIONS
        FILE_OPEN_ERROR         = 1
        FILE_READ_ERROR         = 2
        NO_BATCH                = 3
        GUI_REFUSE_FILETRANSFER = 4
        INVALID_TYPE            = 5
        NO_AUTHORITY            = 6
        UNKNOWN_ERROR           = 7
        BAD_DATA_FORMAT         = 8
        HEADER_NOT_ALLOWED      = 9
        SEPARATOR_NOT_ALLOWED   = 10
        HEADER_TOO_LONG         = 11
        UNKNOWN_DP_ERROR        = 12
        ACCESS_DENIED           = 13
        DP_OUT_OF_MEMORY        = 14
        DISK_FULL               = 15
        DP_TIMEOUT              = 16
        NOT_SUPPORTED_BY_GUI    = 17
        ERROR_NO_GUI            = 18
        others                  = 19.
    IF SY-SUBRC <> 0.
    MESSAGE ID SY-MSGID TYPE SY-MSGTY NUMBER SY-MSGNO
                WITH SY-MSGV1 SY-MSGV2 SY-MSGV3 SY-MSGV4.
    ENDIF.
    Before the Unicode error:
    CALL FUNCTION 'DOWNLOAD'
       EXPORTING
         filename            = p_attkit
         filetype            = 'ASC'
       TABLES
         data_tab            = tb_attrkit
       EXCEPTIONS
         invalid_filesize    = 1
         invalid_table_width = 2
         invalid_type        = 3
         no_batch            = 4
         unknown_error       = 5
         OTHERS              = 6.
    Solution.
               DATA : lv_filename    TYPE string,
                       lv_filen       TYPE string,
                       lv_path        TYPE string,
                       lv_fullpath    TYPE string.
                DATA: Begin of wa_testata,
                      lv_var(10) type c,
                      End of wa_testata.
                DATA: testata like standard table of wa_testata.
                lv_filename = p_attkit.
                CALL METHOD cl_gui_frontend_services=>file_save_dialog
                  EXPORTING
    *              WINDOW_TITLE         =
    *              DEFAULT_EXTENSION    =
                     default_file_name    = lv_filename
    *              WITH_ENCODING        =
    *              FILE_FILTER          =
    *              INITIAL_DIRECTORY    =
    *              PROMPT_ON_OVERWRITE  = 'X'
                  CHANGING
                    filename             = lv_filen
                    path                 = lv_path
                    fullpath             = lv_fullpath
    *             USER_ACTION          =
    *             FILE_ENCODING        =
                  EXCEPTIONS
                    cntl_error           = 1
                    error_no_gui         = 2
                    not_supported_by_gui = 3
                    OTHERS               = 4.
                IF sy-subrc <> 0.
                  MESSAGE ID sy-msgid TYPE sy-msgty NUMBER sy-msgno
                             WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
                ENDIF.
                CALL FUNCTION 'GUI_DOWNLOAD'
                        EXPORTING
    *                   BIN_FILESIZE                    =
                          filename                        = lv_fullpath
                          filetype                        = 'ASC'
    *                   APPEND                          = ' '
    *                   WRITE_FIELD_SEPARATOR           = ' '
    *                   HEADER                          = '00'
    *                   TRUNC_TRAILING_BLANKS           = ' '
    *                   WRITE_LF                        = 'X'
    *                   COL_SELECT                      = ' '
    *                   COL_SELECT_MASK                 = ' '
    *                   DAT_MODE                        = ' '
    *                   CONFIRM_OVERWRITE               = ' '
    *                   NO_AUTH_CHECK                   = ' '
    *                   CODEPAGE                        = ' '
    *                   IGNORE_CERR                     = ABAP_TRUE
    *                   REPLACEMENT                     = '#'
    *                   WRITE_BOM                       = ' '
    *                   TRUNC_TRAILING_BLANKS_EOL       = 'X'
    *                   WK1_N_FORMAT                    = ' '
    *                   WK1_N_SIZE                      = ' '
    *                   WK1_T_FORMAT                    = ' '
    *                   WK1_T_SIZE                      = ' '
    *                 IMPORTING
    *                   FILELENGTH                      =
                        TABLES
                          data_tab                        = tb_attrkit
                          fieldnames                      = testata
                       EXCEPTIONS
                         file_write_error                = 1
                         no_batch                        = 2
                         gui_refuse_filetransfer         = 3
                         invalid_type                    = 4
                         no_authority                    = 5
                         unknown_error                   = 6
                         header_not_allowed              = 7
                         separator_not_allowed           = 8
                         filesize_not_allowed            = 9
                         header_too_long                 = 10
                         dp_error_create                 = 11
                         dp_error_send                   = 12
                         dp_error_write                  = 13
                         unknown_dp_error                = 14
                         access_denied                   = 15
                         dp_out_of_memory                = 16
                         disk_full                       = 17
                         dp_timeout                      = 18
                         file_not_found                  = 19
                         dataprovider_exception          = 20
                         control_flush_error             = 21
                         OTHERS                          = 22.
                IF sy-subrc <> 0.
                  MESSAGE ID sy-msgid TYPE sy-msgty NUMBER sy-msgno
                          WITH sy-msgv1 sy-msgv2 sy-msgv3 sy-msgv4.
                ENDIF.

  • Using the Soap Encoding schema within an XSD

    Hi,
    I am trying to create some XSDs based on the types from the auto-generated WSDLs in JDeveloper. However, I find that while the Array type works fine in the WSDL, I cannot seem to get it to validate correctly in my XML schemas, possibly due to some namespace error.
    The current code in the XSD is given as:
    <xsd:schema
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/">
    <xsd:import namespace="http://schemas.xmlsoap.org/soap/encoding/"/>
    <xsd:import namespace="http://schemas.xmlsoap.org/wsdl/"/>
    <xsd:complexType name="VesselQueryResult">
    <xsd:all>
    <xsd:element name="vesselNames" type="ArrayOfjava_lang_String"/>
    </xsd:all>
    </xsd:complexType>
    <xsd:complexType name="ArrayOfjava_lang_String">
    <xsd:complexContent>
    <xsd:restriction base="SOAP-ENC:Array">
    <xsd:attribute ref="SOAP-ENC:arrayType" wsdl:arrayType="xsd:string[]"/>
    </xsd:restriction>
    </xsd:complexContent>
    </xsd:complexType>
    </xsd:schema>
    When I tried to add this XSD as a user schema, however, the following validation error was displayed:
    Error: Line 0, Column 0: Invalid reference: 'http://schemas.xmlsoap.org/soap/encoding/:Array'
    I tried to qualify the SOAP-ENC namespace with the schemaLocation, but another error was displayed:
    <xsd:import namespace="http://schemas.xmlsoap.org/soap/encoding/"
    schemaLocation="http://schemas.xmlsoap.org/soap/encoding/"/>
    oracle.xml.parser.schema.XSDException: Server returned HTTP response code: 407 for URL: http://schemas.xmlsoap.org/soap/encoding/
    I am wondering how JDeveloper supports the SOAP-ENC schema for use in XSDs. Do I need to add the Soap Encoding schema as a user schema, or is there a way I might reference it similar to the way that auto-generated WSDLs seem to deal with it? Any information regarding this would be greatly appreciated.
    Thank you and regards.
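One way to reproduce the "Invalid reference" outside JDeveloper is to compile the schema with the JDK's own validator (javax.xml.validation): an xsd:import that names only a namespace gives the parser nothing to load, so SOAP-ENC:Array stays undefined and compilation fails. This is a sketch on a cut-down version of the schema above; the exact error text depends on the JDK's parser, and the class name is hypothetical. Separately, HTTP 407 means Proxy Authentication Required, i.e. the download of the encoding schema was blocked at a proxy, which suggests (an untested assumption for JDeveloper) saving that schema locally and pointing schemaLocation at the local copy.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class XsdCheck {
    // Returns null if the schema compiles, otherwise the parser's error message.
    // With no ErrorHandler set, SchemaFactory throws on the first schema error.
    static String compileError(String xsd) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            factory.newSchema(new StreamSource(new StringReader(xsd)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // The import has no schemaLocation, so nothing defines SOAP-ENC:Array.
        String xsd =
            "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
          + " xmlns:SOAP-ENC='http://schemas.xmlsoap.org/soap/encoding/'>"
          + "<xsd:import namespace='http://schemas.xmlsoap.org/soap/encoding/'/>"
          + "<xsd:complexType name='ArrayOfjava_lang_String'><xsd:complexContent>"
          + "<xsd:restriction base='SOAP-ENC:Array'/>"
          + "</xsd:complexContent></xsd:complexType></xsd:schema>";
        System.out.println(compileError(xsd));
    }
}
```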

    I deleted the two lines of code below:
    utl_http.set_proxy('www-proxy', NULL);
    utl_http.set_persistent_conn_support(TRUE);
    and then I set utl_http.set_persistent_conn_support() to FALSE to get that error message. It seems the website in question doesn't host the web service anymore, because the error has to do with parsing the WSDL file, which probably doesn't exist.
    I'm just trying to get a working example of using a third-party web service to return a value to be displayed in the database. Do you know of any good examples? The ones I'm using seem to be pretty outdated; the Barnes and Noble example just times out.

  • Using TeX's cmex10.pfb and other odd-coded fonts in Java

    I can use Font.createFont on Computer Modern (maths) fonts like the cmmi and cmex PostScript Type 1 files and get a font. But as far as I can discern, many of the glyphs therein are not found by the mapping via Unicode, and I have been having trouble finding how to access them. E.g. cmex10 reports that it has 130 glyphs, but only 20 Unicode code points admit to being populated, and most of those do not display anything for me. Can anybody tell me how to select a physical font like that and access the glyphs by the raw unmapped codes that TeX gave them? Sample code would be nice, but any pointers on what to do would help. Thanks! Arthur

    Further investigation: Java up to 1.3 or 1.4 had font property config files, but those seem to be no longer present and do not cover fonts loaded from resources well.
    If I use the t1disasm disassembler on the Type 1 font and edit the PostScript to rename some of the procedures that draw the glyphs to A, B, C etc., I can then see those glyphs. So Java seems to be using the glyph names in the font's encoding vector to map characters onto Unicode, and just discarding glyphs it cannot cope with?
    So maybe I will be able to cope by making a trick version of cmex that names all its glyphs as if they are things in the Unicode space that Java knows about! But that seems pretty gross to me and a better solution that gave me raw access to the fonts by uninterpreted character code would be nicer.
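Java apparently exposes no public API for a Type 1 font's own encoding vector, but java.awt.Font can address glyphs by glyph index instead of by character: Font.getNumGlyphs() reports how many glyphs there are, and the int[] overload of Font.createGlyphVector takes glyph codes rather than characters, so the Unicode mapping is not used. A minimal sketch under two assumptions: the .pfb path is passed on the command line (hypothetical), and the glyph indices Java assigns follow an internal order that may not match TeX's character codes, so the index-to-TeX-code correspondence has to be worked out by inspecting the output. The class and helper names are hypothetical:

```java
import java.awt.Font;
import java.awt.FontFormatException;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class RawGlyphs {
    // Glyph indices first .. first+count-1, clipped to the font's glyph count.
    static int[] glyphRange(int first, int count, int numGlyphs) {
        int end = Math.min(first + count, numGlyphs);
        int n = Math.max(0, end - first);
        int[] codes = new int[n];
        for (int i = 0; i < n; i++) {
            codes[i] = first + i;
        }
        return codes;
    }

    public static void main(String[] args) throws IOException, FontFormatException {
        // args[0]: path to a local Type 1 font, e.g. cmex10.pfb (hypothetical path).
        try (InputStream in = new FileInputStream(args[0])) {
            Font font = Font.createFont(Font.TYPE1_FONT, in).deriveFont(24f);
            System.out.println("glyphs in font: " + font.getNumGlyphs());
            FontRenderContext frc = new FontRenderContext(new AffineTransform(), true, true);
            // These ints are glyph indices inside the font, not Unicode code points.
            GlyphVector gv = font.createGlyphVector(frc, glyphRange(0, 16, font.getNumGlyphs()));
            for (int i = 0; i < gv.getNumGlyphs(); i++) {
                Rectangle2D box = gv.getGlyphOutline(i).getBounds2D();
                System.out.println("glyph " + gv.getGlyphCode(i) + " bounds " + box);
            }
        }
    }
}
```

A GlyphVector built this way can be drawn with Graphics2D.drawGlyphVector, which avoids renaming glyphs in the font itself.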

Maybe you are looking for

  • Organizer won't open in Photoshop Elements 10. Says it is in use by another program.

    Organizer won't open in Photoshop Elements 10. Says it is in use by another program. This just started happening.... When I open Adobe Photoshop Elements 10 in Windows 7 and click on Organizer to open my catalog, I get a message that reads, " The cat

  • Video in button Selected state?

    I'm rusty in DVDSP - been a couple of years since my last DVD menu project, but I have one now and the client has provided the menu art and description of what they want - but there are 2 items I'm not sure if I can include the way they want, and bef

  • What are the problems associated with Minecraft game and Apple computers

    What are the problems with Minecraft game and Apple computers.  I've read that the two are incompatible.

  • Can we override Exception Class in Java ?

    I have done a project in Struts 1.2. I have used a String variable without checking for null, so I get a null pointer exception. String str = null; str.equals("jayaraj"); //exception raised. Can I make a customized null pointer exception?

  • Resaving pdfs and image quality

    I have clients submit ads as pdfs but sometimes I need to recreate the pdf because the client forgot to include all printer marks, or I need to convert a spot to process, or whatever. The way I do it is to create a new InDesign CS3 file, place the pd