Character vs. Byte

Hello all,
I would appreciate it if someone could help me fix this annoying problem. I have code that created a text file using:
DataOutputStream dos = new DataOutputStream(new FileOutputStream(new File("file.txt"), true));
dos.writeChars(str.toString());
Now my problem is that whenever I open a text file created like this in editors such as 'vi' or 'emacs', it shows up as binary (which is expected, since I used DataOutputStream to write raw bytes).
My question is this:
I want to write another small piece of code that reads this text file and writes it back as plain characters rather than binary data. I tried using
DataInputStream dis = new DataInputStream(new FileInputStream(new File("file.txt")));
PrintWriter pwriter = new PrintWriter(new FileWriter(new File("new-File.txt")));
while(true)
     pwriter.print(dis.readChar());
I also tried using
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(new File("file.txt"))));
and then
str = br.readLine();
pwriter.print(str.toCharArray());
I failed each time. Any suggestions on how to obtain a text file with plain ASCII characters instead of binary data?
Thanks
- Mitesh.

Hi malcolm,
I tried what you suggested, but it's not working. All I got in the output file was characters like ?????; everything was a question mark. I tried different encodings; when I tried "UTF-8" I got the original file back as output. The code I had used was exactly what you posted. Here is the code that I arrived at after several attempts at debugging, and it works. It might be helpful to others:
BufferedReader br = null;
Writer out = null;
try {
    br = new BufferedReader(new InputStreamReader(new FileInputStream("file.txt")));
    out = new FileWriter("fileConverted.txt");
    int high;
    while ((high = br.read()) != -1) { // -1 denotes end of file
        // writeChars() wrote every character as two bytes, high byte first.
        // For plain ASCII the high byte is 0, i.e. the unused portion of the
        // Unicode character, so we skip it and keep only the low byte.
        int low = br.read();
        if (low == -1)
            break;
        char chr = (char) low; // cast the byte value back to char
        out.write(chr);
    }
    br.close();
    out.close();
    System.out.println("Done");
} catch (IOException ex) {
    ex.printStackTrace();
}
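
By the way, since DataOutputStream.writeChars() writes each char as two bytes with the high byte first (in effect UTF-16BE), another option is simply to decode the file with that charset. A minimal sketch along those lines (same file names as above; not tested against your exact data):
import java.io.*;
public class ConvertFromUtf16 {
    public static void main(String[] args) throws IOException {
        // Decode the two-byte characters written by writeChars() using UTF-16BE ...
        BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream("file.txt"), "UTF-16BE"));
        // ... and write them back out as plain text in the platform default encoding.
        Writer out = new FileWriter("fileConverted.txt");
        int ch;
        while ((ch = br.read()) != -1) {
            out.write(ch);
        }
        br.close();
        out.close();
        System.out.println("Done");
    }
}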

Similar Messages

  • Conversion from byte stream to character stream & character stream to byte stream.

    This is a test program.
    import java.io.*;
    class Test {
        public static void main(String[] args) {
            System.out.println("Hello World!");
            try {
                // Read the JPEG through a character stream (Reader) ...
                BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(new File("C:/javafiles/testing/Picture.jpg"))));
                // ... and write it back out through a character stream (Writer).
                PrintWriter bw = new PrintWriter(new OutputStreamWriter(new FileOutputStream(new File("C:/javafiles/testing/Copy.jpg"))));
                String in = br.readLine();
                while (in != null) {
                    bw.println(in);
                    in = br.readLine();
                }
                br.close();
                bw.close();
            } catch (Exception e) {
                System.out.println("\n\n**********Exception Occurred**************\n" + e);
            }
        }
    }
    **First of all I read the jpg file and convert it from a byte stream into a character stream,
    **then copy it back, converting it from a character stream back into a byte stream.
    **The copied file is corrupted (it opens, but the image is completely scattered).
    **That means data has been lost in the conversion.
    **Is it due to the encoding I haven't specified while writing, or does it always happen when we convert from a byte stream to characters and vice versa?
    **What encoding should I use while writing jpg files?
    Please comment.

    **First of all I read the jpg file and convert it from a byte stream into a character stream,
    **then copy it back, converting it from a character stream back into a byte stream.
    First of all, this operation is meaningless. A JPEG file doesn't contain anything that can be converted to characters. Delete the 'conversions' and all will be well.
    **What encoding should I use while writing jpg files?
    None. Don't use an encoding. Don't use Readers and Writers. Use InputStreams and OutputStreams only.
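    For illustration, a minimal byte-for-byte copy along the lines suggested above, using streams only (paths reused from the post; a sketch, not a tested drop-in):
    import java.io.*;
    public class BinaryCopy {
        public static void main(String[] args) throws IOException {
            // Copy the file byte for byte; with no Reader/Writer there is no charset
            // conversion, so the JPEG data cannot be corrupted on the way.
            InputStream in = new FileInputStream("C:/javafiles/testing/Picture.jpg");
            OutputStream out = new FileOutputStream("C:/javafiles/testing/Copy.jpg");
            byte[] buffer = new byte[8192];
            int len;
            while ((len = in.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
            in.close();
            out.close();
        }
    }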

  • Character or byte mode?

    Hi
    When do we use IN CHARACTER MODE and IN BYTE MODE in a DESCRIBE query?

    Hi,
    It is used to determine the length of a data object.
    When you want the length in bytes, go for BYTE mode;
    when you want the length in characters, go for CHARACTER mode.
    REMEMBER: both modes come only with the LENGTH addition.
    REF: http://help.sap.com/saphelp_nw04/helpdata/en/fc/eb3145358411d1829f0000e829fbfe/content.htm
    REGARDS,
    ANIRBAN

  • Converting from Single Byte to Multi Byte character set

    Hello,
    I'm trying to migrate one schema, including data, from a 10g (10.1.0.2.0) DB with IW8ISO8859P8 character set, to a 10g (10.2.0.1.0) DB with AL32UTF8 character set.
    The original tables are using VARCHAR2 columns, including some VARCHAR2(1) columns.
    I'm trying to use exp and imp for the task, but during import I'm receiving errors like:
    IMP-00019: row rejected due to ORACLE error 12899
    IMP-00003: ORACLE error 12899 encountered
    ORA-12899: value too large for column "SHAMAUT"."TIKIM"."GAR_SET" (actual: 2, maximum: 1)
    These errors are not limited to the one-character columns only.
    Is there a way to export/import the data with AL32UTF8 in mind, so the system will automatically convert the data properly?
    Thanks for the help,
    Arie.

    It's not a true conversion problem that you have, but rather a space problem. Table columns are created by default with the length semantics given by the init parameter NLS_LENGTH_SEMANTICS:
    If NLS_LENGTH_SEMANTICS = BYTE,
    then a declared length of 1 means 1 byte, whatever the db character set.
    If NLS_LENGTH_SEMANTICS = CHAR,
    then a declared length of 1 means 1 character in the db character set.
    If this parameter is changed, it is only taken into account for newly created tables or columns: existing columns are not changed.
    See http://download-uk.oracle.com/docs/cd/B10501_01/server.920/a96529/ch2.htm#104327
    The only solution I see is to enlarge your VARCHAR2 columns before running the import...
    Message was edited by:
    Pierre Forstmann

  • Regardibg double byte data type in Xi(japanese character)

    Hi, I am giving Japanese characters (double byte) as input data. Will you please tell me how to pass them, whether as a string or a constant, etc.? Please also give some general information about double-byte data types.
    regards,
    S.K.Karthikeyan.

    Hi Stefan,
    I got your point; it's really helpful for me.
    I have one more doubt:
    is there an equivalent type for double-byte characters in XI?
    regards,
    S.K.Karthikeyan.

  • Euro character display problem

    Hi,
    I stored the euro sign (€) (press Alt+0128 on Windows to get that sign) in a VARCHAR2 column of an Oracle database table. When I do 'select * from table_euro;', the euro sign is displayed properly at the SQL prompt.
    SQL> select * from table_euro
    2 /
    NAME

    Now, I programmatically read it using a Java ResultSet object, but it displays as � (which is Alt+128 as a key press). This is how I fetch it programmatically:
    dbStatement = dbCon.createStatement(ResultSet.TYPE_FORWARD_ONLY,
    ResultSet.CONCUR_READ_ONLY);
    dbDataReader = dbStatement.executeQuery(query);
    while (dbDataReader.next()) { // loop until the result set is read fully
        Object fieldData = dbDataReader.getObject(fieldIndex);
        System.out.println(fieldData);
    }
    For the euro sign, the above prints � . Why is that? Can anyone please throw some light on it?
    Thanks.

    BIJ001 wrote:
    A '?' usually creeps in when encoding or decoding bytes with an encoding that is not capable of encoding/decoding a given character;
    such a character gets replaced with the question mark. :)
    Two different character encodings can indeed produce different byte sequences for the same character. Decoding will fail when the charset used to decode the bytes back to characters is different from the charset that was used to encode the characters to bytes.
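    As a small illustration of that point, here is a hedged Java sketch (the charset names are only examples) showing how an encoder that cannot represent the euro sign substitutes a question mark:
    import java.io.UnsupportedEncodingException;
    public class EncodingDemo {
        public static void main(String[] args) throws UnsupportedEncodingException {
            String euro = "\u20AC"; // the euro sign
            // US-ASCII cannot represent the euro sign, so the encoder substitutes '?' (0x3F).
            byte[] ascii = euro.getBytes("US-ASCII");
            System.out.println((int) ascii[0]); // prints 63, i.e. '?'
            // UTF-8 can represent it; decoding with the same charset round-trips correctly.
            byte[] utf8 = euro.getBytes("UTF-8");
            System.out.println(new String(utf8, "UTF-8")); // prints the euro sign again
        }
    }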

  • Replacing a special character in a string with another string

    Hi
    I need to replace a special character in a string with another string.
    Say there is a string "abc's def's are alphabets"
    and I need to replace every ' (apostrophe) with &apos&, so that it looks like this:
    "abc&apos&s def&apos&s are alphabets".
    Kindly let me know how this requirement can be met.
    Regards
    Sukumari

    REPLACE
    Syntax Forms
    Pattern-based replacement
    1. REPLACE [{FIRST OCCURRENCE}|{ALL OCCURRENCES} OF]
    pattern
              IN [section_of] dobj WITH new
              [IN {BYTE|CHARACTER} MODE]
              [{RESPECTING|IGNORING} CASE]
              [REPLACEMENT COUNT rcnt]
              { {[REPLACEMENT OFFSET roff]
                 [REPLACEMENT LENGTH rlen]}
              | [RESULTS result_tab|result_wa] }.
    Position-based replacement
    2. REPLACE SECTION [OFFSET off] [LENGTH len] OF dobj WITH new
                      [IN {BYTE|CHARACTER} MODE].
    Effect
    This statement replaces characters or bytes of the variable dobj by characters or bytes of the data object new. Here, position-based and pattern-based replacement are possible.
    When the replacement is executed, an interim result without a length limit is implicitly generated and the interim result is transferred to the data object dobj. If the length of the interim result is longer than the length of dobj, the data is cut off on the right in the case of data objects of fixed length. If the length of the interim result is shorter than the length of dobj, data objects of fixed length are filled to the right with blanks or hexadecimal zeroes. Data objects of variable length are adjusted. If data is cut off to the right when the interim result is assigned, sy-subrc is set to 2.
    In the case of character string processing, the closing spaces are taken into account for data objects dobj of fixed length; they are not taken into account in the case of new.
    System fields
    sy-subrc Meaning
    0 The specified section or subsequence was replaced by the content of new and the result is available in full in dobj.
    2 The specified section or subsequence was replaced in dobj by the contents of new and the result of the replacement was cut off to the right.
    4 The subsequence in sub_string was not found in dobj in the pattern-based search.
    8 The data objects sub_string and new contain double-byte characters that cannot be interpreted.
    Note
    These forms of the statement REPLACE replace the following obsolete form:
    REPLACE sub_string WITH
    Syntax
    REPLACE sub_string WITH new INTO dobj
            [IN {BYTE|CHARACTER} MODE]
            [LENGTH len].
    Extras:
    1. ... IN {BYTE|CHARACTER} MODE
    2. ... LENGTH len
    Effect
    This statement searches through a byte string or character string dobj for the subsequence specified in sub_string and replaces the first byte or character string in dobj that matches sub_string with the contents of the data object new.
    The memory areas of sub_string and new must not overlap, otherwise the result is undefined. If sub_string is an empty string, the point before the first character or byte of the search area is found and the content of new is inserted before the first character.
    During character string processing, the closing blank is considered for data objects dobj, sub_string and new of type c, d, n or t.
    System Fields
    sy-subrc Meaning
    0 The subsequence in sub_string was replaced in the target field dobj with the content of new.
    4 The subsequence in sub_string could not be replaced in the target field dobj with the contents of new.
    Note
    This variant of the statement REPLACE will be replaced, beginning with Release 6.10, with a new variant.
    Addition 1
    ... IN {BYTE|CHARACTER} MODE
    Effect
    The optional addition IN {BYTE|CHARACTER} MODE determines whether byte or character string processing will be executed. If the addition is not specified, character string processing is executed. Depending on the processing type, the data objects sub_string, new, and dobj must be byte or character type.
    Addition 2
    ... LENGTH len
    Effect
    If the addition LENGTH is not specified, all the data objects involved are evaluated in their entire length. If the addition LENGTH is specified, only the first len bytes or characters of sub_string are used for the search. For len, a data object of the type i is expected.
    If the length of the interim result is longer than the length of dobj, data objects of fixed length will be cut off to the right. If the length of the interim result is shorter than the length of dobj, data objects of fixed length are filled to the right with blanks or with hexadecimal 0. Data objects of variable length are adapted.
    Example
    After the replacements, text1 contains the complete content "I should know that you know", while text2 has the cut-off content "I should know that".
    DATA:   text1      TYPE string       VALUE 'I know you know',
            text2      TYPE c LENGTH 18  VALUE 'I know you know',
            sub_string TYPE string       VALUE 'know',
            new        TYPE string       VALUE 'should know that'.
    REPLACE sub_string WITH new INTO text1.
    REPLACE sub_string WITH new INTO text2.

  • ABAP code for character search

    Hello Gurus
    I have a variable, e.g. lv_txtsh, which is meant to store a string. Now I want to parse lv_txtsh and find out whether it contains the character "x". The length of the string stored in lv_txtsh is going to be dynamic. Can you please help me code this scenario in ABAP OO (the new version of ABAP)?
    Any help is appreciated and points will be assigned.
    Thanks,
    Rishi

    Hi,
    Please go through this. You will get an idea.
    FIND
    Syntax
    FIND [{FIRST OCCURRENCE}|{ALL OCCURRENCES} OF] pattern
      IN [section_of] dobj
      [IN {BYTE|CHARACTER} MODE]
      [{RESPECTING|IGNORING} CASE]
      [MATCH COUNT  mcnt]
      { {[MATCH OFFSET moff]
         [MATCH LENGTH mlen]}
      | [RESULTS result_tab|result_wa] }
      [SUBMATCHES s1 s2 ...].
    Extras:
    1. ... {FIRST OCCURRENCE}|{ALL OCCURRENCES} OF
    2. ... IN {BYTE|CHARACTER} MODE
    3. ... {RESPECTING|IGNORING} CASE
    4. ... MATCH COUNT mcnt
    5. ... MATCH OFFSET moff
    6. ... MATCH LENGTH mlen
    7. ... RESULTS result_tab|result_wa
    8. ... SUBMATCHES s1 s2 ...
    Effect:
    The data object dobj is searched for the byte or character sequence specified by the search string pattern. The addition OCCURRENCE[S] determines whether only the first, or all occurrences are searched. The addition section_of can be used to restrict the search range. The addition CASE is used to determine whether upper/lower case is taken into account in the search. The additions MATCH, SUBMATCHES, and RESULTS are used to determine the number, position, and length of the found sequence(s).
    The search is ended when the search string is found for the first time or when all the search strings in the search range have been found, or when the end of the search range is reached. The user is informed of the search result by setting sy-subrc.
    In character string processing, the closing blanks are taken into account in data objects dobj of fixed length.
    Note
    The statement FIND IN TABLE is available for searching in internal tables.
    System fields
    sy-subrc Meaning
    0 The search string was found at least once in the search range.
    4 The search string was not found in the search range.
    8 The search string contains an invalid double-byte character in character string processing.
    Addition 1
    ... {FIRST OCCURRENCE}|{ALL OCCURRENCES} OF
    The optional addition {FIRST OCCURRENCE}|{ALL OCCURRENCES} OF determines whether the program searches only for the first occurrence or for all occurrences of the search string. If the addition FIRST OCCURRENCE, or none of the additions, is specified, only the first occurrence is found. Otherwise, all occurrences are found.
    If sub_string is an empty string in the pattern or is of type c, d, n or t and only contains blank characters, when searching for the first occurrence, the space in front of the first character or byte of the search range is found. If searching for all occurrences, in this case the exception CX_SY_FIND_INFINITE_LOOP is triggered.
    If regex contains a regular expression in pattern that matches the empty character string, the search for one occurrence also finds the space before the first character. When searching for all occurrences, in this case, the search finds the space before the first character, all intermediate spaces that are not within a match, and the space after the last character.
    Addition 2
    ... IN {BYTE|CHARACTER} MODE
    Effect:
    The optional addition IN {BYTE|CHARACTER} MODE determines whether byte or character string processing takes place. If the addition is not specified, character string processing is performed. Depending on the processing type, dobj and sub_string in pattern must be byte-like or character-type. If regular expressions are used in pattern, only character string processing is permitted.
    Addition 3
    ... {RESPECTING|IGNORING} CASE
    Effect:
    This addition is only permitted for character string processing. It determines whether upper/lower case is taken into account in pattern and dobj when searching. If RESPECTING CASE is specified, the text is case-sensitive, and if IGNORING CASE is specified, the text is not case-sensitive. If neither of the additions is specified, RESPECTING CASE is used implicitly. If a regular expression is entered for pattern as an object of the class CL_ABAP_REGEX, this addition is not permitted. Instead, the properties of the object are taken into account in the search.
    Addition 4
    ... MATCH COUNT mcnt
    Effect:
    If the search string pattern is found in the search range, the addition MATCH COUNT stores the number of found locations in the data object mcnt. If FIRST OCCURRENCE is used, this value is always 1 if the search is successful. For mcnt, a variable of the data type i is expected. If the search is unsuccessful, mcnt is set to 0.
    Addition 5
    ... MATCH OFFSET moff
    Effect:
    If the search string pattern is found in the search range, the addition MATCH OFFSET stores the offset of the last found location in relation to the data object dobj in the data object moff. If FIRST OCCURRENCE is used, this is the offset of the first found location. For moff, a variable of the data type i is expected. If the search is not successful, moff contains its previous value.
    Note:
    The system field sy-fdpos is not supplied by FIND.
    Addition 6
    ... MATCH LENGTH mlen
    Effect:
    If the search string pattern is found in the search range, the addition MATCH LENGTH stores the length of the last found substring in the data object mlen. If using FIRST OCCURRENCE, this is the length of the first found location. For mlen, a variable of data type i is expected. If the search is not successful, mlen contains its previous value.
    Addition 7
    ... RESULTS result_tab|result_wa
    Effect:
    If the search string pattern is found in the search range, the addition RESULTS stores the offsets of the found locations, the lengths of the found substrings, and information on the registers of the subgroups of regular expressions, either in an internal table result_tab or in a structure result_wa.
    The internal table result_tab must have the table type MATCH_RESULT_TAB, and the structure result_wa must have the type MATCH_RESULT from the ABAP Dictionary. The line type of the internal table is also MATCH_RESULT.
    When an internal table is entered, this is initialized before the search and a line is inserted in the table for every match found. When a structure is entered, this is assigned the values of the last found location. If FIRST OCCURRENCE is used and the search is successful, only one line is inserted in the internal table.
    The line or structure type MATCH_RESULT has the following components:
    OFFSET of type INT4 for the offset of the substring
    LENGTH of type INT4 for the length of the substring
    SUBMATCHES of table type SUBMATCH_RESULT_TAB with the line type SUBMATCH_RESULT for the offset and length of the substrings of the current found locations that are stored in the registers of the subgroups of a regular expression.
    The lines of result_tab are sorted according to the columns OFFSET and LENGTH. An additional component LINE is only important in the variant FIND IN TABLE.
    Following an unsuccessful search, the content of an internal table result_tab is initial, while a structure result_wa contains its previous value.
    Note
    The addition RESULTS is particularly suitable for use with the addition ALL OCCURRENCES when specifying a table, and for use with the FIRST OCCURRENCE when specifying a structure.
    Example:
    The following search for a regular expression finds the two substrings "ab" at offset 0 and "ba" at offset 2, and fills the internal table result_tab with two rows accordingly. As the regular expression contains three subgroups, the component submatches contains three lines in each case. The first line of submatches refers to the outermost bracket, the second line refers to the first internal bracket, and the third line refers to the second internal bracket. For the first found location, the first and second lines contain the offset and length, while the third line is undefined. For the second found location, the first and third lines contain the offset and length, while the second line is undefined.
    DATA: result_tab TYPE match_result_tab.
    FIND ALL OCCURRENCES OF REGEX `((ab)|(ba))`
         IN 'abba'
         RESULTS result_tab.
    Addition 8
    ... SUBMATCHES s1 s2 ...
    Effect:
    This addition is only permitted if a regular expression is used in pattern. The current contents of the register of subgroups of the regular expression for the current found location are written to the variables s1, s2, ..., for which a character-type data type is expected. When ALL OCCURRENCES is used, the last found location is evaluated. If more variables s1, s2, ... are listed than subgroups are available, the superfluous variables are initialized. If fewer variables s1, s2, ... are listed than subgroups are available, the superfluous subgroups are ignored.
    Example:
    The regular expression after REGEX has two subgroups. The search finds the substring from offset 0 of length 14. The content of the register of the subgroups is "Hey" and "my".
    DATA: text TYPE string,
          moff TYPE i,
          mlen TYPE i,
          s1   TYPE string,
          s2   TYPE string.
    text = `Hey hey, my my, Rock and roll can never die`.
    FIND REGEX `(\w)\W\1\W(\w)\W+\2`
         IN text
         IGNORING CASE
         MATCH OFFSET moff
         MATCH LENGTH mlen
         SUBMATCHES s1 s2.
    Reward points if helpful.
    Thanks and Regards.

  • Inserting broken pipe character in DB from Java creating problem

    Hello experts,
    We want to insert a broken pipe character into Oracle 10g from Java,
    but it gets changed to a pipe character.
    We have a Japanese client for whom we need to preserve this kind of character.
    If someone know the solution, please update.

    Hi,
    I understand your explanation.
    It is not possible to post the actual code; it is complex and coupled with other things.
    I would like to share the scenario where it is failing:
    1. The broken pipe character (double byte (\u00fa\u0055)) is put along with other long Japanese text in a *.txt file.
    2. A Java program reads the above *.txt file and inserts the text into a table column in Oracle.
    The .txt file is saved in Shift_JIS encoding.
    In the Java program, the same *.txt file is read using the 'CP943C' charset and written into the database.
    So as per your suggestion, I have to compare the literal first, and then if I find a broken bar in the file, I need to replace it with the specified Unicode character.
    Is that so?
    --Saurabh.
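    A rough sketch of that comparison step, only for illustration (the file name is made up; Shift_JIS is the encoding mentioned above, and U+00A6 is the Unicode broken bar):
    import java.io.*;
    public class BrokenBarCheck {
        public static void main(String[] args) throws IOException {
            // Read the text file with the charset it was actually saved in.
            BufferedReader br = new BufferedReader(
                    new InputStreamReader(new FileInputStream("input.txt"), "Shift_JIS"));
            int ch;
            while ((ch = br.read()) != -1) {
                if (ch == 0x00A6) { // U+00A6 BROKEN BAR
                    System.out.println("found a broken bar character");
                }
            }
            br.close();
        }
    }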

  • Cisco ISE NTP MD5 hash is 20-Bytes?

    When attempting to configure an NTP authentication-key in the Cisco ISE CLI I noticed that it will not accept an MD5 hash of 32 characters (16 bytes). Instead it expects a 40-character (20-byte) hash. That is in line with a SHA-1 hash, not an MD5 hash, even though there is no SHA-1 keyword, only an MD5 keyword.
    What's the deal?
    Cisco ISE Version: 1.1.2.145 (Update 3)
    ise/user(config)# ntp authentication-key 75 ?
      md5  MD5 authentication
    ise/user(config)# ntp authentication-key 75 md5 hash ?
      <WORD>  Hashed key for authentication (Max Size - 40)
    ise/user(config)# ntp authentication-key 75 md5 hash 12345678901234567890123456789012
    % ERROR: Bad hashed key.
    ise/user(config)# ntp authentication-key 75 md5 plain test
    ise/user(config)# do show run | i md5
    ntp authentication-key 75 md5 hash 97dc37c94236ec1b4c56871c2e482cbd6f56bd33
    That's not an MD5 hash as it's 40 characters long (20 bytes).

    Hmm, that is an interesting observation. I am guessing that it is a typo and should be "sha-1" because 40 characters is definitely not MD5 :)
    I would suggest you open a case with Cisco TAC and report this. If you get a bug ID or a different answer please let us know. 
    Thank you for rating helpful posts!

  • Oracle 10g express - accent character

    Hi,
    We have a problem with the 10g Express database for Windows. If you create a VARCHAR(2) column, for example, you can't insert a string like 'éé'; you can only insert one accented character.
    This problem doesn't exist in Oracle 10g Standard Edition. Is there a way to change this behaviour?
    Thank you

    The behavior here doesn't depend on the edition of Oracle (express and standard, in other words, will behave the same), but on the character set and NLS_LENGTH_SEMANTICS initialization parameter.
    By default, when you declare a VARCHAR2(2), you are allocating 2 bytes of storage. If you are using a single-byte character set (i.e. Windows-1252), all characters require one byte of storage. If you are using a variable-length character set (i.e. UTF-8), some characters require 1 byte of storage, some require 2 bytes, and some will require 3 bytes. I'll wager that your XE database is using the UTF-8 character set, so accented characters will generally require 2 bytes of storage, hence you can only add one to a VARCHAR2(2).
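    As a rough illustration of that point (a Java sketch, not from the thread), the same two accented characters take twice as many bytes in UTF-8 as in a single-byte character set:
    import java.io.UnsupportedEncodingException;
    public class ByteLengthDemo {
        public static void main(String[] args) throws UnsupportedEncodingException {
            String s = "\u00E9\u00E9"; // "éé"
            // In a single-byte character set each character occupies one byte ...
            System.out.println(s.getBytes("windows-1252").length); // 2
            // ... but in UTF-8 each accented character needs two bytes,
            // so "éé" no longer fits into a VARCHAR2(2 BYTE) column.
            System.out.println(s.getBytes("UTF-8").length); // 4
        }
    }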
    One option you always have is to explicitly specify whether you want character or byte semantics when you create a table. That is, a VARCHAR2(2 BYTE) is equivalent to the default declaration of VARCHAR2(2), but a declaration of VARCHAR2(2 CHAR) allocates space for two characters in the current character set.
    If you want the default when you declare a new column/ variable to be that Oracle should use character semantics rather than byte semantics, you can also set the initialization parameter NLS_LENGTH_SEMANTICS to CHAR from the default of BYTE.
    Justin

  • How to count the number of occurences of a character

    Hi,
    what command is used to count the number of occurrences of a character in a line?
    I have to count the number of '.' characters in a line.

    FIND
    Searches for patterns.
    Syntax
    FIND <p> IN [SECTION OFFSET <off> LENGTH <len> OF] <text>
                [IGNORING CASE|RESPECTING CASE]
                [IN BYTE MODE|IN CHARACTER MODE]
                [MATCH OFFSET <o>] [MATCH LENGTH <l>].
    The system searches the field <text> for the pattern <p>. The SECTION OFFSET <off> LENGTH <len> OF addition tells the system to search only from the <off> position in the length <len>. IGNORING CASE or RESPECTING CASE (default) specifies whether the search is to be case-sensitive. In Unicode programs, you must specify whether the statement is a character or byte operation, using the IN BYTE MODE or IN CHARACTER MODE (default) additions. The MATCH OFFSET and MATCH LENGTH additions store the offset of the first occurrence and the length of the found string in the fields <o> and <l>.

  • About applet and 4-bytes characters

    Hi,
    I have attempted to display 4-byte characters in a TextField, but to no avail. Here is part of my code:
    int b[] = { 131096, 19985, 131160 };
    String a = new String(b, 0, 3); // use code points to define the string
    TextField hwtext = new TextField(a, 5);
    hwtext.setFont(/* some font instance here */);
    hwtext.setEditable(true);
    this.add("text1", hwtext);
    Only the middle character (2 bytes) can be seen.
    I used a similar approach but drew it with Graphics instead, something like:
    g.setFont(textFont);
    g.drawString(buffer.toString(), 40, 100);
    (details omitted)
    That works successfully. Any suggestions?
    Dave

    I've tried your code but I got the same result in both cases, i.e. only the middle character is displayed. So it seems to me that the problem is due to the font rather than anything else.
    If you say your applet can display the characters correctly via Graphics, you could try using the same font for both. What font are you using, by the way?
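    One way to test the font theory is a quick check like the sketch below (the font name is only an example; substitute whatever font the applet actually uses):
    import java.awt.Font;
    public class FontCheck {
        public static void main(String[] args) {
            int[] codePoints = { 131096, 19985, 131160 }; // same values as in the post
            String a = new String(codePoints, 0, codePoints.length);
            // The font name here is a placeholder; use the font set on the TextField.
            Font font = new Font("SimSun-ExtB", Font.PLAIN, 16);
            for (int cp : codePoints) {
                // canDisplay(int) reports whether the font has a glyph for this code point.
                System.out.println(Integer.toHexString(cp) + " displayable: " + font.canDisplay(cp));
            }
            // canDisplayUpTo returns -1 only if every character (including surrogate pairs) is displayable.
            System.out.println("first undisplayable index: " + font.canDisplayUpTo(a));
        }
    }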

  • Character Semantics multilingual

    Thanks in advance
    We are currently using byte semantics to support multilingual data. If I change from byte semantics to character semantics, will it then support multilingual data? I want to store Japanese, Thai, English, German and Chinese characters. Please reply ASAP.
    With Best Regards,
    Prabakaran K

    Character vs byte length semantics has nothing to do with what characters you can store.
    The characters you can store depend on the database character set (for CHAR and VARCHAR2 columns) and the national character set (for NCHAR and NVARCHAR2 columns). Assuming you intend to store multilingual characters in CHAR & VARCHAR2 columns, your database character set would need to be AL32UTF8 (or UTF8 in older versions).
    Once your database character set supports multilingual character sets, the choice of character or byte length semantics is a question of programmer simplicity. Character length semantics tends to be easier to deal with for PL/SQL programs.
    Justin

  • Inserted value too large for column error while scheduling a job

    Hi Everyone,
    I am trying to schedule a PL SQL script as a job in my Oracle 10g installed and running on Windows XP.
    While trying to submit the job I get the error "Inserted value too large for column:" followed by my entire code. The code is correct; it compiles and runs in Oracle ApEx's SQL Workshop.
    The size of my code is 4136 characters, 4348 bytes and 107 lines. It is code that sends an e-mail and contains a utl_smtp.write_data([Lots of HTML]) call.
    There is no insert statement in the code whatsoever; the code only queries the database for data...
    Any idea as to why I might be getting this error??
    Thanks in advance
    Sid

    The size of my code is 4136 characters, 4348 bytes and 107 lines long. It is code that sends an e-mail and has a utl_smtp.write_data(Lots of HTML)
    A SQL variable has a maximum size of 4000 bytes.

  • (Internationalization) - Unicode and Other ... Encoding Schemes

    Hello,
    I am developing an application that requires support for multiple languages
    (Chinese/Japanese/English, French/German).
    I plan to use utf-8 encoding, and not individual encoding for each language
    like SHIFT_JIS for Japanese, BIG5 for Chinese etc.
    This is more so because I would need to display multiple languages on the
    same page, and allow the user to enter data in any language he/she chooses.
    1. So, is the assumption that nothing but UTF-8 can be used here correct?
    2. If this is the case, why do people go for SHIFT_JIS for Japanese or BIG5
    for Chinese at all? After the advent of Unicode, why can't they just use
    UTF-8?
    3. I am using Weblogic 6. And my app is composed of JSPs alone at the
    moment. It is working fine with utf-8 encoding, without me setting anything
    at all in properties files etc. anywhere. I am getting data entered by user
    in forms (in chinese/japanese etc) fine, and able to insert it into the
    database and get it back too, without any problems.
    So, why is it that people are talking about parameters to be set in properties
    files to tell the app about the encoding being used, etc.?
    4. My resource bundles are ASCII text files (.properties) which have name
    value pairs. Hex Unicode escapes of the form \uXXXX represent the value. And
    this works fine.
    For example :
    UserNameLabel = \u00e3\ufffd\u2039\u00e3
    instead of -
    UserNameLabel = ãf¦ãf¼ã
    If the properties files have the original characters where the values should be
    present, my Java code is not able to read the name-value pairs in the Resource
    Bundle.
    Am I following the right approach?
    The problem with the current approach is that after I create the Resource
    Bundles, I must use the native2ascii tool to convert the characters into
    their equivalent hex code values.
    Thanks
    JSB

    charllescuba1008 wrote:
    Unicode states that each character is assigned a number which is unique; this number is called a code point.
    Right.
    The relationship between characters and code points is 1:1.
    Uhm ... let's assume "yes" for the moment. (Note that the relationship between the Java type char and code points is not 1:1, and there are other exceptions ...)
    Eg: the String *"hello"* (which is a sequence of character literals) can be represented by the following code points:
    *\u0065 \u0048 \u006c \u006c \u006f*
    Those are Java String unicode escapes. If you want to talk about Unicode code points, then the correct notation for "Hello" would be
    U+0048 U+0065 U+006C U+006C U+006F
    Note that you swapped the H and the e.
    I also read that a certain character code point must be recognized by a specific encoding or else a question mark (?) is output in place of the character.
    This one is Java specific. If Java tries to translate some unicode character to bytes using an encoding that doesn't support that character, then it will output the byte(s) for "?" instead.
    Not all code points can be recognized by an encoding.
    Some encodings (such as UTF-8) can encode all code points; others (such as ISO-8859-*, EBCDIC or UCS-2) cannot.
    So, the letter *&#1500;* would not be recognized by all encodings and should be replaced by a question mark (?), right?
    Only in a very specific case in Java. This is not a general Unicode-level rule.
    (disclaimer: the HTML code presented was using decimal XML entities to represent the unicode characters)
    What you are seeing is possibly the replacement character that your text rendering system uses to represent characters that it knows but can't display (possibly because the current font has no glyph for them).
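    For what it's worth, a small sketch showing the Java escapes versus the U+XXXX code point notation discussed above (standard java.lang APIs only):
    public class CodePointDemo {
        public static void main(String[] args) {
            String s = "Hello"; // the same string as "\u0048\u0065\u006C\u006C\u006F" written with Java escapes
            // Print each code point in the U+XXXX notation used for Unicode code points.
            for (int i = 0; i < s.length(); ) {
                int cp = s.codePointAt(i);
                System.out.printf("U+%04X ", cp); // U+0048 U+0065 U+006C U+006C U+006F
                i += Character.charCount(cp);
            }
            System.out.println();
        }
    }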
