Character encoding format in j2me
Hi,
I am new to this forum.
I have some queries regarding Character encoding support in j2me.
1. What is the character encoding format supported in j2me?
2. Does it vary from device to device, or is it the same on all j2me devices?
3. Do all j2me devices support UTF-8?
4. Do some devices support UTF-16 (or UTF-16BE/UTF-16LE)?
5. If a device supports UTF-8, can we assume that it will support all languages (I am particularly concerned about Chinese)?
Eagerly waiting for feedback :)
Not all devices have support even for UTF-8.
Use this class I wrote for my projects. It is an implementation of Reader, so you can easily use it anywhere.
/*
 * Created on 12.03.2008 14:21
 */
package misc;

import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;

/**
 * Reader of a UTF-8 encoded string from bytes.
 * Handles 1-, 2-, and 3-byte UTF-8 sequences (enough for the BMP).
 * @author RFK
 */
public class UTF8InputStreamReader extends Reader {
    InputStream is;

    /** Creates a new instance of UTF8InputStreamReader */
    public UTF8InputStreamReader(InputStream is) {
        this.is = is;
    }

    public int read(char[] cbuf, int off, int len) throws IOException {
        int b, b2, b3, r = 0;
        while (len > 0) {
            b = is.read();
            if (b < 0) break;             // end of stream
            if (b < 128) {                // 1-byte sequence (ASCII)
                cbuf[off] = (char) b;
            } else if (b < 224) {         // 2-byte sequence (lead byte 110xxxxx)
                b2 = is.read();
                cbuf[off] = (char) (((b & 0x1F) << 6) | (b2 & 0x3F));
            } else {                      // 3-byte sequence (lead byte 1110xxxx)
                b2 = is.read();
                b3 = is.read();
                cbuf[off] = (char) (((b & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F));
            }
            r++;
            off++;
            len--;
        }
        return r;
    }

    public void close() throws IOException {
        is.close();
    }
}
Edited by: MagDel on Jun 1, 2009 10:33 PM
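The class above is useful on devices whose built-in converters lack UTF-8. On devices that do support it, the same decoding is available through CLDC's own InputStreamReader. A minimal sketch of that alternative, written as desktop Java so it is self-contained (class name and byte values are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

public class Utf8Demo {
    // Decode a UTF-8 byte array using the platform's built-in converter.
    static String decode(byte[] utf8) throws IOException {
        Reader r = new InputStreamReader(new ByteArrayInputStream(utf8), "UTF-8");
        StringBuffer sb = new StringBuffer();
        int c;
        while ((c = r.read()) >= 0) {
            sb.append((char) c);
        }
        r.close();
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 'h', then 0xC3 0xA9 (the two-byte UTF-8 sequence for 'é'), then "llo"
        byte[] utf8 = {(byte) 'h', (byte) 0xC3, (byte) 0xA9,
                       (byte) 'l', (byte) 'l', (byte) 'o'};
        System.out.println(decode(utf8));   // héllo
    }
}
```

On a MIDP device the same pattern applies to the stream you get from getResourceAsStream or an HTTP connection; whether "UTF-8" is accepted as an encoding name is exactly the device-dependent question asked above.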
Similar Messages
-
File character encoding format conversion
Hi People,
uname -a
Linux abcd.us.com 2.6.32-400.21.1.el5uek #1 SMP Wed Feb 20 01:35:01 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
I am trying to convert a file abcd.dat
# file abcd.dat
abcd.dat: binary Computer Graphics Metafile
#file -i abcd.dat
abcd.dat: application/octet-stream
I've tried dos2unix in vain.
dos2unix: converting file abcd.dat to UNIX format ...
dos2unix: converting file abc.dat to UNIX format ...
dos2unix: problems converting file abc.dat
I've used iconv successfully earlier with this command
iconv -f UTF-16 -t UTF-8 abcd.dat > abcd.abc
Only, this time I do not know the "from" encoding of the file.
#iconv -l
The following list contain all the coded character sets known. This does
not necessarily mean that all combinations of these names can be used for
the FROM and TO command line parameters. One coded character set can be
listed with several different names (aliases).
437, 500, 500V1, 850, 851, 852, 855, 856, 857, 860, 861, 862, 863, 864, 865,
866, 866NAV, 869, 874, 904, 1026, 1046, 1047, 8859_1, 8859_2, 8859_3, 8859_4,
8859_5, 8859_6, 8859_7, 8859_8, 8859_9, 10646-1:1993, 10646-1:1993/UCS4,
ANSI_X3.4-1968, ANSI_X3.4-1986, ANSI_X3.4, ANSI_X3.110-1983, ANSI_X3.110,
ARABIC, ARABIC7, ARMSCII-8, ASCII, ASMO-708, ASMO_449, BALTIC, BIG-5,
BIG-FIVE, BIG5-HKSCS, BIG5, BIG5HKSCS, BIGFIVE, BS_4730, CA, CN-BIG5, CN-GB,
CN, CP-AR, CP-GR, CP-HU, CP037, CP038, CP273, CP274, CP275, CP278, CP280,
CP281, CP282, CP284, CP285, CP290, CP297, CP367, CP420, CP423, CP424, CP437,
CP500, CP737, CP775, CP803, CP813, CP819, CP850, CP851, CP852, CP855, CP856,
CP857, CP860, CP861, CP862, CP863, CP864, CP865, CP866, CP866NAV, CP868,
CP869, CP870, CP871, CP874, CP875, CP880, CP891, CP901, CP902, CP903, CP904,
CP905, CP912, CP915, CP916, CP918, CP920, CP921, CP922, CP930, CP932, CP933,
CP935, CP936, CP937, CP939, CP949, CP950, CP1004, CP1008, CP1025, CP1026,
CP1046, CP1047, CP1070, CP1079, CP1081, CP1084, CP1089, CP1097, CP1112,
CP1122, CP1123, CP1124, CP1125, CP1129, CP1130, CP1132, CP1133, CP1137,
CP1140, CP1141, CP1142, CP1143, CP1144, CP1145, CP1146, CP1147, CP1148,
CP1149, CP1153, CP1154, CP1155, CP1156, CP1157, CP1158, CP1160, CP1161,
CP1162, CP1163, CP1164, CP1166, CP1167, CP1250, CP1251, CP1252, CP1253,
CP1254, CP1255, CP1256, CP1257, CP1258, CP1361, CP1364, CP1371, CP1388,
CP1390, CP1399, CP4517, CP4899, CP4909, CP4971, CP5347, CP9030, CP9066,
CP9448, CP10007, CP12712, CP16804, CPIBM861, CSA7-1, CSA7-2, CSASCII,
CSA_T500-1983, CSA_T500, CSA_Z243.4-1985-1, CSA_Z243.4-1985-2,
CSA_Z243.419851, CSA_Z243.419852, CSDECMCS, CSEBCDICATDE, CSEBCDICATDEA,
CSEBCDICCAFR, CSEBCDICDKNO, CSEBCDICDKNOA, CSEBCDICES, CSEBCDICESA,
CSEBCDICESS, CSEBCDICFISE, CSEBCDICFISEA, CSEBCDICFR, CSEBCDICIT, CSEBCDICPT,
CSEBCDICUK, CSEBCDICUS, CSEUCKR, CSEUCPKDFMTJAPANESE, CSGB2312, CSHPROMAN8,
CSIBM037, CSIBM038, CSIBM273, CSIBM274, CSIBM275, CSIBM277, CSIBM278,
CSIBM280, CSIBM281, CSIBM284, CSIBM285, CSIBM290, CSIBM297, CSIBM420,
CSIBM423, CSIBM424, CSIBM500, CSIBM803, CSIBM851, CSIBM855, CSIBM856,
CSIBM857, CSIBM860, CSIBM863, CSIBM864, CSIBM865, CSIBM866, CSIBM868,
CSIBM869, CSIBM870, CSIBM871, CSIBM880, CSIBM891, CSIBM901, CSIBM902,
CSIBM903, CSIBM904, CSIBM905, CSIBM918, CSIBM921, CSIBM922, CSIBM930,
CSIBM932, CSIBM933, CSIBM935, CSIBM937, CSIBM939, CSIBM943, CSIBM1008,
CSIBM1025, CSIBM1026, CSIBM1097, CSIBM1112, CSIBM1122, CSIBM1123, CSIBM1124,
CSIBM1129, CSIBM1130, CSIBM1132, CSIBM1133, CSIBM1137, CSIBM1140, CSIBM1141,
CSIBM1142, CSIBM1143, CSIBM1144, CSIBM1145, CSIBM1146, CSIBM1147, CSIBM1148,
CSIBM1149, CSIBM1153, CSIBM1154, CSIBM1155, CSIBM1156, CSIBM1157, CSIBM1158,
CSIBM1160, CSIBM1161, CSIBM1163, CSIBM1164, CSIBM1166, CSIBM1167, CSIBM1364,
CSIBM1371, CSIBM1388, CSIBM1390, CSIBM1399, CSIBM4517, CSIBM4899, CSIBM4909,
CSIBM4971, CSIBM5347, CSIBM9030, CSIBM9066, CSIBM9448, CSIBM12712,
CSIBM16804, CSIBM11621162, CSISO4UNITEDKINGDOM, CSISO10SWEDISH,
CSISO11SWEDISHFORNAMES, CSISO14JISC6220RO, CSISO15ITALIAN, CSISO16PORTUGESE,
CSISO17SPANISH, CSISO18GREEK7OLD, CSISO19LATINGREEK, CSISO21GERMAN,
CSISO25FRENCH, CSISO27LATINGREEK1, CSISO49INIS, CSISO50INIS8,
CSISO51INISCYRILLIC, CSISO58GB1988, CSISO60DANISHNORWEGIAN,
CSISO60NORWEGIAN1, CSISO61NORWEGIAN2, CSISO69FRENCH, CSISO84PORTUGUESE2,
CSISO85SPANISH2, CSISO86HUNGARIAN, CSISO88GREEK7, CSISO89ASMO449, CSISO90,
CSISO92JISC62991984B, CSISO99NAPLPS, CSISO103T618BIT, CSISO111ECMACYRILLIC,
CSISO121CANADIAN1, CSISO122CANADIAN2, CSISO139CSN369103, CSISO141JUSIB1002,
CSISO143IECP271, CSISO150, CSISO150GREEKCCITT, CSISO151CUBA,
CSISO153GOST1976874, CSISO646DANISH, CSISO2022CN, CSISO2022JP, CSISO2022JP2,
CSISO2022KR, CSISO2033, CSISO5427CYRILLIC, CSISO5427CYRILLIC1981,
CSISO5428GREEK, CSISO10367BOX, CSISOLATIN1, CSISOLATIN2, CSISOLATIN3,
CSISOLATIN4, CSISOLATIN5, CSISOLATIN6, CSISOLATINARABIC, CSISOLATINCYRILLIC,
CSISOLATINGREEK, CSISOLATINHEBREW, CSKOI8R, CSKSC5636, CSMACINTOSH,
CSNATSDANO, CSNATSSEFI, CSN_369103, CSPC8CODEPAGE437, CSPC775BALTIC,
CSPC850MULTILINGUAL, CSPC862LATINHEBREW, CSPCP852, CSSHIFTJIS, CSUCS4,
CSUNICODE, CSWINDOWS31J, CUBA, CWI-2, CWI, CYRILLIC, DE, DEC-MCS, DEC,
DECMCS, DIN_66003, DK, DS2089, DS_2089, E13B, EBCDIC-AT-DE-A, EBCDIC-AT-DE,
EBCDIC-BE, EBCDIC-BR, EBCDIC-CA-FR, EBCDIC-CP-AR1, EBCDIC-CP-AR2,
EBCDIC-CP-BE, EBCDIC-CP-CA, EBCDIC-CP-CH, EBCDIC-CP-DK, EBCDIC-CP-ES,
EBCDIC-CP-FI, EBCDIC-CP-FR, EBCDIC-CP-GB, EBCDIC-CP-GR, EBCDIC-CP-HE,
EBCDIC-CP-IS, EBCDIC-CP-IT, EBCDIC-CP-NL, EBCDIC-CP-NO, EBCDIC-CP-ROECE,
EBCDIC-CP-SE, EBCDIC-CP-TR, EBCDIC-CP-US, EBCDIC-CP-WT, EBCDIC-CP-YU,
EBCDIC-CYRILLIC, EBCDIC-DK-NO-A, EBCDIC-DK-NO, EBCDIC-ES-A, EBCDIC-ES-S,
EBCDIC-ES, EBCDIC-FI-SE-A, EBCDIC-FI-SE, EBCDIC-FR, EBCDIC-GREEK, EBCDIC-INT,
EBCDIC-INT1, EBCDIC-IS-FRISS, EBCDIC-IT, EBCDIC-JP-E, EBCDIC-JP-KANA,
EBCDIC-PT, EBCDIC-UK, EBCDIC-US, EBCDICATDE, EBCDICATDEA, EBCDICCAFR,
EBCDICDKNO, EBCDICDKNOA, EBCDICES, EBCDICESA, EBCDICESS, EBCDICFISE,
EBCDICFISEA, EBCDICFR, EBCDICISFRISS, EBCDICIT, EBCDICPT, EBCDICUK, EBCDICUS,
ECMA-114, ECMA-118, ECMA-128, ECMA-CYRILLIC, ECMACYRILLIC, ELOT_928, ES, ES2,
EUC-CN, EUC-JISX0213, EUC-JP-MS, EUC-JP, EUC-KR, EUC-TW, EUCCN, EUCJP-MS,
EUCJP-OPEN, EUCJP-WIN, EUCJP, EUCKR, EUCTW, FI, FR, GB, GB2312, GB13000,
GB18030, GBK, GB_1988-80, GB_198880, GEORGIAN-ACADEMY, GEORGIAN-PS,
GOST_19768-74, GOST_19768, GOST_1976874, GREEK-CCITT, GREEK, GREEK7-OLD,
GREEK7, GREEK7OLD, GREEK8, GREEKCCITT, HEBREW, HP-ROMAN8, HPROMAN8, HU,
IBM-803, IBM-856, IBM-901, IBM-902, IBM-921, IBM-922, IBM-930, IBM-932,
IBM-933, IBM-935, IBM-937, IBM-939, IBM-943, IBM-1008, IBM-1025, IBM-1046,
IBM-1047, IBM-1097, IBM-1112, IBM-1122, IBM-1123, IBM-1124, IBM-1129,
IBM-1130, IBM-1132, IBM-1133, IBM-1137, IBM-1140, IBM-1141, IBM-1142,
IBM-1143, IBM-1144, IBM-1145, IBM-1146, IBM-1147, IBM-1148, IBM-1149,
IBM-1153, IBM-1154, IBM-1155, IBM-1156, IBM-1157, IBM-1158, IBM-1160,
IBM-1161, IBM-1162, IBM-1163, IBM-1164, IBM-1166, IBM-1167, IBM-1364,
IBM-1371, IBM-1388, IBM-1390, IBM-1399, IBM-4517, IBM-4899, IBM-4909,
IBM-4971, IBM-5347, IBM-9030, IBM-9066, IBM-9448, IBM-12712, IBM-16804,
IBM037, IBM038, IBM256, IBM273, IBM274, IBM275, IBM277, IBM278, IBM280,
IBM281, IBM284, IBM285, IBM290, IBM297, IBM367, IBM420, IBM423, IBM424,
IBM437, IBM500, IBM775, IBM803, IBM813, IBM819, IBM848, IBM850, IBM851,
IBM852, IBM855, IBM856, IBM857, IBM860, IBM861, IBM862, IBM863, IBM864,
IBM865, IBM866, IBM866NAV, IBM868, IBM869, IBM870, IBM871, IBM874, IBM875,
IBM880, IBM891, IBM901, IBM902, IBM903, IBM904, IBM905, IBM912, IBM915,
IBM916, IBM918, IBM920, IBM921, IBM922, IBM930, IBM932, IBM933, IBM935,
IBM937, IBM939, IBM943, IBM1004, IBM1008, IBM1025, IBM1026, IBM1046, IBM1047,
IBM1089, IBM1097, IBM1112, IBM1122, IBM1123, IBM1124, IBM1129, IBM1130,
IBM1132, IBM1133, IBM1137, IBM1140, IBM1141, IBM1142, IBM1143, IBM1144,
IBM1145, IBM1146, IBM1147, IBM1148, IBM1149, IBM1153, IBM1154, IBM1155,
IBM1156, IBM1157, IBM1158, IBM1160, IBM1161, IBM1162, IBM1163, IBM1164,
IBM1166, IBM1167, IBM1364, IBM1371, IBM1388, IBM1390, IBM1399, IBM4517,
IBM4899, IBM4909, IBM4971, IBM5347, IBM9030, IBM9066, IBM9448, IBM12712,
IBM16804, IEC_P27-1, IEC_P271, INIS-8, INIS-CYRILLIC, INIS, INIS8,
INISCYRILLIC, ISIRI-3342, ISIRI3342, ISO-2022-CN-EXT, ISO-2022-CN,
ISO-2022-JP-2, ISO-2022-JP-3, ISO-2022-JP, ISO-2022-KR, ISO-8859-1,
ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7,
ISO-8859-8, ISO-8859-9, ISO-8859-10, ISO-8859-11, ISO-8859-13, ISO-8859-14,
ISO-8859-15, ISO-8859-16, ISO-10646, ISO-10646/UCS2, ISO-10646/UCS4,
ISO-10646/UTF-8, ISO-10646/UTF8, ISO-CELTIC, ISO-IR-4, ISO-IR-6, ISO-IR-8-1,
ISO-IR-9-1, ISO-IR-10, ISO-IR-11, ISO-IR-14, ISO-IR-15, ISO-IR-16, ISO-IR-17,
ISO-IR-18, ISO-IR-19, ISO-IR-21, ISO-IR-25, ISO-IR-27, ISO-IR-37, ISO-IR-49,
ISO-IR-50, ISO-IR-51, ISO-IR-54, ISO-IR-55, ISO-IR-57, ISO-IR-60, ISO-IR-61,
ISO-IR-69, ISO-IR-84, ISO-IR-85, ISO-IR-86, ISO-IR-88, ISO-IR-89, ISO-IR-90,
ISO-IR-92, ISO-IR-98, ISO-IR-99, ISO-IR-100, ISO-IR-101, ISO-IR-103,
ISO-IR-109, ISO-IR-110, ISO-IR-111, ISO-IR-121, ISO-IR-122, ISO-IR-126,
ISO-IR-127, ISO-IR-138, ISO-IR-139, ISO-IR-141, ISO-IR-143, ISO-IR-144,
ISO-IR-148, ISO-IR-150, ISO-IR-151, ISO-IR-153, ISO-IR-155, ISO-IR-156,
ISO-IR-157, ISO-IR-166, ISO-IR-179, ISO-IR-193, ISO-IR-197, ISO-IR-199,
ISO-IR-203, ISO-IR-209, ISO-IR-226, ISO/TR_11548-1, ISO646-CA, ISO646-CA2,
ISO646-CN, ISO646-CU, ISO646-DE, ISO646-DK, ISO646-ES, ISO646-ES2, ISO646-FI,
ISO646-FR, ISO646-FR1, ISO646-GB, ISO646-HU, ISO646-IT, ISO646-JP-OCR-B,
ISO646-JP, ISO646-KR, ISO646-NO, ISO646-NO2, ISO646-PT, ISO646-PT2,
ISO646-SE, ISO646-SE2, ISO646-US, ISO646-YU, ISO2022CN, ISO2022CNEXT,
ISO2022JP, ISO2022JP2, ISO2022KR, ISO6937, ISO8859-1, ISO8859-2, ISO8859-3,
ISO8859-4, ISO8859-5, ISO8859-6, ISO8859-7, ISO8859-8, ISO8859-9, ISO8859-10,
ISO8859-11, ISO8859-13, ISO8859-14, ISO8859-15, ISO8859-16, ISO11548-1,
ISO88591, ISO88592, ISO88593, ISO88594, ISO88595, ISO88596, ISO88597,
ISO88598, ISO88599, ISO885910, ISO885911, ISO885913, ISO885914, ISO885915,
ISO885916, ISO_646.IRV:1991, ISO_2033-1983, ISO_2033, ISO_5427-EXT, ISO_5427,
ISO_5427:1981, ISO_5427EXT, ISO_5428, ISO_5428:1980, ISO_6937-2,
ISO_6937-2:1983, ISO_6937, ISO_6937:1992, ISO_8859-1, ISO_8859-1:1987,
ISO_8859-2, ISO_8859-2:1987, ISO_8859-3, ISO_8859-3:1988, ISO_8859-4,
ISO_8859-4:1988, ISO_8859-5, ISO_8859-5:1988, ISO_8859-6, ISO_8859-6:1987,
ISO_8859-7, ISO_8859-7:1987, ISO_8859-7:2003, ISO_8859-8, ISO_8859-8:1988,
ISO_8859-9, ISO_8859-9:1989, ISO_8859-10, ISO_8859-10:1992, ISO_8859-14,
ISO_8859-14:1998, ISO_8859-15, ISO_8859-15:1998, ISO_8859-16,
ISO_8859-16:2001, ISO_9036, ISO_10367-BOX, ISO_10367BOX, ISO_11548-1,
ISO_69372, IT, JIS_C6220-1969-RO, JIS_C6229-1984-B, JIS_C62201969RO,
JIS_C62291984B, JOHAB, JP-OCR-B, JP, JS, JUS_I.B1.002, KOI-7, KOI-8, KOI8-R,
KOI8-T, KOI8-U, KOI8, KOI8R, KOI8U, KSC5636, L1, L2, L3, L4, L5, L6, L7, L8,
L10, LATIN-9, LATIN-GREEK-1, LATIN-GREEK, LATIN1, LATIN2, LATIN3, LATIN4,
LATIN5, LATIN6, LATIN7, LATIN8, LATIN10, LATINGREEK, LATINGREEK1,
MAC-CYRILLIC, MAC-IS, MAC-SAMI, MAC-UK, MAC, MACCYRILLIC, MACINTOSH, MACIS,
MACUK, MACUKRAINIAN, MIK, MS-ANSI, MS-ARAB, MS-CYRL, MS-EE, MS-GREEK,
MS-HEBR, MS-MAC-CYRILLIC, MS-TURK, MS932, MS936, MSCP949, MSCP1361,
MSMACCYRILLIC, MSZ_7795.3, MS_KANJI, NAPLPS, NATS-DANO, NATS-SEFI, NATSDANO,
NATSSEFI, NC_NC0010, NC_NC00-10, NC_NC00-10:81, NF_Z_62-010,
NF_Z_62-010_(1973), NF_Z_62-010_1973, NF_Z_62010, NF_Z_62010_1973, NO, NO2,
NS_4551-1, NS_4551-2, NS_45511, NS_45512, OS2LATIN1, OSF00010001,
OSF00010002, OSF00010003, OSF00010004, OSF00010005, OSF00010006, OSF00010007,
OSF00010008, OSF00010009, OSF0001000A, OSF00010020, OSF00010100, OSF00010101,
OSF00010102, OSF00010104, OSF00010105, OSF00010106, OSF00030010, OSF0004000A,
OSF0005000A, OSF05010001, OSF100201A4, OSF100201A8, OSF100201B5, OSF100201F4,
OSF100203B5, OSF1002011C, OSF1002011D, OSF1002035D, OSF1002035E, OSF1002035F,
OSF1002036B, OSF1002037B, OSF10010001, OSF10020025, OSF10020111, OSF10020115,
OSF10020116, OSF10020118, OSF10020122, OSF10020129, OSF10020352, OSF10020354,
OSF10020357, OSF10020359, OSF10020360, OSF10020364, OSF10020365, OSF10020366,
OSF10020367, OSF10020370, OSF10020387, OSF10020388, OSF10020396, OSF10020402,
OSF10020417, PT, PT2, PT154, R8, RK1048, ROMAN8, RUSCII, SE, SE2,
SEN_850200_B, SEN_850200_C, SHIFT-JIS, SHIFT_JIS, SHIFT_JISX0213, SJIS-OPEN,
SJIS-WIN, SJIS, SS636127, STRK1048-2002, ST_SEV_358-88, T.61-8BIT, T.61,
T.618BIT, TCVN-5712, TCVN, TCVN5712-1, TCVN5712-1:1993, TIS-620, TIS620-0,
TIS620.2529-1, TIS620.2533-0, TIS620, TS-5881, TSCII, UCS-2, UCS-2BE,
UCS-2LE, UCS-4, UCS-4BE, UCS-4LE, UCS2, UCS4, UHC, UJIS, UK, UNICODE,
UNICODEBIG, UNICODELITTLE, US-ASCII, US, UTF-7, UTF-8, UTF-16, UTF-16BE,
UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, UTF7, UTF8, UTF16, UTF16BE, UTF16LE,
UTF32, UTF32BE, UTF32LE, VISCII, WCHAR_T, WIN-SAMI-2, WINBALTRIM,
WINDOWS-31J, WINDOWS-874, WINDOWS-936, WINDOWS-1250, WINDOWS-1251,
WINDOWS-1252, WINDOWS-1253, WINDOWS-1254, WINDOWS-1255, WINDOWS-1256,
WINDOWS-1257, WINDOWS-1258, WINSAMI2, WS2, YU
==================================================
also,
#which od
/usr/bin/od
but I don't know how to use it.
==================================
#cat -v abcd.dat
has a lot of ^@
===================================
#echo $LANG
en_US.UTF-8
======================================================================================
#hexdump -C abcd.dat|head -5
00000000 00 22 00 34 00 36 00 32 00 39 00 33 00 22 00 7c |.".4.6.2.9.3.".||
00000010 00 22 00 32 00 30 00 31 00 33 00 2d 00 31 00 31 |.".2.0.1.3.-.1.1|
00000020 00 2d 00 31 00 38 00 20 00 30 00 38 00 3a 00 30 |.-.1.8. .0.8.:.0|
00000030 00 39 00 3a 00 34 00 38 00 22 00 7c 00 22 00 33 |.9.:.4.8.".|.".3|
00000040 00 36 00 37 00 22 00 7c 00 22 00 53 00 75 00 73 |.6.7.".|.".S.u.s|
=======================================================================================
#vi abcd.tst
testing
esc:wq
#file abcd.tst
abcd.tst: ASCII text
Let me know the complete iconv command with from-and-to encoding.
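Judging from the hexdump above, a 00 byte in front of every ASCII byte and no BOM, the file looks like UTF-16 big-endian, so a plausible conversion is `iconv -f UTF-16BE -t UTF-8`. A self-contained sketch (the file names mimic the poster's but the data here is fabricated for illustration):

```shell
# Build a small UTF-16BE sample (no BOM), mimicking the hexdump above.
printf '"46293"|"2013-11-18 08:09:48"\n' | iconv -f UTF-8 -t UTF-16BE > abcd.dat

# Convert it to UTF-8; the same command is the likely fix for the real file.
iconv -f UTF-16BE -t UTF-8 abcd.dat > abcd.txt

cat abcd.txt    # the readable ASCII line comes back
```

If the real file had a BOM (FE FF or FF FE as the first two bytes), plain `-f UTF-16` would also work, since iconv can then detect the byte order itself.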
Appreciate any help.
Hi BalusC,
Just as we write <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> in a JSP page,
is there something similar we can write in a .properties file?
Russian words are not displayed correctly in the browser... how can we display them in the correct format?
I have all the Russian words in my .properties file.
Thanks a lot -
What every developer should know about character encoding
This was originally posted (with better formatting) at Moderator edit: link removed/what-every-developer-should-know-about-character-encoding.html. I'm posting because lots of people trip over this.
If you write code that touches a text file, you probably need this.
Let's start off with two key items:
1.Unicode does not solve this issue for us (yet).
2.Every text file is encoded. There is no such thing as an unencoded file or a "general" encoding.
And let's add a codicil to this – most Americans can get by without having to take this into account – most of the time. That's because the first 127 byte values in the vast majority of encoding schemes map to the same set of characters (more accurately called glyphs), and because we mostly use A-Z without accents or other characters – we're good to go. But the second you carry those same assumptions into an HTML or XML file that has characters outside the first 127 – that's when the trouble starts.
The computer industry started with disk space and memory at a premium. Anyone who suggested using 2 bytes for each character instead of one would have been laughed at. In fact we're lucky that the byte worked best as 8 bits, or we might have had fewer than 256 values for each character. There were of course numerous character sets (or codepages) developed early on. But we ended up with most everyone using a standard set of codepages where the first 127 values were identical in all of them and the second half was unique to each set. There were sets for America/Western Europe, Central Europe, Russia, etc.
And then for Asia, because 256 characters were not enough, some of the range 128 – 255 had what was called DBCS (double byte character sets). For each value of a first byte (in these higher ranges), the second byte then identified one of 256 characters. This gave a total of 128 * 256 additional characters. It was a hack, but it kept memory use to a minimum. Chinese, Japanese, and Korean each have their own DBCS codepage.
And for awhile this worked well. Operating systems, applications, etc. mostly were set to use a specified code page. But then the internet came along. A website in America using an XML file from Greece to display data to a user browsing in Russia, where each is entering data based on their country – that broke the paradigm.
Fast forward to today. The two file formats where we can explain this best, and where everyone trips over it, are HTML and XML. Every HTML and XML file can optionally have the character encoding set in its header metadata. If it's not set, then most programs assume it is UTF-8, but that is not a standard and not universally followed. If the encoding is not specified and the program reading the file guesses wrong – the file will be misread.
Point 1 – Never treat specifying the encoding as optional when writing a file. Always write it to the file. Always. Even if you are willing to swear that the file will never have characters out of the range 1 – 127.
Now let's look at UTF-8, because between its role as the standard and the way it works, it gets people into a lot of trouble. UTF-8 was popular for two reasons. First, it matched the standard codepages for the first 127 characters, so most existing HTML and XML would match it. Second, it was designed to use as few bytes as possible, which mattered a lot back when it was designed and many people were still using dial-up modems.
UTF-8 borrowed from the DBCS designs of the Asian codepages. The first 128 values are all single-byte representations of characters. Then for the next most common set, it uses a block in the second 128 values as the lead of a double-byte sequence, giving us more characters. But wait, there's more. For the less common characters there's a lead byte followed by a series of continuation bytes: two more bytes follow the lead, and those three bytes together define the character. This originally went up to 6-byte sequences (modern UTF-8 stops at 4). Using this MBCS (multi-byte character set) approach you can write the equivalent of every Unicode character – and, assuming what you are writing is not a list of seldom-used Chinese characters, do it in fewer bytes.
But here is what everyone trips over – they have an HTML or XML file, it works fine, and they open it up in a text editor. They then insert a character like ß, which their text editor, using the codepage for their region, writes as a single byte, and save the file. Of course it must be correct – their text editor shows it correctly. But feed it to any program that reads according to the declared encoding, and that byte is now the first byte of a 2-byte sequence. You either get a different character or, if the second byte is not a legal value for that first byte, an error.
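A short sketch of this failure mode, done in Java rather than a text editor purely to illustrate: the same bytes decoded under two different encodings give two different results.

```java
import java.io.UnsupportedEncodingException;

public class MojibakeDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // "Straße" in ISO-8859-1: the ß is the single byte 0xDF.
        byte[] latin1 = "Stra\u00dfe".getBytes("ISO-8859-1");

        // Decoded with the encoding it was written in, we get the word back.
        System.out.println(new String(latin1, "ISO-8859-1"));

        // Decoded as UTF-8, 0xDF looks like the lead byte of a 2-byte sequence
        // with no valid continuation byte after it, so the ß is lost.
        System.out.println(new String(latin1, "UTF-8"));
    }
}
```

Java's decoder substitutes the replacement character for the malformed byte instead of raising an error; other tools may throw instead, which is the "different character or an error" outcome described above.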
Point 2 – Always create HTML and XML in a program that writes it out correctly using the encoding. If you must create it with a text editor, then view the final file in a browser.
Now, what about when the code you are writing will read or write a file? We are not talking about binary/data files, where you write it out in your own format, but files that are considered text files. Java, .NET, etc. all have character encoders. The purpose of these encoders is to translate between a sequence of bytes (the file) and the characters they represent. Let's take what is actually a very difficult example – your source code, be it C#, Java, etc. These are still by and large "plain old text files" with no encoding hints. So how do programs handle them? Many assume they use the local code page. Many others assume that all characters will be in the range 0 – 127 and will choke on anything else.
Here's a key point about these text files – every program is still using an encoding. It may not be setting it in code, but by definition an encoding is being used.
Point 3 – Always set the encoding when you read and write text files. Not just for HTML & XML, but even for files like source code. It's fine if you set it to use the default codepage, but set the encoding.
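In Java, Point 3 might look like this minimal sketch (file name and sample text are illustrative): pass the charset name explicitly instead of using the no-argument reader/writer constructors that silently fall back to the platform default.

```java
import java.io.*;

public class ExplicitEncoding {
    // Write the text with an explicit encoding, then read it back the same way.
    static String roundTrip(File f, String text, String charset) throws IOException {
        Writer w = new OutputStreamWriter(new FileOutputStream(f), charset);
        w.write(text);
        w.close();

        BufferedReader r = new BufferedReader(
                new InputStreamReader(new FileInputStream(f), charset));
        String line = r.readLine();
        r.close();
        return line;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("demo", ".txt");
        // Non-ASCII characters survive because both sides agree on the charset.
        System.out.println(roundTrip(f, "\u00df and \u00e9 survive", "UTF-8"));
        f.delete();
    }
}
```

The same code with `new FileWriter(f)` and `new FileReader(f)` would work on one machine and corrupt the ß and é on another whose default codepage differs, which is exactly the point.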
Point 4 – Use the most complete encoder possible. You can write your own XML as a text file encoded for UTF-8. But if you write it using an XML encoder, then it will include the encoding in the meta data and you can't get it wrong. (it also adds the endian preamble to the file.)
Ok, you're reading and writing files correctly, but what about inside your code? This is where it's easy – Unicode. That's what the encoders in the Java and .NET runtimes are designed to produce. You read in and get Unicode. You write Unicode and get an encoded file. That's why the char type is 16 bits and is a unique core type that is for characters. This you probably have right, because languages today don't give you much choice in the matter.
Point 5 – (For developers on languages that have been around awhile) – Always use unicode internally. In C++ this is called wide chars (or something similar). Don't get clever to save a couple of bytes, memory is cheap and you have more important things to do.
Wrapping it up
I think there are two key items to keep in mind here. First, make sure you are taking the encoding into account for text files. Second, this is actually all very easy and straightforward. People rarely screw up how to use an encoding; it's when they ignore the issue that they get into trouble.
Edited by: Darryl Burke -- link removed
DavidThi808 wrote:
This was originally posted (with better formatting) at Moderator edit: link removed/what-every-developer-should-know-about-character-encoding.html. I'm posting because lots of people trip over this.
If you write code that touches a text file, you probably need this.
Lets start off with two key items
1.Unicode does not solve this issue for us (yet).
2.Every text file is encoded. There is no such thing as an unencoded file or a "general" encoding.
And let's add a codicil to this – most Americans can get by without having to take this into account – most of the time. Because the characters for the first 127 bytes in the vast majority of encoding schemes map to the same set of characters (more accurately called glyphs). And because we only use A-Z without any other characters, accents, etc. – we're good to go. But the second you use those same assumptions in an HTML or XML file that has characters outside the first 127 – then the trouble starts.
Pretty sure most Americans do not use character sets that only have a range of 0-127. I don't think I have ever used a desktop OS that did. I might have used some big iron boxes before that, but at that time I wasn't even aware that character sets existed.
They might only use that range, but that is a different issue, especially since that range is exactly the same as the UTF-8 character set anyway.
>
The computer industry started with disk space and memory at a premium. Anyone who suggested using 2 bytes for each character instead of one would have been laughed at. In fact we're lucky that the byte worked best as 8 bits, or we might have had fewer than 256 values for each character. There were of course numerous character sets (or codepages) developed early on. But we ended up with most everyone using a standard set of codepages where the first 127 values were identical in all of them and the second half was unique to each set. There were sets for America/Western Europe, Central Europe, Russia, etc.
And then for Asia, because 256 characters were not enough, some of the range 128 – 255 had what was called DBCS (double byte character sets). For each value of a first byte (in these higher ranges), the second byte then identified one of 256 characters. This gave a total of 128 * 256 additional characters. It was a hack, but it kept memory use to a minimum. Chinese, Japanese, and Korean each have their own DBCS codepage.
And for awhile this worked well. Operating systems, applications, etc. mostly were set to use a specified code page. But then the internet came along. A website in America using an XML file from Greece to display data to a user browsing in Russia, where each is entering data based on their country – that broke the paradigm.
The above is only true for small volume sets. If I am targeting a processing rate of 2000 txns/sec with a requirement to hold data active for seven years then a column with a size of 8 bytes is significantly different than one with 16 bytes.
Fast forward to today. The two file formats where we can explain this best, and where everyone trips over it, are HTML and XML. Every HTML and XML file can optionally have the character encoding set in its header metadata. If it's not set, then most programs assume it is UTF-8, but that is not a standard and not universally followed. If the encoding is not specified and the program reading the file guesses wrong – the file will be misread.
The above is out of place. It would be best to address this as part of Point 1.
Point 1 – Never treat specifying the encoding as optional when writing a file. Always write it to the file. Always. Even if you are willing to swear that the file will never have characters out of the range 1 – 127.
Now let's look at UTF-8, because between its role as the standard and the way it works, it gets people into a lot of trouble. UTF-8 was popular for two reasons. First, it matched the standard codepages for the first 127 characters, so most existing HTML and XML would match it. Second, it was designed to use as few bytes as possible, which mattered a lot back when it was designed and many people were still using dial-up modems.
UTF-8 borrowed from the DBCS designs of the Asian codepages. The first 128 values are all single-byte representations of characters. Then for the next most common set, it uses a block in the second 128 values as the lead of a double-byte sequence, giving us more characters. But wait, there's more. For the less common characters there's a lead byte followed by a series of continuation bytes: two more bytes follow the lead, and those three bytes together define the character. This originally went up to 6-byte sequences (modern UTF-8 stops at 4). Using this MBCS (multi-byte character set) approach you can write the equivalent of every Unicode character – and, assuming what you are writing is not a list of seldom-used Chinese characters, do it in fewer bytes.
The first part of that paragraph is odd. The first 128 characters of unicode, all unicode, is based on ASCII. The representational format of UTF8 is required to implement unicode, thus it must represent those characters. It uses the idiom supported by variable width encodings to do that.
But here is what everyone trips over – they have an HTML or XML file, it works fine, and they open it up in a text editor. They then insert a character like ß, which their text editor, using the codepage for their region, writes as a single byte, and save the file. Of course it must be correct – their text editor shows it correctly. But feed it to any program that reads according to the declared encoding, and that byte is now the first byte of a 2-byte sequence. You either get a different character or, if the second byte is not a legal value for that first byte, an error.
Not sure what you are saying here. If a file is supposed to be in one encoding and you insert invalid characters into it, then it is invalid. End of story. It has nothing to do with HTML/XML.
Point 2 – Always create HTML and XML in a program that writes it out correctly using the encode. If you must create with a text editor, then view the final file in a browser.
The browser still needs to support the encoding.
Now, what about when the code you are writing will read or write a file? We are not talking about binary/data files, where you write it out in your own format, but files that are considered text files. Java, .NET, etc. all have character encoders. The purpose of these encoders is to translate between a sequence of bytes (the file) and the characters they represent. Let's take what is actually a very difficult example – your source code, be it C#, Java, etc. These are still by and large "plain old text files" with no encoding hints. So how do programs handle them? Many assume they use the local code page. Many others assume that all characters will be in the range 0 – 127 and will choke on anything else.
I know java files have a default encoding - the specification defines it. And I am certain C# does as well.
Point 3 – Always set the encoding when you read and write text files. Not just for HTML & XML, but even for files like source code. It's fine if you set it to use the default codepage, but set the encoding.
It is important to define it. Whether you set it is another matter.
Point 4 – Use the most complete encoder possible. You can write your own XML as a text file encoded for UTF-8. But if you write it using an XML encoder, then it will include the encoding in the meta data and you can't get it wrong. (it also adds the endian preamble to the file.)
Ok, you're reading & writing files correctly but what about inside your code. What there? This is where it's easy – unicode. That's what those encoders created in the Java & .NET runtime are designed to do. You read in and get unicode. You write unicode and get an encoded file. That's why the char type is 16 bits and is a unique core type that is for characters. This you probably have right because languages today don't give you much choice in the matter.
Unicode character escapes are replaced prior to actual code compilation. Thus it is possible to create strings in java with escaped unicode characters which will fail to compile.
Point 5 – (For developers on languages that have been around awhile) – Always use unicode internally. In C++ this is called wide chars (or something similar). Don't get clever to save a couple of bytes, memory is cheap and you have more important things to do.
No. A developer should understand the problem domain represented by the requirements and the business, and create solutions that are appropriate to that. Thus there is absolutely no point for someone who is creating an inventory system for a standalone store to craft a solution that supports multiple languages.
And another example: with high-volume systems, moving/storing bytes is relevant. As such one must carefully consider each text element as to whether it is customer-consumable or internally consumable. Saving bytes in such cases will impact the total load of the system. In such systems incremental savings impact operating costs and marketing advantage with speed. -
How can I know the encoding format for a file.
I have files encoded in English, Spanish, Japanese etc. I want to know which file has which encoding format while reading.
Can anyone suggest.
Ashish
Language is different from "encoding"... if you mean character encoding. Multiple languages can be represented by a single encoding. The fact that you mix language and encoding in your question confuses me about what you are asking.
If you want to know what language a file uses, you can always use a meta-tag. Or you can do some kind of text analysis based on dictionary lookups to determine language...too complex for my tastes. I think a simple language tag in the file makes more sense.
As for character encoding, you either standardize on a single encoding for all files or again use a meta-tag. If you standardize on a single encoding, you should probably consider one of the Unicode encodings instead of any other character set encoding.
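If you do standardize on a Unicode encoding, one hedged sketch of minimal per-file detection is to sniff a leading byte-order mark. This only identifies files that actually carry a BOM (UTF-8 files often don't); everything else still needs your meta-tag or convention. The class name here is made up, and the caller must construct the PushbackInputStream with a buffer of at least 3 bytes.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

public class BomSniffer {
    // Returns the charset name implied by a leading BOM, or null if none.
    // The sniffed bytes are pushed back so the caller can then read the
    // stream from its real start.
    public static String sniff(PushbackInputStream in) throws IOException {
        byte[] head = new byte[3];
        int n = in.read(head, 0, 3);
        if (n > 0) {
            in.unread(head, 0, n); // put whatever we read back
        }
        if (n >= 3 && (head[0] & 0xFF) == 0xEF
                   && (head[1] & 0xFF) == 0xBB
                   && (head[2] & 0xFF) == 0xBF) return "UTF-8";
        if (n >= 2 && (head[0] & 0xFF) == 0xFE
                   && (head[1] & 0xFF) == 0xFF) return "UTF-16BE";
        if (n >= 2 && (head[0] & 0xFF) == 0xFF
                   && (head[1] & 0xFF) == 0xFE) return "UTF-16LE";
        return null; // no BOM: fall back to your standard encoding
    }
}
```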
Regards,
John O'Conner -
Wrong character encoding from flash to mysql
Hi, I'm experiencing problems with character encoding not functioning correctly when sending from Flash to MySQL. What I am doing is a contact form in Flash which then sends the values to a PHP file which takes the values and inserts them into a table. As I'm using Icelandic characters I need the character encoding to be either latin1 or utf8 in MySQL, or at least I think so. But it seems that Flash or the PHP document isn't sending in the same format as I have selected in MySQL, because all the special Icelandic characters come scrambled in the MySQL table. Firefox tells me, though, that the HTML document containing the Flash movie is using utf-8.
I don't know anything about Icelandic characters, but Flash generally really likes UTF-8. So it should be sending that if that is what it is starting with.
You aren't using any kind of useCodePage? That will mess it up.
Are you sure that the input method is Icelandic?
In the testing environment can you list variables (from the debug menu) and see if they look proper? If they do then Flash is reading them correctly and the problem must be coming in further downstream. -
Character Encoding in Crystal Reports
Hi,
We are using HTML character encoding to prevent SQL injection, i.e. when a special character like an apostrophe (') is entered, it is saved as its HTML-encoded entity. I wanted to know if we could manipulate the data between the two operations of fetching the data from the DB and displaying it using Crystal?
We could use formulae/functions in the design/query, but I am looking for a more generic solution using Crystal API. I have attached a screenshot of how Mc'Donald is displayed in the Report.
Thanks for the help!
Noopur
All you need to do is format the field in CR to use HTML Text Interpretation. However, note that CR does not support all HTML tags. For details see the following KBAs:
1217084 - What are the supported HTML tags and attributes for the HTML Text Interpretation in Crystal Reports?
1272676 - Not all the standard HTML tags are interpreted in Crystal Reports when using HTML Text Interpretation
- Ludek
Senior Support Engineer AGS Product Support, Global Support Center Canada
Follow us on Twitter -
How does PI handle encoding formats?
Hello Experts,
Please help me understand how PI 7.1 treats different encoding formats, e.g. ASCII etc., or how it handles proprietary characters coming in messages?
- Rajan
Hi,
in PI, in ID, in your communication channel you have a property "content encoding" where you can specify the character encoding of your message being sent or received by PI...
moreover, for some adapters you have module properties in which you can set the character encoding of your message in PI in ID...
Regards,
Rajeev Gupta -
Change character encoding?
Edit > Preferences > New Document > Character
encoding is set to UTF-8
However when I edit documents with non-standard extensions
(I'm working on PHP files with a .ctp extension) the document still
seems to save in iso-8859-1 format. This problem doesn't seem to
occur on files with .php/html extensions.
Does anyone know of a solution to this problem?
Thanks,
Emil
I'm not sure where you are getting %xx encoded UTF-8... Is it because you have it in a GET method form and that's what you are seeing in the browser's location bar? ...
Let's assume you have a form on a page, and the page's charset is set to UTF-8, and you want to generate a URL encoded string (%xx format, although URLEncoder will not encode ASCII chars that way...).
In the page processing the form, you need to do this:
request.setCharacterEncoding("UTF-8"); // makes bytes read as UTF-8 strings (assumes that the form page was properly set to the UTF-8 charset)
String fieldValue = request.getParameter("fieldName"); // get value
// the value is now a Unicode String in Java, generated from reading the bytes submitted from the form as UTF-8 encoded text...
String utf8EncString = URLEncoder.encode(fieldValue, "UTF-8");
// now utf8EncString is a URL encoded (%xx) string of UTF-8 values
String euckrEncString = URLEncoder.encode(fieldValue, "EUC-KR");
// now euckrEncString is a URL encoded (%xx) string of EUC-KR values
What is probably screwing things up for you mostly is this:
euckrValue = new String(utf8Value.getBytes(), "EUC-KR");
What this does is take the bytes of the string utf8Value (which is not really UTF-8... see below) in the local encoding (possibly Cp1252 on Windows or ISO-8859-1 on Linux, or EUC-KR if it's Korean Windows), and then read them as if they were EUC-KR... which they aren't.
The key here is that Strings in Java are not of any encoding. They are pure Unicode values. Encodings only matter when converting to or from bytes. The strings stored in a file or sent over the net have to convert to bytes since that's what is stored/sent, just bytes. The encoding defines how the characters can be encoded into 1 or more bytes, and thus reconstructed. -
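The "pure Unicode values" point can be seen directly: the same Java String yields different %xx escapes depending on which charset you hand to URLEncoder, because an encoding only enters the picture at that byte boundary. ("café" is just a sample value; the class name is arbitrary.)

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodeDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String value = "caf\u00e9"; // one Unicode String, no encoding attached
        // The encoding is chosen only when converting to %xx bytes:
        System.out.println(URLEncoder.encode(value, "UTF-8"));      // caf%C3%A9
        System.out.println(URLEncoder.encode(value, "ISO-8859-1")); // caf%E9
    }
}
```

Same String, two different byte renderings; neither changes the String itself.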
Setting Character encoding programmaticaly?
Hi,
I am using the Sun J2ME Wireless Toolkit 2.1, and I have a problem with character encoding. I am receiving text from a .NET web service, and after some processing in the client, I send the string back.
The problem is, the string i am sending back includes Turkish characters. These are sent as question marks instead of characters.
I have failed to find a method that changes the character encoding used while making a web service call.
Actually, I could not see any way to change the encoding overall. For the emulator, a property file can be used, but what about the devices I'll be deploying the app to? It'd be really great if someone could point me in the right direction.
Best Regards
Hi,
My situation is as follows. I have .NET web services on the server side, and I am using mobile devices as clients. When I get a string from method A in the web service, I can display it on the device screen without a problem. After that, if I send the same string that I've received from method A as a parameter to method B, the .NET code receives garbage instead of Turkish chars.
At the moment I am encoding Turkish chars on the client side, and decoding them in the .NET web server processing code.
I'd like to try setting the encoding to UTF-8, but as I have written, I have not seen any way of doing this. Changing the properties file for the emulator is possible, but how can I do it for the target devices? I have not seen an API call for this purpose in the MIDP or CLDC docs. Thanks for your answer
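A hedged sketch of that client-side workaround (class name is hypothetical): do the UTF-8 conversion yourself at the byte boundary before handing the data to the web-service stub, instead of relying on the device's default encoding. Note that CLDC only guarantees ISO-8859-1; "UTF-8" is widely but not universally supported, so getBytes may throw on some devices.

```java
import java.io.UnsupportedEncodingException;

public class Utf8Bytes {
    // String -> UTF-8 bytes, explicitly, regardless of the device default.
    public static byte[] toUtf8(String s) throws UnsupportedEncodingException {
        return s.getBytes("UTF-8");
    }

    // UTF-8 bytes -> String, explicitly.
    public static String fromUtf8(byte[] b) throws UnsupportedEncodingException {
        return new String(b, 0, b.length, "UTF-8");
    }
}
```

Turkish characters such as ğ, ş, ı survive the round trip because both sides now agree on what the bytes mean.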
Regards -
Why, after all these years, can't Thunderbird auto-detect character encoding
judging by all the existing messages and complaints about this, not to mention erroneous posts that say the problem is solved when it isn't, I have to conclude Mozilla either doesn't believe this is a problem or doesn't care to fix it. The bottom line is that there is no way to tell Thunderbird to automatically display emails in the character coding format they were written in. I could understand cases where the headers are not properly filled in, but I see tons of emails in which the encoding is plainly there in the headers within the message source. You can force it, but if you do so via the menu VIEW->Character Encoding->UTF8 (for example) it won't "stick" if you view another message. But who would want it to "stick" permanently anyway? What the average user really wants is to be able to toggle VIEW->Character Encoding->Auto Detect from its default "off" to simply "on", and not have to bother with it anymore.
This is a problem that seems to have gone on forever, and it NEVER happens with other email clients. If there is some backdoor way to actually make autodetect work, I'd appreciate knowing about it. But more important, I think ALL users would appreciate it if it were not some secret "backdoor" setting, but a simple global menu choice for all accounts. Can Mozilla please fix this problem once and for all?
You said...
''Thunderbird is supposed to be using the encoding in the mail.''
I figured it "should"; I'm just reporting that it doesn't.
You said...
''Setting auto detect to on disables that.''
Please explain. I've looked at every setting I can find and there is no way to set auto detect to "ON". I DID try setting it to "universal" in an attempt to solve the problem, but I have since restored it to "off", because the universal setting doesn't help.
you said...
''"Based on your earlier response I assume you need to press the F10 key to see the tools menu you were refered to." ''
No... I never said that anywhere. I DID refer to Menu->View->Character Encoding, and I did refer to right clicking on individual folders to get to the properties dialog and the general information tab. But F10 doesn't do anything.
You said...
''I have examines dozens of mails in my inbox and each honours the character encoding set in the HTML''
Well, mine NEVER did. A short example from an email I got today is pretty much representative of all mail I get from GMAIL...
--089e013a0572a067a404fc73ceda
Content-Type: text/plain; charset=UTF-8
Ok, very good. Thank you. Phoenix sent you a friend request on Facebook by
the way. Talk to you soon.
--089e013a0572a067a404fc73ceda
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
<p dir=3D"ltr">Ok, very good. Thank you. Phoenix sent you a friend request=
=C2=A0 on Facebook by the way.=C2=A0 Talk to you soon.</p>
--089e013a0572a067a404fc73ceda--
See those instances of "=C2=A0"? Each one displays as a strange character, a capital A with a curved line over it. If I manually set my default encoding to UTF-8, the weird characters go away. If I leave it as Western, there is nothing I can do to tell Thunderbird to "auto detect".
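For what it's worth, that "capital A with a curved line" is exactly what you get when UTF-8 bytes are decoded as a single-byte Western charset. A quick Java sketch of the byte arithmetic (Thunderbird of course does its own decoding; this just shows why those two bytes render as they do):

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        // "=C2=A0" in quoted-printable stands for the two bytes 0xC2 0xA0.
        byte[] bytes = { (byte) 0xC2, (byte) 0xA0 };
        // Decoded as UTF-8: one character, U+00A0 (no-break space).
        String asUtf8 = new String(bytes, StandardCharsets.UTF_8);
        // Decoded as a Western single-byte charset: two characters,
        // 'Â' (A with a circumflex) followed by a no-break space.
        String asLatin1 = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(asUtf8.length());   // 1
        System.out.println(asLatin1.length()); // 2
    }
}
```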
Anyway, I suppose at this point that no one responsible for the product coding is seriously looking at my issue, which is why it's never been solved. If anyone does intend to help track it down and solve it, I'll be happy to provide all the examples and screen shots they ask for. Otherwise.
What character encoding standard does tuxedo use
Hi,
I am trying to resolve a problem with communication between Tuxedo 6.4 and Vitria. It seems that there is a problem with the translation of special characters. Does anyone know what encoding standard Tuxedo uses?
Thanks.
Thanks Scott, actually I was asked the following question by Vitria Technical Support,
can you help?
"XDR (External Data Representation) is a protocol used by BEA Tuxedo's
communication engine. XDR handles data format transformations when passing
messages across dissimilar processor architectures.
This is not the equivalent of Character Encoding. I specifically need the
Character Encoding used. I am not sure where your admin needs to check for
this - it might even be set at the OS level. I suspect that it will be
something like ISO-8859-1 or some derivative."
Thanks.
Scott Orshan <[email protected]> wrote:
Within a machine, TUXEDO just sends the bytes that you give it. When it goes between machines, it uses XDR to encode the data values for transmission. There is no character set translation going on, unless you are going to an EBCDIC machine. (If you are using data encryption [tpseal] in TUXEDO 7.1 or 8.0, your data may be encoded even if it stays on the same machine type.)
Scott Orshan
BEA Systems
Richard Astill wrote:
Hi,
I am trying to resolve a problem with communication between Tuxedo 6.4 and Vitria. It seems that there is a problem with the translation of special characters. Does anyone know what encoding standard Tuxedo uses?
Thanks. -
I've invented a Character encoding.
Advantages of this: ANSI ASCII compatible, Bitwise operations based, Self-synchronising, Abundant.
Yields of this encoding, against those of UTF-8, are,
Number of Bits    This Encoding                    UTF-8
                  Codes        Accumulation        Codes        Accumulation
 8                    128              128           128               128
10                    192              320             0               128
12                    512              832             0               128
14                   1280             2112             0               128
16                   3072             5184          2048              2176
18                   7168            12352             0              2176
20                  16384            28736             0              2176
22                  36864            65600             0              2176
24                  81920           147520         65536             67712
26                 180224           327744             0             67712
28                 393216           720960             0             67712
30                 851968          1572928             0             67712
32                1835008          3407936       2097152           2164864
This is a new Character encoding scheme (CES) that maps Unicode code points to bit sequences.
Could you please suggest improvements?
Please bear with me as the table may not be formatted well for you, especially when using serif. When reading a line in the table, the first value is number of bits and the next pair is for this encoding, and the other pair is for UTF-8, with the first of each pair being the number of codes and the others are their accumulation.
Sorry! I wish to provide more details on this but I'm restricted for some time. I hope that this does not stop you from assisting me.
Regards
Anbu
This encoding maintains almost all the properties of UTF-8 in a more compact format. ANSI ASCII compatibility, a bitwise-operations basis, self-synchronisation and abundance are some of the properties of this encoding. Further, this encoding encodes all characters in far fewer bits than UTF-8, as shown in the table. As I mentioned earlier, I will provide more details and proofs soon. This post is a request for suggestions. Please suggest the most suitable place for this post.
Anbu, which Adobe software or service is your inquiry in relation to?
-
Adding new character encoding to PBP
is there any way to add a new character encoding in PBP 1.1? I want it to support Japanese encoding.
1) Derive MyDateFormat from SimpleDateFormat, only allowing the default constructor.
2) Override the public void applyPattern(String pattern) method so that it detects the 'Q' and replaces it with some easily identifiable pattern involving the month (say "MM") and then call the superclass applyPattern method.
3) Override the public StringBuffer format(Date date, StringBuffer toAppendTo, FieldPosition fieldPosition) method such that it first calls the superclass method to get the formatted output and then corrects this output by replacing (using regular expressions) the "01", "02" etc. with the appropriate quarter.
You might do better to not to actually derive a new class from SimpleDateFormat but just create a class which uses SimpleDateFormat. -
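A rough Java sketch of steps 1–3, with one substitution called out plainly: instead of the regular-expression post-processing of step 3, it plants a quoted '#' marker next to the month and rewrites that span, which is easier to locate reliably. The class name is hypothetical, and it handles only a single unquoted 'Q' in the pattern.

```java
import java.text.FieldPosition;
import java.text.SimpleDateFormat;
import java.util.Date;

public class QuarterDateFormat extends SimpleDateFormat {
    private boolean hasQuarter;

    public QuarterDateFormat() {
        super(); // step 1: only the default constructor is exposed
    }

    // Step 2: detect 'Q' and hand the superclass an identifiable pattern.
    public void applyPattern(String pattern) {
        hasQuarter = pattern.indexOf('Q') >= 0;
        // '#' is not a pattern letter, so quoted it survives as a literal.
        super.applyPattern(pattern.replace("Q", "'#'MM"));
    }

    // Step 3: format normally, then rewrite the marked "#MM" as the quarter.
    public StringBuffer format(Date date, StringBuffer toAppendTo,
                               FieldPosition fieldPosition) {
        StringBuffer sb = super.format(date, toAppendTo, fieldPosition);
        if (hasQuarter) {
            int i = sb.indexOf("#");
            if (i >= 0) {
                int month = Integer.parseInt(sb.substring(i + 1, i + 3));
                sb.replace(i, i + 3, "Q" + ((month - 1) / 3 + 1));
            }
        }
        return sb;
    }
}
```

With applyPattern("yyyy-Q"), a date in May 2008 formats as "2008-Q2".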
Utlfile:- What is Encoding format/type for UTLFILE.
Hello,
If I create a file using UTL_FILE, then what will be the encoding format/type (like ASCII, UTF-8, Unicode)?
Please reply soon.
Thanks in advance
From the documentation:
UTL_FILE expects that files opened by UTL_FILE.FOPEN in text mode are encoded in the database character set. It expects that files opened by UTL_FILE.FOPEN_NCHAR in text mode are encoded in the UTF8 character set. If an opened file is not encoded in the expected character set, the result of an attempt to read the file is indeterminate. When data encoded in one character set is read and Globalization Support is told (such as by means of NLS_LANG) that it is encoded in another character set, the result is indeterminate. If NLS_LANG is set, it should therefore be the same as the database character set.
For more information:
http://docs.oracle.com/cd/E11882_01/appdev.112/e10577/u_file.htm -
XML Character Encoding Using UTL_DBWS
Hi,
I have a database with WINDOWS-1252 character encoding. I'm using UTL_DBWS to call a web service method which echoes a given string. For this purpose, I do the following:
DECLARE
v_wsdl CONSTANT VARCHAR2(500) := 'http://myhost/myservice?wsdl';
v_namespace CONSTANT VARCHAR2(500) := 'my.namespace';
v_service_name CONSTANT UTL_DBWS.QNAME := UTL_DBWS.to_qname(v_namespace, 'MyService');
v_service_port CONSTANT UTL_DBWS.QNAME := UTL_DBWS.to_qname(v_namespace, 'MySoapServicePort');
v_ping CONSTANT UTL_DBWS.QNAME := UTL_DBWS.to_qname(v_namespace, 'ping');
v_wsdl_uri CONSTANT URITYPE := URIFACTORY.getURI(v_wsdl);
v_str_request CONSTANT VARCHAR2(4000) :=
'<?xml version="1.0" encoding="UTF-8" ?>
<ping>
<pingRequest>
<echoData>Dev Team üöäß</echoData>
</pingRequest>
</ping>';
v_service UTL_DBWS.SERVICE;
v_call UTL_DBWS.CALL;
v_request XMLTYPE := XMLTYPE (v_str_request);
v_response SYS.XMLTYPE;
BEGIN
DBMS_JAVA.set_output(20000);
UTL_DBWS.set_logger_level('FINE');
v_service := UTL_DBWS.create_service(v_wsdl_uri, v_service_name);
v_call := UTL_DBWS.create_call(v_service, v_service_port, v_ping);
UTL_DBWS.set_property(v_call, 'oracle.webservices.charsetEncoding', 'UTF-8');
v_response := UTL_DBWS.invoke(v_call, v_request);
DBMS_OUTPUT.put_line(v_response.getStringVal());
UTL_DBWS.release_call(v_call);
UTL_DBWS.release_all_services;
END;
/
Here is the SERVER OUTPUT:
ServiceFacotory: oracle.j2ee.ws.client.ServiceFactoryImpl@a9deba8d
WSDL: http://myhost/myservice?wsdl
Service: oracle.j2ee.ws.client.dii.ConfiguredService@c881d39e
*** Created service: -2121202561 - oracle.jpub.runtime.dbws.DbwsProxy$ServiceProxy@afb58220 ***
ServiceProxy.get(-2121202561) = oracle.jpub.runtime.dbws.DbwsProxy$ServiceProxy@afb58220
Collection Call info: port={my.namespace}MySoapServicePort, operation={my.namespace}ping, returnType={my.namespace}PingResponse, params count=1
setProperty(oracle.webservices.charsetEncoding, UTF-8)
dbwsproxy.add.map: ns, my.namespace
Attribute 0: my.namespace: xmlns:ns, my.namespace
dbwsproxy.lookup.map: ns, my.namespace
createElement(ns:ping,null,my.namespace)
dbwsproxy.add.soap.element.namespace: ns, my.namespace
Attribute 0: my.namespace: xmlns:ns, my.namespace
dbwsproxy.element.node.child.3: 1, null
createElement(echoData,null,null)
dbwsproxy.text.node.child.0: 3, Dev Team üöäß
request:
<ns:ping xmlns:ns="my.namespace">
<pingRequest>
<echoData>Dev Team üöäß</echoData>
</pingRequest>
</ns:ping>
Jul 8, 2008 6:58:49 PM oracle.j2ee.ws.client.StreamingSender _sendImpl
FINE: StreamingSender.response:<?xml version = '1.0' encoding = 'UTF-8'?>
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><env:Header/><env:Body><ns0:pingResponse xmlns:ns0="my.namespace"><pingResponse><responseTimeMillis>0</responseTimeMillis><resultCode>0</resultCode><echoData>Dev Team üöäß</echoData></pingResponse></ns0:pingResponse></env:Body></env:Envelope>
response:
<ns0:pingResponse xmlns:ns0="my.namespace">
<pingResponse>
<responseTimeMillis>0</responseTimeMillis>
<resultCode>0</resultCode>
<echoData>Dev Team üöäß</echoData>
</pingResponse>
</ns0:pingResponse>
As you can see, the character encoding is broken in the request and in the response, i.e. the SOAP encoder does not take the UTF-8 encoding into consideration.
I tracked down the problem to the method oracle.jpub.runtime.dbws.DbwsProxy.dom2SOAP(org.w3c.dom.Node, java.util.Hashtable); and more specifically to the calls of oracle.j2ee.ws.saaj.soap.soap11.SOAPFactory11.
My question is: is there a way to make the SOAP encoder use the correct character encoding?
Thanks a lot in advance!
Greetings,
Dimitar
I found a workaround for the problem:
v_response := XMLType(v_response.getBlobVal(NLS_CHARSET_ID('CHAR_CS')), NLS_CHARSET_ID('AL32UTF8'));
Ugly, but I'm tired of decompiling and debugging Java classes ;)
Greetings,
Dimitar