Character encoding format in J2ME

Hi,
I am new to this forum.
I have some queries regarding character encoding support in J2ME.
1. What character encoding formats are supported in J2ME?
2. Does this vary from device to device, or is it the same on all J2ME devices?
3. Do all J2ME devices support UTF-8?
4. Do some devices support UTF-16 (or UTF-16BE/UTF-16LE)?
5. If a device supports the UTF-8 scheme, can we assume that it will support all languages (I am particularly concerned about Chinese)?
Eagerly waiting for feedback :)

Not all devices have support for even UTF-8.
Use this class I wrote for my projects. It is an implementation of Reader, so you can easily use it anywhere.
/* Created on 12.03.2008 14:21 */
package misc;

import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;

/**
 * Reader of UTF-8 encoded strings from bytes.
 * Note: handles 1- to 3-byte sequences, i.e. the Basic Multilingual Plane.
 * @author RFK
 */
public class UTF8InputStreamReader extends Reader {
  InputStream is;

  /** Creates a new instance of UTF8InputStreamReader */
  public UTF8InputStreamReader(InputStream is) {
    this.is = is;
  }

  public int read(char[] cbuf, int off, int len) throws IOException {
    int b, b2, b3, r = 0;
    while (len > 0) {
      b = is.read();
      if (b < 0) break;                     // end of stream
      if (b < 128) {                        // 1-byte sequence (ASCII)
        cbuf[off] = (char) b;
      } else if (b < 224) {                 // 2-byte sequence: 110xxxxx 10xxxxxx
        b2 = is.read();
        cbuf[off] = (char) (((b & 0x1F) << 6) | (b2 & 0x3F));
      } else {                              // 3-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx
        b2 = is.read();
        b3 = is.read();
        cbuf[off] = (char) (((b & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F));
      }
      r++;
      off++;
      len--;
    }
    return (r == 0 && len > 0) ? -1 : r;    // -1 signals end of stream, per the Reader contract
  }

  public void close() throws IOException {
    is.close();
  }
}
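A minimal usage sketch (the resource name is hypothetical, not from the original post):
    InputStream in = getClass().getResourceAsStream("/strings_ru.txt");
    Reader reader = new UTF8InputStreamReader(in);
    char[] buf = new char[256];
    StringBuffer sb = new StringBuffer();
    int n;
    while ((n = reader.read(buf, 0, buf.length)) > 0) {
      sb.append(buf, 0, n);
    }
    reader.close();
    String text = sb.toString();  // decoded UTF-8 text, e.g. for a Form or StringItem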

Similar Messages

  • File character encoding format conversion

    Hi People,
    uname -a
    Linux abcd.us.com 2.6.32-400.21.1.el5uek #1 SMP Wed Feb 20 01:35:01 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
    I am trying to convert a file abcd.dat
    # file abcd.dat
    abcd.dat: binary Computer Graphics Metafile
    #file -i abcd.dat
    abcd.dat: application/octet-stream
    I've tried dos2unix in vain.
    dos2unix: converting file abcd.dat to UNIX format ...
    dos2unix: converting file abc.dat to UNIX format ...
    dos2unix: problems converting file abc.dat
    I've used iconv successfully earlier with this command
    iconv -f UTF-16 -t UTF-8  abcd.dat > abcd.abc
    Only, this time I do not know the "from" encoding of the file.
    #iconv -l
    The following list contain all the coded character sets known.  This does
    not necessarily mean that all combinations of these names can be used for
    the FROM and TO command line parameters.  One coded character set can be
    listed with several different names (aliases).
      437, 500, 500V1, 850, 851, 852, 855, 856, 857, 860, 861, 862, 863, 864, 865,
      866, 866NAV, 869, 874, 904, 1026, 1046, 1047, 8859_1, 8859_2, 8859_3, 8859_4,
      8859_5, 8859_6, 8859_7, 8859_8, 8859_9, 10646-1:1993, 10646-1:1993/UCS4,
      ANSI_X3.4-1968, ANSI_X3.4-1986, ANSI_X3.4, ANSI_X3.110-1983, ANSI_X3.110,
      ARABIC, ARABIC7, ARMSCII-8, ASCII, ASMO-708, ASMO_449, BALTIC, BIG-5,
      BIG-FIVE, BIG5-HKSCS, BIG5, BIG5HKSCS, BIGFIVE, BS_4730, CA, CN-BIG5, CN-GB,
      CN, CP-AR, CP-GR, CP-HU, CP037, CP038, CP273, CP274, CP275, CP278, CP280,
      CP281, CP282, CP284, CP285, CP290, CP297, CP367, CP420, CP423, CP424, CP437,
      CP500, CP737, CP775, CP803, CP813, CP819, CP850, CP851, CP852, CP855, CP856,
      CP857, CP860, CP861, CP862, CP863, CP864, CP865, CP866, CP866NAV, CP868,
      CP869, CP870, CP871, CP874, CP875, CP880, CP891, CP901, CP902, CP903, CP904,
      CP905, CP912, CP915, CP916, CP918, CP920, CP921, CP922, CP930, CP932, CP933,
      CP935, CP936, CP937, CP939, CP949, CP950, CP1004, CP1008, CP1025, CP1026,
      CP1046, CP1047, CP1070, CP1079, CP1081, CP1084, CP1089, CP1097, CP1112,
      CP1122, CP1123, CP1124, CP1125, CP1129, CP1130, CP1132, CP1133, CP1137,
      CP1140, CP1141, CP1142, CP1143, CP1144, CP1145, CP1146, CP1147, CP1148,
      CP1149, CP1153, CP1154, CP1155, CP1156, CP1157, CP1158, CP1160, CP1161,
      CP1162, CP1163, CP1164, CP1166, CP1167, CP1250, CP1251, CP1252, CP1253,
      CP1254, CP1255, CP1256, CP1257, CP1258, CP1361, CP1364, CP1371, CP1388,
      CP1390, CP1399, CP4517, CP4899, CP4909, CP4971, CP5347, CP9030, CP9066,
      CP9448, CP10007, CP12712, CP16804, CPIBM861, CSA7-1, CSA7-2, CSASCII,
      CSA_T500-1983, CSA_T500, CSA_Z243.4-1985-1, CSA_Z243.4-1985-2,
      CSA_Z243.419851, CSA_Z243.419852, CSDECMCS, CSEBCDICATDE, CSEBCDICATDEA,
      CSEBCDICCAFR, CSEBCDICDKNO, CSEBCDICDKNOA, CSEBCDICES, CSEBCDICESA,
      CSEBCDICESS, CSEBCDICFISE, CSEBCDICFISEA, CSEBCDICFR, CSEBCDICIT, CSEBCDICPT,
      CSEBCDICUK, CSEBCDICUS, CSEUCKR, CSEUCPKDFMTJAPANESE, CSGB2312, CSHPROMAN8,
      CSIBM037, CSIBM038, CSIBM273, CSIBM274, CSIBM275, CSIBM277, CSIBM278,
      CSIBM280, CSIBM281, CSIBM284, CSIBM285, CSIBM290, CSIBM297, CSIBM420,
      CSIBM423, CSIBM424, CSIBM500, CSIBM803, CSIBM851, CSIBM855, CSIBM856,
      CSIBM857, CSIBM860, CSIBM863, CSIBM864, CSIBM865, CSIBM866, CSIBM868,
      CSIBM869, CSIBM870, CSIBM871, CSIBM880, CSIBM891, CSIBM901, CSIBM902,
      CSIBM903, CSIBM904, CSIBM905, CSIBM918, CSIBM921, CSIBM922, CSIBM930,
      CSIBM932, CSIBM933, CSIBM935, CSIBM937, CSIBM939, CSIBM943, CSIBM1008,
      CSIBM1025, CSIBM1026, CSIBM1097, CSIBM1112, CSIBM1122, CSIBM1123, CSIBM1124,
      CSIBM1129, CSIBM1130, CSIBM1132, CSIBM1133, CSIBM1137, CSIBM1140, CSIBM1141,
      CSIBM1142, CSIBM1143, CSIBM1144, CSIBM1145, CSIBM1146, CSIBM1147, CSIBM1148,
      CSIBM1149, CSIBM1153, CSIBM1154, CSIBM1155, CSIBM1156, CSIBM1157, CSIBM1158,
      CSIBM1160, CSIBM1161, CSIBM1163, CSIBM1164, CSIBM1166, CSIBM1167, CSIBM1364,
      CSIBM1371, CSIBM1388, CSIBM1390, CSIBM1399, CSIBM4517, CSIBM4899, CSIBM4909,
      CSIBM4971, CSIBM5347, CSIBM9030, CSIBM9066, CSIBM9448, CSIBM12712,
      CSIBM16804, CSIBM11621162, CSISO4UNITEDKINGDOM, CSISO10SWEDISH,
      CSISO11SWEDISHFORNAMES, CSISO14JISC6220RO, CSISO15ITALIAN, CSISO16PORTUGESE,
      CSISO17SPANISH, CSISO18GREEK7OLD, CSISO19LATINGREEK, CSISO21GERMAN,
      CSISO25FRENCH, CSISO27LATINGREEK1, CSISO49INIS, CSISO50INIS8,
      CSISO51INISCYRILLIC, CSISO58GB1988, CSISO60DANISHNORWEGIAN,
      CSISO60NORWEGIAN1, CSISO61NORWEGIAN2, CSISO69FRENCH, CSISO84PORTUGUESE2,
      CSISO85SPANISH2, CSISO86HUNGARIAN, CSISO88GREEK7, CSISO89ASMO449, CSISO90,
      CSISO92JISC62991984B, CSISO99NAPLPS, CSISO103T618BIT, CSISO111ECMACYRILLIC,
      CSISO121CANADIAN1, CSISO122CANADIAN2, CSISO139CSN369103, CSISO141JUSIB1002,
      CSISO143IECP271, CSISO150, CSISO150GREEKCCITT, CSISO151CUBA,
      CSISO153GOST1976874, CSISO646DANISH, CSISO2022CN, CSISO2022JP, CSISO2022JP2,
      CSISO2022KR, CSISO2033, CSISO5427CYRILLIC, CSISO5427CYRILLIC1981,
      CSISO5428GREEK, CSISO10367BOX, CSISOLATIN1, CSISOLATIN2, CSISOLATIN3,
      CSISOLATIN4, CSISOLATIN5, CSISOLATIN6, CSISOLATINARABIC, CSISOLATINCYRILLIC,
      CSISOLATINGREEK, CSISOLATINHEBREW, CSKOI8R, CSKSC5636, CSMACINTOSH,
      CSNATSDANO, CSNATSSEFI, CSN_369103, CSPC8CODEPAGE437, CSPC775BALTIC,
      CSPC850MULTILINGUAL, CSPC862LATINHEBREW, CSPCP852, CSSHIFTJIS, CSUCS4,
      CSUNICODE, CSWINDOWS31J, CUBA, CWI-2, CWI, CYRILLIC, DE, DEC-MCS, DEC,
      DECMCS, DIN_66003, DK, DS2089, DS_2089, E13B, EBCDIC-AT-DE-A, EBCDIC-AT-DE,
      EBCDIC-BE, EBCDIC-BR, EBCDIC-CA-FR, EBCDIC-CP-AR1, EBCDIC-CP-AR2,
      EBCDIC-CP-BE, EBCDIC-CP-CA, EBCDIC-CP-CH, EBCDIC-CP-DK, EBCDIC-CP-ES,
      EBCDIC-CP-FI, EBCDIC-CP-FR, EBCDIC-CP-GB, EBCDIC-CP-GR, EBCDIC-CP-HE,
      EBCDIC-CP-IS, EBCDIC-CP-IT, EBCDIC-CP-NL, EBCDIC-CP-NO, EBCDIC-CP-ROECE,
      EBCDIC-CP-SE, EBCDIC-CP-TR, EBCDIC-CP-US, EBCDIC-CP-WT, EBCDIC-CP-YU,
      EBCDIC-CYRILLIC, EBCDIC-DK-NO-A, EBCDIC-DK-NO, EBCDIC-ES-A, EBCDIC-ES-S,
      EBCDIC-ES, EBCDIC-FI-SE-A, EBCDIC-FI-SE, EBCDIC-FR, EBCDIC-GREEK, EBCDIC-INT,
      EBCDIC-INT1, EBCDIC-IS-FRISS, EBCDIC-IT, EBCDIC-JP-E, EBCDIC-JP-KANA,
      EBCDIC-PT, EBCDIC-UK, EBCDIC-US, EBCDICATDE, EBCDICATDEA, EBCDICCAFR,
      EBCDICDKNO, EBCDICDKNOA, EBCDICES, EBCDICESA, EBCDICESS, EBCDICFISE,
      EBCDICFISEA, EBCDICFR, EBCDICISFRISS, EBCDICIT, EBCDICPT, EBCDICUK, EBCDICUS,
      ECMA-114, ECMA-118, ECMA-128, ECMA-CYRILLIC, ECMACYRILLIC, ELOT_928, ES, ES2,
      EUC-CN, EUC-JISX0213, EUC-JP-MS, EUC-JP, EUC-KR, EUC-TW, EUCCN, EUCJP-MS,
      EUCJP-OPEN, EUCJP-WIN, EUCJP, EUCKR, EUCTW, FI, FR, GB, GB2312, GB13000,
      GB18030, GBK, GB_1988-80, GB_198880, GEORGIAN-ACADEMY, GEORGIAN-PS,
      GOST_19768-74, GOST_19768, GOST_1976874, GREEK-CCITT, GREEK, GREEK7-OLD,
      GREEK7, GREEK7OLD, GREEK8, GREEKCCITT, HEBREW, HP-ROMAN8, HPROMAN8, HU,
      IBM-803, IBM-856, IBM-901, IBM-902, IBM-921, IBM-922, IBM-930, IBM-932,
      IBM-933, IBM-935, IBM-937, IBM-939, IBM-943, IBM-1008, IBM-1025, IBM-1046,
      IBM-1047, IBM-1097, IBM-1112, IBM-1122, IBM-1123, IBM-1124, IBM-1129,
      IBM-1130, IBM-1132, IBM-1133, IBM-1137, IBM-1140, IBM-1141, IBM-1142,
      IBM-1143, IBM-1144, IBM-1145, IBM-1146, IBM-1147, IBM-1148, IBM-1149,
      IBM-1153, IBM-1154, IBM-1155, IBM-1156, IBM-1157, IBM-1158, IBM-1160,
      IBM-1161, IBM-1162, IBM-1163, IBM-1164, IBM-1166, IBM-1167, IBM-1364,
      IBM-1371, IBM-1388, IBM-1390, IBM-1399, IBM-4517, IBM-4899, IBM-4909,
      IBM-4971, IBM-5347, IBM-9030, IBM-9066, IBM-9448, IBM-12712, IBM-16804,
      IBM037, IBM038, IBM256, IBM273, IBM274, IBM275, IBM277, IBM278, IBM280,
      IBM281, IBM284, IBM285, IBM290, IBM297, IBM367, IBM420, IBM423, IBM424,
      IBM437, IBM500, IBM775, IBM803, IBM813, IBM819, IBM848, IBM850, IBM851,
      IBM852, IBM855, IBM856, IBM857, IBM860, IBM861, IBM862, IBM863, IBM864,
      IBM865, IBM866, IBM866NAV, IBM868, IBM869, IBM870, IBM871, IBM874, IBM875,
      IBM880, IBM891, IBM901, IBM902, IBM903, IBM904, IBM905, IBM912, IBM915,
      IBM916, IBM918, IBM920, IBM921, IBM922, IBM930, IBM932, IBM933, IBM935,
      IBM937, IBM939, IBM943, IBM1004, IBM1008, IBM1025, IBM1026, IBM1046, IBM1047,
      IBM1089, IBM1097, IBM1112, IBM1122, IBM1123, IBM1124, IBM1129, IBM1130,
      IBM1132, IBM1133, IBM1137, IBM1140, IBM1141, IBM1142, IBM1143, IBM1144,
      IBM1145, IBM1146, IBM1147, IBM1148, IBM1149, IBM1153, IBM1154, IBM1155,
      IBM1156, IBM1157, IBM1158, IBM1160, IBM1161, IBM1162, IBM1163, IBM1164,
      IBM1166, IBM1167, IBM1364, IBM1371, IBM1388, IBM1390, IBM1399, IBM4517,
      IBM4899, IBM4909, IBM4971, IBM5347, IBM9030, IBM9066, IBM9448, IBM12712,
      IBM16804, IEC_P27-1, IEC_P271, INIS-8, INIS-CYRILLIC, INIS, INIS8,
      INISCYRILLIC, ISIRI-3342, ISIRI3342, ISO-2022-CN-EXT, ISO-2022-CN,
      ISO-2022-JP-2, ISO-2022-JP-3, ISO-2022-JP, ISO-2022-KR, ISO-8859-1,
      ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7,
      ISO-8859-8, ISO-8859-9, ISO-8859-10, ISO-8859-11, ISO-8859-13, ISO-8859-14,
      ISO-8859-15, ISO-8859-16, ISO-10646, ISO-10646/UCS2, ISO-10646/UCS4,
      ISO-10646/UTF-8, ISO-10646/UTF8, ISO-CELTIC, ISO-IR-4, ISO-IR-6, ISO-IR-8-1,
      ISO-IR-9-1, ISO-IR-10, ISO-IR-11, ISO-IR-14, ISO-IR-15, ISO-IR-16, ISO-IR-17,
      ISO-IR-18, ISO-IR-19, ISO-IR-21, ISO-IR-25, ISO-IR-27, ISO-IR-37, ISO-IR-49,
      ISO-IR-50, ISO-IR-51, ISO-IR-54, ISO-IR-55, ISO-IR-57, ISO-IR-60, ISO-IR-61,
      ISO-IR-69, ISO-IR-84, ISO-IR-85, ISO-IR-86, ISO-IR-88, ISO-IR-89, ISO-IR-90,
      ISO-IR-92, ISO-IR-98, ISO-IR-99, ISO-IR-100, ISO-IR-101, ISO-IR-103,
      ISO-IR-109, ISO-IR-110, ISO-IR-111, ISO-IR-121, ISO-IR-122, ISO-IR-126,
      ISO-IR-127, ISO-IR-138, ISO-IR-139, ISO-IR-141, ISO-IR-143, ISO-IR-144,
      ISO-IR-148, ISO-IR-150, ISO-IR-151, ISO-IR-153, ISO-IR-155, ISO-IR-156,
      ISO-IR-157, ISO-IR-166, ISO-IR-179, ISO-IR-193, ISO-IR-197, ISO-IR-199,
      ISO-IR-203, ISO-IR-209, ISO-IR-226, ISO/TR_11548-1, ISO646-CA, ISO646-CA2,
      ISO646-CN, ISO646-CU, ISO646-DE, ISO646-DK, ISO646-ES, ISO646-ES2, ISO646-FI,
      ISO646-FR, ISO646-FR1, ISO646-GB, ISO646-HU, ISO646-IT, ISO646-JP-OCR-B,
      ISO646-JP, ISO646-KR, ISO646-NO, ISO646-NO2, ISO646-PT, ISO646-PT2,
      ISO646-SE, ISO646-SE2, ISO646-US, ISO646-YU, ISO2022CN, ISO2022CNEXT,
      ISO2022JP, ISO2022JP2, ISO2022KR, ISO6937, ISO8859-1, ISO8859-2, ISO8859-3,
      ISO8859-4, ISO8859-5, ISO8859-6, ISO8859-7, ISO8859-8, ISO8859-9, ISO8859-10,
      ISO8859-11, ISO8859-13, ISO8859-14, ISO8859-15, ISO8859-16, ISO11548-1,
      ISO88591, ISO88592, ISO88593, ISO88594, ISO88595, ISO88596, ISO88597,
      ISO88598, ISO88599, ISO885910, ISO885911, ISO885913, ISO885914, ISO885915,
      ISO885916, ISO_646.IRV:1991, ISO_2033-1983, ISO_2033, ISO_5427-EXT, ISO_5427,
      ISO_5427:1981, ISO_5427EXT, ISO_5428, ISO_5428:1980, ISO_6937-2,
      ISO_6937-2:1983, ISO_6937, ISO_6937:1992, ISO_8859-1, ISO_8859-1:1987,
      ISO_8859-2, ISO_8859-2:1987, ISO_8859-3, ISO_8859-3:1988, ISO_8859-4,
      ISO_8859-4:1988, ISO_8859-5, ISO_8859-5:1988, ISO_8859-6, ISO_8859-6:1987,
      ISO_8859-7, ISO_8859-7:1987, ISO_8859-7:2003, ISO_8859-8, ISO_8859-8:1988,
      ISO_8859-9, ISO_8859-9:1989, ISO_8859-10, ISO_8859-10:1992, ISO_8859-14,
      ISO_8859-14:1998, ISO_8859-15, ISO_8859-15:1998, ISO_8859-16,
      ISO_8859-16:2001, ISO_9036, ISO_10367-BOX, ISO_10367BOX, ISO_11548-1,
      ISO_69372, IT, JIS_C6220-1969-RO, JIS_C6229-1984-B, JIS_C62201969RO,
      JIS_C62291984B, JOHAB, JP-OCR-B, JP, JS, JUS_I.B1.002, KOI-7, KOI-8, KOI8-R,
      KOI8-T, KOI8-U, KOI8, KOI8R, KOI8U, KSC5636, L1, L2, L3, L4, L5, L6, L7, L8,
      L10, LATIN-9, LATIN-GREEK-1, LATIN-GREEK, LATIN1, LATIN2, LATIN3, LATIN4,
      LATIN5, LATIN6, LATIN7, LATIN8, LATIN10, LATINGREEK, LATINGREEK1,
      MAC-CYRILLIC, MAC-IS, MAC-SAMI, MAC-UK, MAC, MACCYRILLIC, MACINTOSH, MACIS,
      MACUK, MACUKRAINIAN, MIK, MS-ANSI, MS-ARAB, MS-CYRL, MS-EE, MS-GREEK,
      MS-HEBR, MS-MAC-CYRILLIC, MS-TURK, MS932, MS936, MSCP949, MSCP1361,
      MSMACCYRILLIC, MSZ_7795.3, MS_KANJI, NAPLPS, NATS-DANO, NATS-SEFI, NATSDANO,
      NATSSEFI, NC_NC0010, NC_NC00-10, NC_NC00-10:81, NF_Z_62-010,
      NF_Z_62-010_(1973), NF_Z_62-010_1973, NF_Z_62010, NF_Z_62010_1973, NO, NO2,
      NS_4551-1, NS_4551-2, NS_45511, NS_45512, OS2LATIN1, OSF00010001,
      OSF00010002, OSF00010003, OSF00010004, OSF00010005, OSF00010006, OSF00010007,
      OSF00010008, OSF00010009, OSF0001000A, OSF00010020, OSF00010100, OSF00010101,
      OSF00010102, OSF00010104, OSF00010105, OSF00010106, OSF00030010, OSF0004000A,
      OSF0005000A, OSF05010001, OSF100201A4, OSF100201A8, OSF100201B5, OSF100201F4,
      OSF100203B5, OSF1002011C, OSF1002011D, OSF1002035D, OSF1002035E, OSF1002035F,
      OSF1002036B, OSF1002037B, OSF10010001, OSF10020025, OSF10020111, OSF10020115,
      OSF10020116, OSF10020118, OSF10020122, OSF10020129, OSF10020352, OSF10020354,
      OSF10020357, OSF10020359, OSF10020360, OSF10020364, OSF10020365, OSF10020366,
      OSF10020367, OSF10020370, OSF10020387, OSF10020388, OSF10020396, OSF10020402,
      OSF10020417, PT, PT2, PT154, R8, RK1048, ROMAN8, RUSCII, SE, SE2,
      SEN_850200_B, SEN_850200_C, SHIFT-JIS, SHIFT_JIS, SHIFT_JISX0213, SJIS-OPEN,
      SJIS-WIN, SJIS, SS636127, STRK1048-2002, ST_SEV_358-88, T.61-8BIT, T.61,
      T.618BIT, TCVN-5712, TCVN, TCVN5712-1, TCVN5712-1:1993, TIS-620, TIS620-0,
      TIS620.2529-1, TIS620.2533-0, TIS620, TS-5881, TSCII, UCS-2, UCS-2BE,
      UCS-2LE, UCS-4, UCS-4BE, UCS-4LE, UCS2, UCS4, UHC, UJIS, UK, UNICODE,
      UNICODEBIG, UNICODELITTLE, US-ASCII, US, UTF-7, UTF-8, UTF-16, UTF-16BE,
      UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, UTF7, UTF8, UTF16, UTF16BE, UTF16LE,
      UTF32, UTF32BE, UTF32LE, VISCII, WCHAR_T, WIN-SAMI-2, WINBALTRIM,
      WINDOWS-31J, WINDOWS-874, WINDOWS-936, WINDOWS-1250, WINDOWS-1251,
      WINDOWS-1252, WINDOWS-1253, WINDOWS-1254, WINDOWS-1255, WINDOWS-1256,
      WINDOWS-1257, WINDOWS-1258, WINSAMI2, WS2, YU
    ==================================================
    also,
    #which od
    /usr/bin/od
    but I don't know how to use it.
    ==================================
    #cat -v abcd.dat
    has a lot of ^@
    ===================================
    #echo $LANG
    en_US.UTF-8
    ======================================================================================
    #hexdump -C abcd.dat|head -5
    00000000  00 22 00 34 00 36 00 32  00 39 00 33 00 22 00 7c  |.".4.6.2.9.3.".||
    00000010  00 22 00 32 00 30 00 31  00 33 00 2d 00 31 00 31  |.".2.0.1.3.-.1.1|
    00000020  00 2d 00 31 00 38 00 20  00 30 00 38 00 3a 00 30  |.-.1.8. .0.8.:.0|
    00000030  00 39 00 3a 00 34 00 38  00 22 00 7c 00 22 00 33  |.9.:.4.8.".|.".3|
    00000040  00 36 00 37 00 22 00 7c  00 22 00 53 00 75 00 73  |.6.7.".|.".S.u.s|
    =======================================================================================
    #vi abcd.tst
    testing
    esc:wq
    #file abcd.tst
    abcd.tst: ASCII text
    Let me know the complete iconv command with from-and-to encoding.
    Appreciate any help.
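    For what it is worth, the interleaved 00 bytes in the hexdump are the signature of UTF-16 big-endian text (00 22 is '"', 00 34 is '4'), so iconv -f UTF-16BE -t UTF-8 abcd.dat > abcd.txt would be the command to try. A hedged Java equivalent of that conversion (file names taken from the post):
        Reader in = new InputStreamReader(new FileInputStream("abcd.dat"), "UTF-16BE");
        Writer out = new OutputStreamWriter(new FileOutputStream("abcd.txt"), "UTF-8");
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) > 0) {
          out.write(buf, 0, n);  // re-encode the decoded characters as UTF-8
        }
        in.close();
        out.close();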

    Hi BalusC,
    as we write in a JSP page <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />,
    is there something we can write in a .properties file?
    Russian words are not correctly displayed in the browser... how can we display them in the correct format?
    I have all the Russian words in my .properties file.
    Thanks a lot
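    One thing worth knowing here (a general Java fact, not from this thread): java.util.Properties and ResourceBundle read .properties files as ISO-8859-1, so Russian text must be stored as \uXXXX escapes, e.g. converted with the JDK's native2ascii tool. A minimal sketch (key name and bundle name are hypothetical):
        // messages.properties, after native2ascii: welcome=\u041F\u0440\u0438\u0432\u0435\u0442
        ResourceBundle bundle = ResourceBundle.getBundle("messages");
        String welcome = bundle.getString("welcome"); // decoded to the Unicode string "Привет"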

  • What every developer should know about character encoding

    This was originally posted (with better formatting) at [Moderator edit: link removed]/what-every-developer-should-know-about-character-encoding.html. I'm posting it because lots of people trip over this.
    If you write code that touches a text file, you probably need this.
    Let's start off with two key items:
    1. Unicode does not solve this issue for us (yet).
    2. Every text file is encoded. There is no such thing as an unencoded file or a "general" encoding.
    And let's add a codicil to this – most Americans can get by without having to take this into account – most of the time. Because the characters for the first 127 bytes in the vast majority of encoding schemes map to the same set of characters (more accurately called glyphs). And because we only use A-Z without any other characters, accents, etc. – we're good to go. But the second you use those same assumptions in an HTML or XML file that has characters outside the first 127 – then the trouble starts.
    The computer industry started with disk space and memory at a premium. Anyone who suggested using 2 bytes for each character instead of one would have been laughed at. In fact we're lucky that the byte worked best as 8 bits, or we might have had fewer than 256 values for each character. There were of course numerous character sets (or codepages) developed early on. But we ended up with most everyone using a standard set of codepages where the first 127 byte values were identical in all of them and the second half was unique to each set. There were sets for America/Western Europe, Central Europe, Russia, etc.
    And then for Asia, because 256 characters were not enough, some of the range 128 – 255 had what was called DBCS (double byte character sets). For each value of a first byte (in these higher ranges), the second byte then identified one of 256 characters. This gave a total of 128 * 256 additional characters. It was a hack, but it kept memory use to a minimum. Chinese, Japanese, and Korean each have their own DBCS codepage.
    And for a while this worked well. Operating systems, applications, etc. were mostly set to use a specified code page. But then the internet came along. A website in America using an XML file from Greece to display data to a user browsing in Russia, where each is entering data based on their country – that broke the paradigm.
    Fast forward to today. The two file formats where we can explain this the best, and where everyone trips over it, are HTML and XML. Every HTML and XML file can optionally have the character encoding set in its header metadata. If it's not set, then most programs assume it is UTF-8, but that is not a standard and not universally followed. If the encoding is not specified and the program reading the file guesses wrong – the file will be misread.
    Point 1 – Never treat specifying the encoding as optional when writing a file. Always write it to the file. Always. Even if you are willing to swear that the file will never have characters out of the range 1 – 127.
    Now let's look at UTF-8 because, as the standard, and given the way it works, it gets people into a lot of trouble. UTF-8 was popular for two reasons. First it matched the standard codepages for the first 127 characters and so most existing HTML and XML would match it. Second, it was designed to use as few bytes as possible, which mattered a lot back when it was designed and many people were still using dial-up modems.
    UTF-8 borrowed from the DBCS designs of the Asian codepages. The first 128 byte values are all single byte representations of characters. Then for the next most common set, it uses a block in the second 128 byte values as the start of a double byte sequence, giving us more characters. But wait, there's more. For the less common there's a first byte which leads to a series of second bytes. Those then each lead to a third byte and those three bytes define the character. This goes up to 6 byte sequences. Using this MBCS (multi-byte character set) you can write the equivalent of every Unicode character. And assuming what you are writing is not a list of seldom used Chinese characters, you can do it in fewer bytes.
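    A worked example of the 2-byte case (not from the original article): ß is U+00DF, which UTF-8 encodes as the bytes C3 9F – the leading byte 110xxxxx carries the top five bits and the trailing byte 10xxxxxx the bottom six.
        byte[] b = "ß".getBytes("UTF-8");  // yields {(byte) 0xC3, (byte) 0x9F}
        // 0xC3 = 11000011 -> leading byte, payload 00011; 0x9F = 10011111 -> payload 011111
        // concatenated payload 00011 011111 = 0x00DF = 'ß'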
    But here is what everyone trips over – they have an HTML or XML file, it works fine, and they open it up in a text editor. Then, using the codepage for their region, they insert a character like ß and save the file. Of course it must be correct – their text editor shows it correctly. But feed it to any program that reads according to the declared encoding and that byte is now the first character of a 2 byte sequence. You either get a different character or, if the second byte is not a legal value for that first byte – an error.
    Point 2 – Always create HTML and XML in a program that writes it out correctly using the encoding. If you must create it with a text editor, then view the final file in a browser.
    Now, what about when the code you are writing will read or write a file? We are not talking binary/data files where you write it out in your own format, but files that are considered text files. Java, .NET, etc. all have character encoders. The purpose of these encoders is to translate between a sequence of bytes (the file) and the characters they represent. Let's take what is actually a very difficult example – your source code, be it C#, Java, etc. These are still by and large "plain old text files" with no encoding hints. So how do programs handle them? Many assume they use the local code page. Many others assume that all characters will be in the range 0 – 127 and will choke on anything else.
    Here's a key point about these text files – every program is still using an encoding. It may not be setting it in code, but by definition an encoding is being used.
    Point 3 – Always set the encoding when you read and write text files. Not just for HTML & XML, but even for files like source code. It's fine if you set it to use the default codepage, but set the encoding.
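    A minimal sketch of Point 3 in Java (the file name is hypothetical), naming the encoding on both sides instead of relying on the platform default:
        // write text as UTF-8, stating the encoding explicitly
        Writer out = new OutputStreamWriter(new FileOutputStream("notes.txt"), "UTF-8");
        out.write("Grüße aus Köln");
        out.close();
        // read it back, decoding with the same declared encoding
        BufferedReader in = new BufferedReader(
            new InputStreamReader(new FileInputStream("notes.txt"), "UTF-8"));
        System.out.println(in.readLine()); // prints the original text intact
        in.close();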
    Point 4 – Use the most complete encoder possible. You can write your own XML as a text file encoded for UTF-8. But if you write it using an XML encoder, then it will include the encoding in the metadata and you can't get it wrong. (It also adds the endian preamble to the file.)
    Ok, you're reading & writing files correctly but what about inside your code. What there? This is where it's easy – unicode. That's what those encoders created in the Java & .NET runtime are designed to do. You read in and get unicode. You write unicode and get an encoded file. That's why the char type is 16 bits and is a unique core type that is for characters. This you probably have right because languages today don't give you much choice in the matter.
    Point 5 – (For developers on languages that have been around a while) – Always use Unicode internally. In C++ this is called wide chars (or something similar). Don't get clever to save a couple of bytes; memory is cheap and you have more important things to do.
    Wrapping it up
    I think there are two key items to keep in mind here. First, make sure you are taking the encoding into account on text files. Second, this is actually all very easy and straightforward. People rarely screw up how to use an encoding; it's when they ignore the issue that they get into trouble.

    DavidThi808 wrote:
    This was originally posted (with better formatting) at [Moderator edit: link removed]/what-every-developer-should-know-about-character-encoding.html. I'm posting it because lots of people trip over this.
    If you write code that touches a text file, you probably need this.
    Let's start off with two key items:
    1. Unicode does not solve this issue for us (yet).
    2. Every text file is encoded. There is no such thing as an unencoded file or a "general" encoding.
    And let's add a codicil to this – most Americans can get by without having to take this into account – most of the time. Because the characters for the first 127 bytes in the vast majority of encoding schemes map to the same set of characters (more accurately called glyphs). And because we only use A-Z without any other characters, accents, etc. – we're good to go. But the second you use those same assumptions in an HTML or XML file that has characters outside the first 127 – then the trouble starts. Pretty sure most Americans do not use character sets that only have a range of 0-127. I don't think I have ever used a desktop OS that did. I might have used some big iron boxes before that, but at that time I wasn't even aware that character sets existed.
    They might only use that range, but that is a different issue, especially since that range is exactly the same as the UTF-8 character set anyway.
    The computer industry started with disk space and memory at a premium. Anyone who suggested using 2 bytes for each character instead of one would have been laughed at. In fact we're lucky that the byte worked best as 8 bits, or we might have had fewer than 256 values for each character. There were of course numerous character sets (or codepages) developed early on. But we ended up with most everyone using a standard set of codepages where the first 127 byte values were identical in all of them and the second half was unique to each set. There were sets for America/Western Europe, Central Europe, Russia, etc.
    And then for Asia, because 256 characters were not enough, some of the range 128 – 255 had what was called DBCS (double byte character sets). For each value of a first byte (in these higher ranges), the second byte then identified one of 256 characters. This gave a total of 128 * 256 additional characters. It was a hack, but it kept memory use to a minimum. Chinese, Japanese, and Korean each have their own DBCS codepage.
    And for a while this worked well. Operating systems, applications, etc. were mostly set to use a specified code page. But then the internet came along. A website in America using an XML file from Greece to display data to a user browsing in Russia, where each is entering data based on their country – that broke the paradigm.
    The above is only true for small volume sets. If I am targeting a processing rate of 2000 txns/sec with a requirement to hold data active for seven years then a column with a size of 8 bytes is significantly different than one with 16 bytes.
    Fast forward to today. The two file formats where we can explain this the best, and where everyone trips over it, are HTML and XML. Every HTML and XML file can optionally have the character encoding set in its header metadata. If it's not set, then most programs assume it is UTF-8, but that is not a standard and not universally followed. If the encoding is not specified and the program reading the file guesses wrong – the file will be misread.
    The above is out of place. It would be best to address this as part of Point 1.
    Point 1 – Never treat specifying the encoding as optional when writing a file. Always write it to the file. Always. Even if you are willing to swear that the file will never have characters out of the range 1 – 127.
    Now let's look at UTF-8 because, as the standard, and given the way it works, it gets people into a lot of trouble. UTF-8 was popular for two reasons. First it matched the standard codepages for the first 127 characters and so most existing HTML and XML would match it. Second, it was designed to use as few bytes as possible, which mattered a lot back when it was designed and many people were still using dial-up modems.
    UTF-8 borrowed from the DBCS designs of the Asian codepages. The first 128 byte values are all single byte representations of characters. Then for the next most common set, it uses a block in the second 128 byte values as the start of a double byte sequence, giving us more characters. But wait, there's more. For the less common there's a first byte which leads to a series of second bytes. Those then each lead to a third byte and those three bytes define the character. This goes up to 6 byte sequences. Using this MBCS (multi-byte character set) you can write the equivalent of every Unicode character. And assuming what you are writing is not a list of seldom used Chinese characters, you can do it in fewer bytes.
    The first part of that paragraph is odd. The first 128 characters of Unicode – all of Unicode – are based on ASCII. The representational format of UTF-8 is required to implement Unicode, thus it must represent those characters. It uses the idiom supported by variable width encodings to do that.
    But here is what everyone trips over – they have an HTML or XML file, it works fine, and they open it up in a text editor. Then, using the codepage for their region, they insert a character like ß and save the file. Of course it must be correct – their text editor shows it correctly. But feed it to any program that reads according to the declared encoding and that byte is now the first character of a 2 byte sequence. You either get a different character or, if the second byte is not a legal value for that first byte – an error.
    Not sure what you are saying here. If a file is supposed to be in one encoding and you insert invalid characters into it, then it is invalid. End of story. It has nothing to do with HTML/XML.
    Point 2 – Always create HTML and XML in a program that writes it out correctly using the encoding. If you must create it with a text editor, then view the final file in a browser.
    The browser still needs to support the encoding.
    Now, what about when the code you are writing will read or write a file? We are not talking binary/data files where you write it out in your own format, but files that are considered text files. Java, .NET, etc. all have character encoders. The purpose of these encoders is to translate between a sequence of bytes (the file) and the characters they represent. Let's take what is actually a very difficult example – your source code, be it C#, Java, etc. These are still by and large "plain old text files" with no encoding hints. So how do programs handle them? Many assume they use the local code page. Many others assume that all characters will be in the range 0 – 127 and will choke on anything else.
    I know Java files have a default encoding – the specification defines it. And I am certain C# does as well.
    Point 3 – Always set the encoding when you read and write text files. Not just for HTML & XML, but even for files like source code. It's fine if you set it to use the default codepage, but set the encoding.
    It is important to define it. Whether you set it is another matter.
    Point 4 – Use the most complete encoder possible. You can write your own XML as a text file encoded for UTF-8. But if you write it using an XML encoder, then it will include the encoding in the metadata and you can't get it wrong. (It also adds the endian preamble to the file.)
    OK, you're reading and writing files correctly, but what about inside your code? This is where it's easy – Unicode. That's what the encoders in the Java and .NET runtimes are designed to do. You read in and get Unicode. You write Unicode and get an encoded file. That's why the char type is 16 bits and is a unique core type that is for characters. This you probably have right, because languages today don't give you much choice in the matter.
    Unicode character escapes are replaced prior to actual code compilation. Thus it is possible to create strings in Java with escaped Unicode characters which will fail to compile.
    Point 5 – (For developers on languages that have been around a while) – Always use Unicode internally. In C++ this is called wide chars (or something similar). Don't get clever to save a couple of bytes; memory is cheap and you have more important things to do.
    No. A developer should understand the problem domain represented by the requirements and the business, and create solutions appropriate to that. Thus there is absolutely no point for someone who is creating an inventory system for a standalone store to craft a solution that supports multiple languages.
    Another example: in high volume systems, moving/storing bytes is relevant. As such one must carefully consider each text element as to whether it is customer consumable or internally consumable. Saving bytes in such cases will reduce the total load on the system. In such systems incremental savings impact operating costs and, through speed, marketing advantage.

  • How can I know the encoding format for a file.

    I have files encoded in English, Spanish, Japanese etc. I want to know which file has which encoding format while reading.
    Can anyone suggest.
    Ashish

    Language is different from "encoding"...if you mean character encoding. Multiple languages can be represented by a single encoding. The fact that you mix language and encoding in your question confuses me about what you are asking.
    If you want to know what language a file uses, you can always use a meta-tag. Or you can do some kind of text analysis based on dictionary lookups to determine language...too complex for my tastes. I think a simple language tag in the file makes more sense.
    As for character encoding, you either standardize on a single encoding for all files or again use a meta-tag. If you standardize on a single encoding, you should probably consider one of the Unicode encodings instead of any other character set encoding.
    Regards,
    John O'Conner
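    For completeness, one mechanical hint worth adding (a general technique, not from this thread): Unicode files often begin with a byte-order mark that identifies their encoding, so a sniffing routine like the sketch below can give a first guess. The absence of a BOM proves nothing, so treat the result as a hint only.
        static String sniffBom(byte[] b) {
            if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF)
                return "UTF-8";
            if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF)
                return "UTF-16BE";
            if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE)
                return "UTF-16LE";
            return null; // no BOM: the encoding cannot be determined this way
        }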

  • Wrong character encoding from flash to mysql

    Hi, I'm experiencing problems with character encoding not functioning correctly when sending from Flash to MySQL. What I am doing is a contact form in Flash which then sends the values to a PHP file, which takes the values and inserts them into a table. As I'm using Icelandic characters I need the character encoding to be either latin1 or utf8 in MySQL, or at least I think so. But it seems that Flash or the PHP document isn't sending in the same format as I have selected in MySQL, because all special Icelandic characters come scrambled in the MySQL table. Firefox tells me though that the HTML document containing the Flash movie is using utf-8.

    I don't know anything about Icelandic characters, but Flash generally really likes UTF-8, so it should be sending that if that is what it is starting with.
    You aren't using any kind of useCodePage? That will mess it up.
    Are you sure that the input method is Icelandic?
    In the testing environment can you list variables (from the debug menu) and see if they look proper? If they do, then Flash is reading them correctly and the problem must be coming in further downstream.

  • Character Encoding in Crystal Reports

    Hi,
    We are using HTML character encoding to prevent SQL injection, i.e. when a special character like apostrophe (') is entered, it is saved as &#39;. I wanted to know if we could manipulate the data in between the two operations of fetching the data from the DB and displaying the data using Crystal?
    We could use formulae/functions in the design/query, but I am looking for a more generic solution using Crystal API. I have attached a screenshot of how Mc'Donald is displayed in the Report.
    Thanks for the help!
    Noopur

    All you need to do is format the field in CR to use HTML Text Interpretation. However note that CR does not support all HTML tags. For details see the following KBAs:
    1217084 - What are the supported HTML tags and attributes for the HTML Text Interpretation in Crystal Reports?
    1272676 - Not all the standard HTML tags are interpreted in Crystal Reports when using HTML Text Interpretation
    - Ludek
    Senior Support Engineer AGS Product Support, Global Support Center Canada
    Follow us on Twitter

  • How does PI handle encoding formats?

    Hello Experts,
    Please help me understand how PI 7.1 treats different encoding formats, e.g. ASCII etc., and how it handles proprietary characters coming in messages?
    - Rajan

    Hi,
    in PI, in the Integration Directory, your communication channel has a property "content encoding" where you can specify the character encoding of the message being sent or received by PI.
    Moreover, for some adapters there are module properties in which you can set the character encoding of your message.
    Regards,
    Rajeev Gupta

  • Change character encoding?

    Edit > Preferences > New Document > Character
    encoding is set to UTF-8
    However when I edit documents with non-standard extensions
    (I'm working on PHP files with a .ctp extension) the document still
    seems to save in iso-8859-1 format. This problem doesn't seem to
    occur on files with .php/html extensions.
    Does anyone know of a solution to this problem?
    Thanks,
    Emil

    I'm not sure where you are getting %xx encoded UTF-8... Is it because you have it in a GET method form and that's what you are seeing in the browser's location bar?
    Let's assume you have a form on a page, and the page's charset is set to UTF-8, and you want to generate a URL encoded string (%xx format, although URLEncoder will not encode ASCII chars that way...).
    In the page processing the form, you need to do this:
    request.setCharacterEncoding("UTF-8"); // makes bytes read as UTF-8 strings (assumes that the form page was properly set to the UTF-8 charset)
    String fieldValue = request.getParameter("fieldName"); // get value
    // the value is now a Unicode String in Java, generated from reading the bytes submitted from the form as UTF-8 encoded text...
    String utf8EncString = URLEncoder.encode(fieldValue, "UTF-8");
    // now utf8EncString is a URL encoded (%xx) string of UTF-8 values
    String euckrEncString = URLEncoder.encode(fieldValue, "EUC-KR");
    // now euckrEncString is a URL encoded (%xx) string of EUC-KR values
    What is probably screwing things up for you mostly is this:
    euckrValue = new String(utf8Value.getBytes(), "EUC-KR");
    What this does is take the bytes of the string utf8Value (which is not really UTF-8... see below) in the local encoding (possibly Cp1252 (Windows) or ISO-8859-1 (Linux), or EUC-KR if it's Korean Windows), and then read them as if they were EUC-KR... which they aren't.
    The key here is that Strings in Java are not of any encoding. They are pure Unicode values. Encodings only matter when converting to or from bytes. The strings stored in a file or sent over the net have to convert to bytes since that's what is stored/sent, just bytes. The encoding defines how the characters can be encoded into 1 or more bytes, and thus reconstructed.
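    To make that concrete, a minimal sketch of the correct round trip (field name as in the snippet above):
        String value = request.getParameter("fieldName");   // already a Unicode String (after setCharacterEncoding)
        byte[] utf8Bytes = value.getBytes("UTF-8");         // encode explicitly, never via the platform default
        String roundTrip = new String(utf8Bytes, "UTF-8");  // decode with the same charset to get the value back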

  • Setting character encoding programmatically?

    Hi,
    I am using the Sun J2ME Wireless Toolkit 2.1, and I have a problem with character encoding. I am receiving text from a .NET web service, and after some processing in the client, I send the string back.
    The problem is, the string I am sending back includes Turkish characters. These are sent as question marks instead of characters.
    I have failed to find a method that changes the character encoding used while making a web service call.
    Actually, I could not see any way to change the encoding overall. For the emulator, a property file can be used, but what about the devices I'll be deploying the app to? It'd be really great if someone could point me in the right direction.
    Best Regards

    Hi,
    My situation is as follows. I have .NET web services on the server side, and I am using mobile devices as clients. When I get a string from method A in the web service, I can display it on the device screen without a problem. After that, if I send the same string that I've received from method A as a parameter to method B, the .NET code receives garbage instead of Turkish chars.
    At the moment I am encoding Turkish chars on the client side, and decoding them in the .NET web server processing code.
    I'd like to try setting the encoding to UTF-8, but as I have written, I have not seen any way of doing this. Changing the properties file is possible for the emulator, but how can I do it for the target devices? I have not seen an API call for this purpose in the MIDP or CLDC docs. Thanks for your answer
    Regards
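    One hedged workaround sketch for this situation (not from the thread): instead of relying on the device's default encoding, move the text across the wire as explicit UTF-8 bytes and decode them with the same charset on the other side. Here turkishText is a placeholder for the string being sent; the .NET side would likewise need to read the bytes as UTF-8.
        byte[] payload = turkishText.getBytes("UTF-8");  // may throw UnsupportedEncodingException on devices without UTF-8
        String decoded = new String(payload, "UTF-8");   // the receiving side must decode with the same charset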

  • Why, after all these years, can't Thunderbird auto-detect character encoding

    Judging by all the existing messages and complaints about this, not to mention erroneous posts that say the problem is solved when it isn't, I have to conclude Mozilla either doesn't believe this is a problem or doesn't care to fix it. The bottom line is that there is no way to tell Thunderbird to automatically display emails in the character encoding they were written in. I could understand cases where the headers are not properly filled in, but I see tons of emails in which the encoding is plainly there in the headers within the message source. You can force it, but if you do so via the menu View->Character Encoding->UTF-8 (for example) it won't "stick" if you view another message. But who would want it to "stick" permanently anyway? What the average user really wants is to be able to toggle View->Character Encoding->Auto Detect from its default "off" to simply "on", and not have to bother with it anymore.
    This is a problem that seems to have gone on forever, and it NEVER happens with other email clients. If there is some backdoor way to actually make autodetect work, I'd appreciate knowing about it. But more important, I think ALL users would appreciate it if it were not some secret "backdoor" setting, but a simple global menu choice for all accounts. Can Mozilla please fix this problem once and for all?

    You said...
    ''Thunderbird is supposed to be using the encoding in the mail.''
    I figured is "should", i'm just reporting that it doesn't
    You said...
    ''Setting auto detect to on disables that.''
    Please explain. I've looked at every setting I can find and there is no way to set auto detect to "ON". I DID try setting it to "universal" in an attempt to solve the problem, but I have since restored it to "off", because the universal setting doesn't help.
    You said...
    ''"Based on your earlier response I assume you need to press the F10 key to see the tools menu you were referred to." ''
    No... I never said that anywhere. I DID refer to Menu->View->Character Encoding, and I did refer to right clicking on individual folders to get to the properties dialog and the general information tab. But F10 doesn't do anything.
    You said...
    ''I have examined dozens of mails in my inbox and each honours the character encoding set in the HTML''
    Well, mine NEVER did. A short example from an email I got today is pretty much representative of all mail I get from GMAIL...
    --089e013a0572a067a404fc73ceda
    Content-Type: text/plain; charset=UTF-8
    Ok, very good. Thank you. Phoenix sent you a friend request on Facebook by
    the way. Talk to you soon.
    --089e013a0572a067a404fc73ceda
    Content-Type: text/html; charset=UTF-8
    Content-Transfer-Encoding: quoted-printable
    <p dir=3D"ltr">Ok, very good. Thank you. Phoenix sent you a friend request=
    =C2=A0 on Facebook by the way.=C2=A0 Talk to you soon.</p>
    --089e013a0572a067a404fc73ceda--
    See those instances of "=C2=A0"? Each one displays as a strange character, a capital A with a curved line over it. If I manually set my default encoding to UTF-8, the weird characters go away. If I leave it as Western, there is nothing I can do to tell Thunderbird to "auto detect".
    Anyway, I suppose at this point that no one responsible for the product coding is seriously looking at my issue, which is why it's never been solved. If anyone does intend to help track it down and solve it, I'll be happy to provide all the examples and screen shots they ask for. Otherwise.
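    As a side note on the symptom itself (a general Unicode fact, not from the thread): "=C2=A0" is the quoted-printable form of the bytes C2 A0, which is the UTF-8 encoding of U+00A0, the no-break space. Read those same bytes as Latin-1/Windows-1252 and you get "Â" followed by a space, which is exactly the capital-A-with-a-curved-line described above:
        byte[] nbsp = {(byte) 0xC2, (byte) 0xA0};
        System.out.println(new String(nbsp, "UTF-8"));      // one invisible no-break space
        System.out.println(new String(nbsp, "ISO-8859-1")); // "Â " – the stray character seen in the mail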

  • What character encoding standard does Tuxedo use

    Hi,
    I am trying to resolve a problem with communication between Tuxedo 6.4 and Vitria.
    It seems that there is a problem with the translation of special characters. Does
    anyone know what encoding standard that Tuxedo uses?
    Thanks.

    Thanks Scott, actually I was asked the following question by Vitria Technical Support,
    can you help?
    "XDR (External Data Representation) is a protocol used by BEA Tuxedo's
    communication engine. XDR handles data format transformations when passing
    messages across dissimilar processor architectures.
    This is not the equivalent of Character Encoding. I specifically need the
    Character Encoding used. I am not sure where your admin needs to check for
    this - it might even be set at the OS level. I suspect that it will be
    something like ISO-8859-1 or some derivative."
    Thanks.
    Scott Orshan <[email protected]> wrote:
    Within a machine, TUXEDO just sends the bytes that you give it. When it goes between machines, it uses XDR to encode the data values for transmission. There is no character set translation going on, unless you are going to an EBCDIC machine. (If you are using data encryption [tpseal] in TUXEDO 7.1 or 8.0 your data may be encoded even if it stays on the same machine type.)
         Scott Orshan
         BEA Systems
    Richard Astill wrote:
    Hi,
    I am trying to resolve a problem with communication between Tuxedo 6.4 and Vitria.
    It seems that there is a problem with the translation of special characters. Does anyone know what encoding standard that Tuxedo uses?
    Thanks.

  • New character encoding

    I've invented a character encoding.
    Advantages of this: ANSI ASCII compatible, bitwise operations based, self-synchronising, abundant.
    Yields of this encoding, against those of UTF-8, are:
    Number of Bits | This Encoding              | UTF-8
                   | Codes    | Cumulative      | Codes    | Cumulative
    8              | 128      | 128             | 128      | 128
    10             | 192      | 320             | 0        | 128
    12             | 512      | 832             | 0        | 128
    14             | 1280     | 2112            | 0        | 128
    16             | 3072     | 5184            | 2048     | 2176
    18             | 7168     | 12352           | 0        | 2176
    20             | 16384    | 28736           | 0        | 2176
    22             | 36864    | 65600           | 0        | 2176
    24             | 81920    | 147520          | 65536    | 67712
    26             | 180224   | 327744          | 0        | 67712
    28             | 393216   | 720960          | 0        | 67712
    30             | 851968   | 1572928         | 0        | 67712
    32             | 1835008  | 3407936         | 2097152  | 2164864
    This is a new Character encoding scheme (CES) that maps Unicode code points to bit sequences.
    Could you please suggest improvements?
    When reading a line in the table, the first value is the number of bits; the next pair is for this encoding and the other pair is for UTF-8, with the first of each pair being the number of codes and the second their cumulative total.
    Sorry! I wish to provide more details on this but I'm restricted for some time. I hope that this does not stop you from assisting me.
    Regards
    Anbu
    This encoding maintains almost all the properties of UTF-8 in a more compact format. ANSI ASCII compatibility, bitwise operations based, self-synchronising and abundance are some of the properties of this encoding. Further, this encoding encodes all characters in far fewer bits than UTF-8, as shown in the table. As I mentioned earlier I will provide more details and proofs soon. This post is a request for suggestions. Please suggest the most suitable place for this post.

    Anbu, which Adobe software or service is your inquiry in relation to?

  • Adding new character encoding to PBP

    Is there any way to add a new character encoding in PBP 1.1?
    I want it to support Japanese encoding.

    1) Derive MyDateFormat from SimpleDateFormat, only allowing the default constructor.
    2) Override the public void applyPattern(String pattern) method so that it detects the 'Q' and replaces it with some easily identifiable pattern involving the month (say "MM") and then calls the superclass applyPattern method.
    3) Override the public StringBuffer format(Date date, StringBuffer toAppendTo, FieldPosition fieldPosition) method such that it first calls the superclass method to get the formatted output and then corrects this output by replacing (using regular expressions) the "01", "02" etc. with the appropriate quarter.
    You might do better not to actually derive a new class from SimpleDateFormat but just create a class which uses SimpleDateFormat.
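    A rough sketch of that recipe (the '~' marker and class shape are illustrative, not from the post; Pattern and Matcher come from java.util.regex, SimpleDateFormat and FieldPosition from java.text):
        public class MyDateFormat extends SimpleDateFormat {
            public void applyPattern(String pattern) {
                // step 2: swap the custom 'Q' for an easily identifiable month pattern
                super.applyPattern(pattern.replaceAll("Q", "'~'MM'~'"));
            }
            public StringBuffer format(Date date, StringBuffer toAppendTo, FieldPosition pos) {
                // step 3: format normally, then rewrite the marked month as a quarter
                String s = super.format(date, new StringBuffer(), pos).toString();
                Matcher m = Pattern.compile("~(\\d\\d)~").matcher(s);
                StringBuffer out = new StringBuffer();
                while (m.find()) {
                    int q = (Integer.parseInt(m.group(1)) - 1) / 3 + 1; // months 01-12 -> quarters 1-4
                    m.appendReplacement(out, "Q" + q);
                }
                m.appendTail(out);
                return toAppendTo.append(out);
            }
        }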

  • UTL_FILE: What is the encoding format/type for UTL_FILE?

    Hello,
    If I create a file using UTL_FILE, then what will be the encoding format/type (like ASCII, UTF-8, Unicode)?
    Please reply soon.
    Thanks in advance

    from documentation:
    UTL_FILE expects that files opened by UTL_FILE.FOPEN in text mode are encoded in the database character set. It expects that files opened by UTL_FILE.FOPEN_NCHAR in text mode are encoded in the UTF8 character set. If an opened file is not encoded in the expected character set, the result of an attempt to read the file is indeterminate. When data encoded in one character set is read and Globalization Support is told (such as by means of NLS_LANG) that it is encoded in another character set, the result is indeterminate. If NLS_LANG is set, it should therefore be the same as the database character set. For more information:
    http://docs.oracle.com/cd/E11882_01/appdev.112/e10577/u_file.htm

  • XML Character Encoding Using UTL_DBWS

    Hi,
    I have a database with WINDOWS-1252 character encoding. I'm using UTL_DBWS to call a web service method which echoes a given string. For this purpose, I do the following:
    DECLARE
        v_wsdl CONSTANT VARCHAR2(500) := 'http://myhost/myservice?wsdl';
        v_namespace CONSTANT VARCHAR2(500) := 'my.namespace';
        v_service_name CONSTANT UTL_DBWS.QNAME := UTL_DBWS.to_qname(v_namespace, 'MyService');
        v_service_port CONSTANT UTL_DBWS.QNAME := UTL_DBWS.to_qname(v_namespace, 'MySoapServicePort');
        v_ping CONSTANT UTL_DBWS.QNAME := UTL_DBWS.to_qname(v_namespace, 'ping');
        v_wsdl_uri CONSTANT URITYPE := URIFACTORY.getURI(v_wsdl);
        v_str_request CONSTANT VARCHAR2(4000) :=
    '<?xml version="1.0" encoding="UTF-8" ?>
    <ping>
        <pingRequest>
            <echoData>Dev Team üöäß</echoData>
        </pingRequest>
    </ping>';
        v_service UTL_DBWS.SERVICE;
        v_call UTL_DBWS.CALL;
        v_request XMLTYPE := XMLTYPE (v_str_request);
        v_response SYS.XMLTYPE;
    BEGIN
        DBMS_JAVA.set_output(20000);
        UTL_DBWS.set_logger_level('FINE');
        v_service := UTL_DBWS.create_service(v_wsdl_uri, v_service_name);
        v_call := UTL_DBWS.create_call(v_service, v_service_port, v_ping);
        UTL_DBWS.set_property(v_call, 'oracle.webservices.charsetEncoding', 'UTF-8');
        v_response := UTL_DBWS.invoke(v_call, v_request);
        DBMS_OUTPUT.put_line(v_response.getStringVal());
        UTL_DBWS.release_call(v_call);
        UTL_DBWS.release_all_services;
    END;
    /
    Here is the SERVER OUTPUT:
    ServiceFacotory: oracle.j2ee.ws.client.ServiceFactoryImpl@a9deba8d
    WSDL: http://myhost/myservice?wsdl
    Service: oracle.j2ee.ws.client.dii.ConfiguredService@c881d39e
    *** Created service: -2121202561 - oracle.jpub.runtime.dbws.DbwsProxy$ServiceProxy@afb58220 ***
    ServiceProxy.get(-2121202561) = oracle.jpub.runtime.dbws.DbwsProxy$ServiceProxy@afb58220
    Collection Call info: port={my.namespace}MySoapServicePort, operation={my.namespace}ping, returnType={my.namespace}PingResponse, params count=1
    setProperty(oracle.webservices.charsetEncoding, UTF-8)
    dbwsproxy.add.map: ns, my.namespace
    Attribute 0: my.namespace: xmlns:ns, my.namespace
    dbwsproxy.lookup.map: ns, my.namespace
    createElement(ns:ping,null,my.namespace)
    dbwsproxy.add.soap.element.namespace: ns, my.namespace
    Attribute 0: my.namespace: xmlns:ns, my.namespace
    dbwsproxy.element.node.child.3: 1, null
    createElement(echoData,null,null)
    dbwsproxy.text.node.child.0: 3, Dev Team üöäß
    request:
    <ns:ping xmlns:ns="my.namespace">
       <pingRequest>
          <echoData>Dev Team üöäß</echoData>
       </pingRequest>
    </ns:ping>
    Jul 8, 2008 6:58:49 PM oracle.j2ee.ws.client.StreamingSender _sendImpl
    FINE: StreamingSender.response:<?xml version = '1.0' encoding = 'UTF-8'?>
    <env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><env:Header/><env:Body><ns0:pingResponse xmlns:ns0="my.namespace"><pingResponse><responseTimeMillis>0</responseTimeMillis><resultCode>0</resultCode><echoData>Dev Team üöäß</echoData></pingResponse></ns0:pingResponse></env:Body></env:Envelope>
    response:
    <ns0:pingResponse xmlns:ns0="my.namespace">
       <pingResponse>
          <responseTimeMillis>0</responseTimeMillis>
          <resultCode>0</resultCode>
          <echoData>Dev Team üöäß</echoData>
       </pingResponse>
    </ns0:pingResponse>
    As you can see the character encoding is broken in the request and in the response, i.e. the SOAP encoder does not take into consideration the UTF-8 encoding.
    I tracked down the problem to the method oracle.jpub.runtime.dbws.DbwsProxy.dom2SOAP(org.w3c.dom.Node, java.util.Hashtable); and more specifically to the calls of oracle.j2ee.ws.saaj.soap.soap11.SOAPFactory11.
    My question is: is there a way to make the SOAP encoder use the correct character encoding?
    Thanks a lot in advance!
    Greetings,
    Dimitar

    I found a workaround of the problem:
    v_response := XMLType(v_response.getBlobVal(NLS_CHARSET_ID('CHAR_CS')), NLS_CHARSET_ID('AL32UTF8'));
    Ugly, but I'm tired of decompiling and debugging Java classes ;)
    Greetings,
    Dimitar
