Unicode and emacs

Hello all!
I don't know how to get around the following problem: I am using the tcsh shell, and the "Windows settings" of my Terminal are set to Unicode (UTF-8). But I am unable to enter German umlauts with my keyboard when using emacs (MBP, 17 inch, one year old, 10.4.11). Instead of inserting the special letters, for example "ö ä ü", emacs jumps to a different place in the file and inserts nothing!
I don't know if this is the right place to ask this, but I did not find a place for UNIX/Shell/BSD related questions in Apple Discussions ...
If by any chance a Unix freak is passing by, one more question: I am just starting to discover emacs, but I cannot find an answer in my two emacs books: is it possible to have syntax highlighting with different colours in emacs for XML/HTML, Perl, LaTeX ...?
Thank you all and best greetings from Munich
marek

The unix forum is here:
http://discussions.apple.com/forum.jspa?forumID=735

Similar Messages

  • Emacs and emacs -nw with solarized look different

    Hey,
    I installed solarized for emacs and in my .Xdefaults (from here: https://github.com/solarized/xresources/blob/master/solarized).
    Emacs looks fine, but everything in my urxvt looks slightly off, compared to emacs.
    Here is a screen shot between emacs -nw and emacs side by side:
    http://p67.img-up.net/screenshot8025.png
    The colors are kind of switched. Is there something wrong with the color scheme in my xresources?
    Or do I have to do something additional to get the solarized look right in emacs -nw and urxvt?
    Thanks!
    EDIT:
    Setting
    URxvt*termName: rxvt-unicode-256color
    changes the look of emacs -nw, but now it looks even more wrong.
    Other console programs look better now.
    Last edited by schlicht (2013-08-27 15:09:27)

    Ok I've installed everything and put the following into the .emacs file:
    (require 'org-install)
    ;; The following lines are always needed. Choose your own keys.
    (add-to-list 'auto-mode-alist '("\\.org\\'" . org-mode))
    (global-set-key "\C-cl" 'org-store-link)
    (global-set-key "\C-ca" 'org-agenda)
    (global-set-key "\C-cb" 'org-iswitchb)
    (global-font-lock-mode 1)
    ; for all buffers
    ;;;orgbabel;;;;
    (org-babel-do-load-languages
    'org-babel-load-languages
    '((R . t)))
    The last part is the bit about babel, the business end for R. Now, do I need to install anything else? Or any other lisp files for babel? At least the .emacs file is not producing any errors now when I start emacs up.
    [EDIT] Everything works now that I've installed an external, more up-to-date distribution of Org!
    Last edited by Ben9250 (2010-10-26 19:53:54)

  • How can I use the "gcc compiler and emacs editor" in Solaris 8 for Intel?

    I installed Solaris 8 for Intel to my desktop computer.
    After the installation, I found I couldn't use the companion software (the software CD included in the Multilingual Media Kit for Solaris 8), especially the emacs editor and the gcc compiler.
    I certainly installed the companion software on my computer, and I can confirm that it is really installed on my system. To check, I went to "System Administrator" and opened the "Solaris Product Registry", which showed that all the files in the companion software were successfully installed on my computer.
    However, when I open a terminal or console window and try to use the software, the system responds that "there is no emacs or gcc."
    Why can't I use these programs?
    Did I install the companion software wrong?
    I really don't understand why I can't use them.
    For your reference, I also looked in the directories "/usr/bin" and "/usr/ccs/bin".
    In /usr/bin I typed
    #as
    and the response was that there is no such command.
    In /usr/ccs/bin I typed
    #as
    and again the response was that there is no such command.
    If I installed the companion software CD correctly and the programs are installed somewhere on my computer, where can I find them and how can I use them?
    Lastly, whenever I try to connect to my ISP via telnet mode, I see this
    message.
    "Try to connect ...
    connected to ***.***.***.*** (IP address of my ISP)
    Closed by foreign hosts."
    Is there anyone who knows why this happens?
    Thanks for reading my question.

    gcc and emacs install in /opt/sfw. You should change your path statement to include /opt/sfw/bin and should obtain the FAQ from sunfreeware.com. The FAQ answers all your questions.
    I have installed gcc and gtk and am using both at this time.

  • What is alignment in unicode and what are restrictions

    What is alignment in Unicode, and what are the restrictions? Please don't explain Unicode in general; I only want to know about alignment in Unicode.
    Points will be awarded if useful.

    Hi,
    Check the following Threads,
    what is internal and external encoding in unicode
    Unicode
    UNICODE
    Regards,
    Padmam.

  • Unicode and Java

    Hi
    As we all know, Java treats character literals as Unicode characters. I have been studying Unicode and the way it treats characters, and I have a doubt which is not specific to Java code but specific to Unicode.
    Unicode states that each character is assigned a number which is unique, this number is called code point.
    The relationship between characters and code points is 1:1.
    Eg: the String *"hello"* (which is a sequence of character literals) can be represented by the following Code Points
    *\u0065 \u0048 \u006c \u006c \u006f*
    I also read that a certain character code point must be recognized by a specific encoding or else a question mark (?) is output in place of the character. Not all code points can be recognized by an encoding.
    So, the letter *ל* would not be recognized by all encodings and should be replaced by a question mark (?) right?
    The interesting thing is that this code point shows up as a different character, and not as a *"?"*, in other encodings. It should print the same character.
    This is the HTML code I used for tests (save it on your hard disk and open it in your browser, then select the following encodings: UTF16, ISO-8859-1)
    <html>
    <body>
    &#1502;&#1506;&#1497;&#1500; &#1488;&#1495;&#1491; &#1489;&#1490;&#1513;&#1501;, &#1500;&#1497;&#1500;&#1492; &#1513;&#1500; &#1488;&#1508;&#1512;&#1497;&#1500;
    &#1504;&#1508;&#1514;&#1495; &#1499;&#1502;&#1493; &#1506;&#1504;&#1503;, &#1493;&#1512;&#1506;&#1501; &#1488;&#1494; &#1502;&#1488;&#1497;&#1512;
    &#1502;&#1506;&#1497;&#1500; &#1488;&#1495;&#1491; &#1489;&#1490;&#1513;&#1501;, &#1500;&#1497;&#1500;&#1492; &#1495;&#1501; &#1493;&#1511;&#1512;
    &#1504;&#1508;&#1512;&#1505; &#1499;&#1502;&#1493; &#1495;&#1493;&#1508;&#1492;, &#1493;&#1502;&#1514;&#1495;&#1514; &#1488;&#1504;&#1497; &#1513;&#1512;
    &#1502;&#1506;&#1497;&#1500; &#1488;&#1495;&#1491; &#1489;&#1490;&#1513;&#1501;, &#1512;&#1496;&#1493;&#1489;, &#1500;&#1502;&#1497; &#1488;&#1499;&#1508;&#1514;
    &#1488;&#1504;&#1497; &#1500;&#1488; &#1506;&#1500; &#1492;&#1488;&#1512;&#1509;, &#1488;&#1497;&#1514;&#1498; &#1500;&#1502;&#1506;&#1500;&#1492; &#1513;&#1496;
    &#1512;&#1493;&#1495; &#1489;&#1508;&#1504;&#1497;&#1501;, &#1496;&#1497;&#1508;&#1493;&#1514; &#1492;&#1490;&#1513;&#1501; &#1492;&#1488;&#1495;&#1512;&#1493;&#1504;&#1493;&#1514;
    &#1504;&#1493;&#1490;&#1506;&#1493;&#1514; &#1489;&#1500;&#1495;&#1497;&#1497;&#1501;, &#1489;&#1508;&#1504;&#1497;&#1497;&#1498; &#1502;&#1513;&#1495;&#1511;&#1493;&#1514;
    &#1488;&#1502;&#1510;&#1506; &#1492;&#1512;&#1495;&#1493;&#1489;, &#1499;&#1493;&#1500;&#1501; &#1499;&#1489;&#1512; &#1497;&#1513;&#1504;&#1497;&#1501;
    &#1492;&#1497;&#1497;&#1514;&#1492; &#1506;&#1491;&#1492; &#1492;&#1512;&#1493;&#1495; &#1493;&#1506;&#1493;&#1491; &#1513;&#1504;&#1497; &#1499;&#1493;&#1499;&#1489;&#1497;&#1501;
    &#1488;&#1502;&#1510;&#1506; &#1492;&#1512;&#1495;&#1493;&#1489;, &#1499;&#1493;&#1500;&#1501; &#1499;&#1489;&#1512; &#1497;&#1513;&#1504;&#1497;&#1501;,
    &#1492;&#1497;&#1497;&#1514;&#1492; &#1506;&#1491;&#1492; &#1492;&#1512;&#1493;&#1495; &#1493;&#1506;&#1493;&#1491; &#1513;&#1504;&#1497; &#1499;&#1493;&#1499;&#1489;&#1497;&#1501;
    &#1512;&#1488;&#1497;&#1514;&#1497; &#1494;&#1493;&#1490; &#1506;&#1497;&#1504;&#1497;&#1497;&#1501;, &#1502;&#1505;&#1512;&#1489;&#1493;&#1514; &#1500;&#1492;&#1497;&#1508;&#1514;&#1495;
    &#1510;&#1493;&#1500;&#1500;&#1514; &#1488;&#1500; &#1506;&#1510;&#1502;&#1498; &#1506;&#1502;&#1493;&#1511; &#1489;&#1497;&#1501; &#1513;&#1500;&#1498;,
    &#1502;&#1491;&#1497; &#1508;&#1506;&#1501; &#1488;&#1514; &#1506;&#1493;&#1500;&#1492;, &#1500;&#1493;&#1511;&#1495;&#1514; &#1511;&#1510;&#1514; &#1488;&#1493;&#1497;&#1512;
    &#1500;&#1488; &#1512;&#1493;&#1510;&#1492; &#1500;&#1492;&#1497;&#1505;&#1495;&#1507;, &#1502;&#1499;&#1497;&#1512;&#1492; &#1488;&#1514; &#1492;&#1502;&#1495;&#1497;&#1512;
    &#1488;&#1502;&#1510;&#1506; &#1492;&#1512;&#1495;&#1493;&#1489;, &#1499;&#1493;&#1500;&#1501; &#1499;&#1489;&#1512; &#1497;&#1513;&#1504;&#1497;&#1501;...
    </body>
    </html>
    I would appreciate if you correct me in case I am wrong!
    Edited by: charllescuba1008 on Mar 31, 2009 2:08 PM

    charllescuba1008 wrote:
    > Unicode states that each character is assigned a number which is unique, this number is called code point.
    Right.
    > The relationship between characters and code points is 1:1.
    Uhm .... let's assume "yes" for the moment. (Note that the relationship between the Java type char and code point is not 1:1 and there are other exceptions ...)
    > Eg: the String *"hello"* (which is sequence of character literals) can be represent by the following Code Points
    > *\u0065 \u0048 \u006c \u006c \u006f*
    Those are the Java String unicode escapes. If you want to talk about Unicode code points, then the correct notation for "Hello" would be
    U+0048 U+0065 U+006C U+006C U+006F
    Note that you swapped the H and e.
    > I also read that a certain character code point must be recognized by a specific encoding or else a question mark (?) is output in place of the character.
    This one is Java specific. If Java tries to translate some unicode character to bytes using some encoding that doesn't support that character then it will output the byte(s) for "?" instead.
    > Not all code points can be recognized by an encoding.
    Some encodings (such as UTF-8) can encode all code points, others (such as ISO-8859-*, EBCDIC or UCS-2) cannot.
    > So, the letter *ל* would not be recognized by all encodings and should be replaced by a question mark (?) right?
    Only in a very specific case in Java. This is not a general Unicode-level rule.
    (disclaimer: the HTML code presented was using decimal XML entities to represent the unicode characters).
    What you are seeing is possibly the replacement character that your text rendering system uses to represent characters that it knows, but can't display (possibly because the current font has no character for them).
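The "?" substitution described in this reply can be reproduced in a few lines of Java. This is only a minimal sketch, not code from the thread: the class name `EncodingDemo` and its two helper methods are made up for illustration. It round-trips the letter ל through ISO-8859-1 (which cannot represent it) and through UTF-8, and it also shows one character that occupies two Java `char` values, which is the "not 1:1" case mentioned above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative and not from the thread.
final class EncodingDemo {

    // Encode with ISO-8859-1 and decode again; unmappable characters come back as '?'.
    static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Encode with UTF-8 and decode again; every code point survives.
    static String roundTripUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String lamed = "\u05DC";       // Hebrew letter lamed, U+05DC
        String clef = "\uD834\uDD1E";  // musical G clef, U+1D11E, stored as a surrogate pair

        System.out.println(roundTripLatin1(lamed).equals("?"));    // true
        System.out.println(roundTripUtf8(lamed).equals(lamed));    // true
        System.out.println(clef.length());                          // 2 (char units)
        System.out.println(clef.codePointCount(0, clef.length()));  // 1 (code point)
    }
}
```

The strings are written with `\u` escapes so that the result does not depend on the encoding the source file is compiled with.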

  • Unicode and non-unicode

    What is the difference between Unicode and non-Unicode?
    Briefly explain Unicode.
    Thanks in advance.

    A 16-bit character encoding scheme allowing characters from Western European, Eastern European, Cyrillic, Greek, Arabic, Hebrew, Chinese, Japanese, Korean, Thai, Urdu, Hindi and all other major world languages, living and dead, to be encoded in a single character set. The Unicode specification also includes standard compression schemes and a wide range of typesetting information required for worldwide locale support. Symbian OS fully implements Unicode. A 16-bit code to represent the characters used in most of the world's scripts. UTF-8 is an alternative encoding in which one or more 8-bit bytes represents each Unicode character. A 16-bit character set defined by ISO 10646. A code similar to ASCII, used for representing commonly used symbols in a digital form. Unlike ASCII, however, Unicode uses a 16-bit dataspace, and so can support a wide variety of non-Roman alphabets including Cyrillic, Han Chinese, Japanese, Arabic, Korean, Bengali, and so on. Supporting common non-Roman alphabets is of interest to community networks, which may want to promote multicultural aspects of their systems.
    ABAP Development under Unicode
    Prior to Unicode the length of a character was exactly one byte, allowing implicit typecasts or memory-layout oriented programming. With Unicode this situation has changed: One character is no longer one byte, so that additional specifications have to be added to define the unit of measure for implicit or explicit references to (the length of) characters.
    Character-like data in ABAP are always represented with the UTF-16 - standard (also used in Java or other development tools like Microsoft's Visual Basic); but this format is not related to the encoding of the underlying database.
    A Unicode-enabled ABAP program (UP) is a program in which all Unicode checks are effective. Such a program returns the same results in a non-Unicode system (NUS) as in a Unicode system (US). In order to perform the relevant syntax checks, you must activate the Unicode flag in the screens of the program and class attributes.
    In a US, you can only execute programs for which the Unicode flag is set. In future, the Unicode flag must be set for all SAP programs to enable them to run on a US. If the Unicode flag is set for a program, the syntax is checked and the program executed according to the rules described in this document, regardless of whether the system is a US or a NUS. From now on, the Unicode flag must be set for all new programs and classes that are created.
    If the Unicode flag is not set, a program can only be executed in an NUS. The syntactical and semantic changes described below do not apply to such programs. However, you can use all language extensions that have been introduced in the process of the conversion to Unicode.
    As a result of the modifications and restrictions associated with the Unicode flag, programs are executed in both Unicode and non-Unicode systems with the same semantics to a large degree. In rare cases, however, differences may occur. Programs that are designed to run on both systems therefore need to be tested on both platforms.
    Refer to the below related threads
    Re: Why the select doesn't run?
    what is unicode
    unicode
    unicode
    Regards,
    Santosh
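The remark above that one character is no longer one byte can be checked outside ABAP as well. The following is a small Java sketch (Java because the reply notes it also uses UTF-16); the class `ByteLengthDemo` and its methods are hypothetical names chosen for this illustration. It counts the bytes needed for a few sample characters in UTF-8 and in UTF-16, including one code point above U+FFFF that does not fit in a single 16-bit unit.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not taken from any SAP or ABAP documentation.
final class ByteLengthDemo {

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies as UTF-16 (big endian, no byte order mark).
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // 'A', o-umlaut, Hebrew lamed, and the musical G clef (U+1D11E).
        String[] samples = {"A", "\u00F6", "\u05DC", "\uD834\uDD1E"};
        for (String s : samples) {
            System.out.printf("U+%04X utf8=%d utf16=%d%n",
                    s.codePointAt(0), utf8Length(s), utf16Length(s));
        }
    }
}
```

Code that assumes a fixed number of bytes per character will therefore miscount as soon as text leaves plain ASCII, which is the kind of assumption the Unicode checks described above are meant to catch.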

  • Cannot convert between unicode and non-unicode string datatypes

      My source is having 3 fields :
    ItemCode nvarchar(50)
    DivisionCode nvarchar(50)
    Salesplan (float)
    My destination is : 
    ItemCode nvarchar(50)
    DivisionCode nvarchar(50)
    Salesplan (float)
    But still I am getting this error : 
    Column ItemCode cannot convert between unicode and non-unicode string datatypes.
    As I am new to SSIS , please show me step by step.
    Thanks In Advance.

    Hi Subu,
    there is some information missing: what is your source? Are there any transformations in between?
    If it is a SQL Server source and destination and the datatypes are as you have mentioned, I don't think you should be getting such errors ... to be sure, check the advanced properties of your source and check the metadata of your source columns.
    Just try a simple OLE DB source such as
    SELECT TOP 1 ItemCode = cast('111' as nvarchar(50)),DivisionCode = cast('222' AS nvarchar(50)), Salesplan = cast(3.3 As float) FROM sys.sysobjects
    and the destination as you mentioned ... it should work ...
    Somewhere in your package the source column metadata is not right, and you need to convert it or fix the source.
    Hope that helps
    -- Kunal

  • Unicode and Non-Unicode Instances in one Transport Landscape

    We have a 4.7 landscape that includes a shared global development system supporting two regional landscapes.  The shared global development system is used for all ABAP/Workbench activity and for global customization used by both regional production systems.  The two regional landscapes include primarily three instances - Regional Configuration, Quality Assurance, and Production.  The transport landscape includes all systems with transport routes for global and regional.
    A conversion to unicode is also being planned for the global development and one regional landscape.  It is possible that we will not convert the other regional landscape due to pending discussions on consolidation.  This means one of the regional landscapes will be receiving global transports from a unicode-based system.  
    All information I've located implies no actual technical constraints.  Make sure you have the right R3trans versions, don't use non-Latin_1 languages, etc.  Basic caveats for a heterogenous environment ....
    Is anyone currently supporting a complete, productive landscape that includes unicode and non-unicode systems?   If so, any issues or problems encountered with transports across the systems?  (insignificant or significant)
    Information on actual experiences will be greatly appreciated ....
    Many thanks in advance.

    Hi Laura,
    Although I do not have live, practical experience, this is what I can share.
    I have been working on a non-Unicode to Unicode conversion project. While we were in the discussion phase, one possible scenario was that part of the landscape would remain non-Unicode. Based on the research I did by reading and by talking directly with some excellent SAP consultants, I learned that there are no issues in transporting ABAP programs from a Unicode system to a non-Unicode system. In a Unicode system the ABAP code has already been checked and corrected against the stricter syntax checks, and it is downward compatible with ABAP code on lower ABAP versions and non-Unicode systems. Hence I believe there should not be any issues; however, as I mentioned, this is not from practical experience.
    Thanks.
    Chetan

  • Need Clarification On Unicode and Upgrade-ECC6.0

    Dear All,
    I need some clarification unicode and upgrade . It would be great help if you give your time .
    We had 2 code pages - 1100,1401 in 4.6B system. We had languages - FR,EN,ES,PT and PL. The system has been upgraded to ECC6.0 non-unicode now.
    Now in I18N->System configuration (RSCPINST), only EN is listed. SPUMG asked for activation of I18N to proceed. When the I18N activation was done,  it has knocked out the code page 1401 from TCPDB table.
    Is this normal?
    But, the code pages 1401 is shown as consistent in SCP transaction.
    The system setting has changed to single code page. Will this affect the Unicode migration? How did the additional code page 1401, which was in 4.6B, get knocked out
    now? How did the languages ES, FR, PT, IT and PL, which were in 4.6B, get knocked out of RSCPINST?
    We are manually filling the vocabulary since SPUMG is not showing the scanning tabs. The language key in the vocabulary is not completely set. The reprocess logs are not completely green. Will this allow the Unicode migration now? Can we start the Unicode migration even with this status?
    Regards,
    Santosh
    Edited by: santosh suryavanshi on Nov 18, 2010 11:11 PM

    Hi Santosh,
    SAP ECC 6.0 is not supported with MDMP. This is the reason for the behaviour in RSCPINST.
    The standard way for an upgrade based on start release 4.6B with MDMP would be TU&UC (see SAP note 959698).
    Do you follow this procedure ?
    Best regards,
    Nils Buerckel
    SAP AG

  • ECC 6.0 - Non-Unicode and BI 7.0 - Unicode any known problems?

    We are determining whether or not to do a Unicode conversion during our upgrade from BW 3.5 to BI 7.0.  Our R/3 system is non-Unicode, and my company does not use languages other than English and is not expected to at any time in the near future.  That being said, we are trying to decide whether we should go ahead and do the conversion now, before SAP forces us to at some point in the future.  I was hoping to get some insight into whether there are known issues with running the two versions in conjunction with each other.
    Thanks in advance,
    John

    We have a customer running this scenario, it should be possible to do so. Just keep in mind to do extensive testing as there might be some notes which are not yet in standard ...
    Regards, Kai

  • Emacs AND emacs-nox

    I'd like to use on the same machine both emacs and emacs-nox. But they are mutually exclusive. A proximate cause is that they both give
    the same name, 'emacs', to the resulting executable. I presume there are no fundamental conflicts between the X and the no-X versions
    of Emacs, and I'd be perfectly happy to run emacs-nox under a different name. How can I get it my way?
    The reason I want emacs-nox is that I need a just-in-time, disposable, mini-emacs, to be evoked for a brief moment in, say, the 'root' permission terminal that I keep constantly open to run a root-authority commands or touch up a system file. I used to use zile for that, but it is not 100% compatible with emacs, lacks many of the most common commands, and doesn't handle well Ctrl, Alt, Bcksp, and friends.
    When looking for a small-footprint version of emacs for the Raspberry Pi, I realized that emacs-nox has a much smaller footprint not only than emacs, but also than emacs -nw, and in fact is not much bigger than zile itself. So, why use a crippled app if one can use the real thing at very
    little extra cost?
    What do you think? Thanks

    rugantino wrote:The reason I want emacs-nox is that I need a just-in-time, disposable, mini-emacs, to be evoked for a brief moment in, say, the 'root' permission terminal that I keep constantly open to run a root-authority commands or touch up a system file. I used to use zile for that, but it is not 100% compatible with emacs, lacks many of the most common commands, and doesn't handle well Ctrl, Alt, Bcksp, and friends.
    Use TRAMP to edit files via sudo within Emacs. No need to start up a new instance.
    Alternatively, use ABS to recompile it and add the --program-{prefix,suffix} option to change the name of the binary to emacs-nox or whatever.
    Last edited by jakobcreutzfeldt (2013-06-01 10:20:04)

  • What  type of  database  operations  effectd  with  Unicode  and  non  unic

    Hi Friends,
    I want to know what types of database operations are affected by Unicode and non-Unicode programming.
    Thanks,
    Ravi Kumar Mukkera

    Hi ,
    Check these links .
    http://help.sap.com/saphelp_nw04/helpdata/en/62/3f2cadb35311d5993800508b6b8b11/frameset.htm
    https://www.sdn.sap.com/irj/sdn/go/portal/prtoot/docs/library/uuid/ff99cb90-0201-0010-e389

  • Difference between unicode and non unicode

    Hi everybody, I want to know the difference between Unicode and non-Unicode, and for what purposes these utilities are used. Please explain briefly: what is the transaction code for it, how do I check the version, and how do I convert Unicode to non-Unicode?
    Thanks in advance,
    Vishnuprasad.G

    Hello Vishnu,
    before Release 6.10, SAP software only used codes where every character is displayed by one byte, therefore character sets like these are also called single-byte codepages. However, every one of these character sets is only suitable for a limited number of languages.
    Problems arise if you try to work with texts written in different incompatible character sets in one central system. If, for example, a system only has a West European character set, other characters cannot be correctly processed.
    As of 6.10, to resolve these issues, SAP has introduced Unicode. Each character is generally mapped using 2 bytes and this offers a maximum of 65 536 bit combinations.
    Thus, a Unicode-compatible ABAP program is one where all Unicode checks are in effect. Such programs return the same results in UC systems as in non-UC systems. To perform the relevant syntax checks, you must activate the "UC checks" flag in the screens of the program and class attributes.
    With transaction /nUCCHECK you can check a set of programs for syntax errors in a Unicode environment.
    Bye,
    Peter

  • Unicode and converter

    Hi there all, I'm Chrno, and I have a question... what I'm trying to do now seems impossible to me, lol... Hope you guys can help.
    OK, so this is the question...
    I want to convert a string like this (in Vietnamese) "tá lả" or something like "chúng tôi luôn chào đón bạn" to a string like this " *t&aacute; l&#7843;* "
    I made the first string with Unicode and don't know how to convert it into a well-formed string like the second one...
    Thanks for reading and hope you can help me out :)
    Edited by: ChrnoLove on Apr 24, 2009 9:41 AM
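One way to approach the conversion asked about here is to walk over the code points of the string and write every non-ASCII one as a decimal character reference. The Java sketch below is an assumption about what is wanted, not an answer from the thread: `NumericEntityEncoder` and `toNumericEntities` are made-up names, and the sketch emits only numeric references such as `&#225;`. Producing named references such as `&aacute;` would need an additional lookup table, which is left out.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class NumericEntityEncoder {

    // Replace every code point outside ASCII with a decimal character reference.
    static String toNumericEntities(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.append((char) cp);                    // plain ASCII is copied unchanged
            } else {
                out.append("&#").append(cp).append(';');  // e.g. U+1EA3 becomes &#7843;
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // "tá lả" written with escapes so the source file encoding does not matter.
        System.out.println(toNumericEntities("t\u00e1 l\u1ea3"));  // t&#225; l&#7843;
    }
}
```

Iterating over code points rather than `char` values keeps characters outside the Basic Multilingual Plane intact as a single reference.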

    yet all their ID tags were edited in iTunes when I had them on my PC
    I think the problem there is that on a PC you can have them a legacy Japanese encoding while the Mac only accepts Unicode, and the Mac and PC also use slightly different forms of Unicode. But you are right, I don't see why, if all were ok in Windows, only some would be ok on the Mac.
    There is a Japanese version of these forums where you might ask if you or a colleague knows Japanese well:
    http://discussions.info.apple.co.jp/

  • Unicode and Chinese

    This is driving me nuts.
    Created a page where there is a mix of English and Chinese,
    used unicode
    and worked fine.
    But then created another page exactly the same and now the
    unicode is
    not being converted..
    First link is fine
    http://www.destinationcdg.com/Bonaparte/BonaparteC.cfm
    But this link is all screwed up.
    http://www.destinationcdg.com/Bonaparte/areaC.cfm
    Any ideas please.
    DW8.02 CFMX7 and Apache2

    Hi guys
    I've just realised that the solution here isn't totally complete. If you are still interested in helping I would be really grateful.
    Quick re-cap:
    The problem was Java was mis-calculating the length of unicode strings.
    e.g. ...
    String nihao = "??";  //Should read 2 chinese characters, may display here as ??
    System.out.println(nihao.length());
    ... would print 6 or something, but not 2 as it should.
    I was recommended to use a parameter when invoking javac which fixed this problem.
    javac -encoding UTF-8 ClassName.java
    Now, this solved the problem so far.
    However!!!! What I assumed would work and didn't test until now is this:
    System.out.println(nihao);
    But it doesn't work.
    So in a nutshell. If I have a Class which contains unicode strings out of the usual latin set and encode that text file as unicode, use a -encoding UTF-8 parameter when compiling, Java still prints out ?? to the command line.
    Is it my shell or is it Java?
    I'm using the Bash shell.
    If I had a file called ??.txt (should be 2 chinese chars) and used ls then ?? (should be 2 chinese chars) would not display properly. I would get ??.txt.
    To get the file name to display properly I would need to use ls -v. This -v flag makes things work.
    I've tried it with the java command but java doesn't like it.
    This is really doing my head in. If anyone has any ideas please help.
    Thanks.
    Chinese characters don't seem to be uploading to this website so it makes this post difficult. Where you are supposed to see chinese I have said so. It might display as ??. There are places where I wanted to write ??.
    I can't award Duke Dollars to this post as I did it already. I have posted a fresh version of this problem in the Java Programming forum. I have allocated Duke Dollars to that post so best to reply there if you have any ideas :)
    Message was edited by:
    stanton_ian
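Both symptoms in this post can be explored with a short Java sketch. It is a guess at the cause, not a confirmed diagnosis: a length of 6 is what you get when the UTF-8 bytes of two Chinese characters are read as a single-byte encoding, and `System.out` encodes text with the platform default encoding unless a stream with an explicit encoding is used. The class `ConsoleEncodingDemo` and its methods are hypothetical names.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative and not from the thread.
final class ConsoleEncodingDemo {

    // Simulate UTF-8 source bytes being read as ISO-8859-1 (one char per byte).
    static String misreadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Wrap a stream so that everything printed to it is encoded as UTF-8.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Should not happen: every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String nihao = "\u4F60\u597D";  // the two characters of "ni hao"

        System.out.println(nihao.length());                   // 2
        System.out.println(misreadAsLatin1(nihao).length());  // 6

        // Whether this displays correctly still depends on the terminal expecting UTF-8.
        utf8Stream(System.out).println(nihao);
    }
}
```

If the explicit UTF-8 stream prints correctly while plain `System.out` prints ??, the default encoding Java picked up from the environment is the likely culprit rather than the shell itself.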
