Non US characters in login and email generation

I have a design problem that I would like to check if anyone else has found a good solution to.
Once you leave the safe shores of the United States, your users start having names that include all kinds of funny characters. In the good old days this problem was resolved by the fact that the HR system only handled 7-bit US-ASCII characters, but today you are likely to face an HR system that supports Unicode, or at least some character set that includes lots of non-US-ASCII characters. I just ran some stats on my current enterprise population, and about 5% of the users have names containing "strangeness".
These strange characters cause big problems if you aren't allowed to include non-US-ASCII characters in logins, email addresses and other generated fields. Exactly what counts as a "strange character" varies. RFC 5322 takes a fairly liberal view of special characters but does not allow non-US letters.
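A minimal sketch of the check this implies, assuming the generated local part should be an unquoted RFC 5322 dot-atom: characters from the `atext` set defined in section 3.2.3, with atoms separated by single dots. Quoted local parts and RFC 6531 internationalized addresses are deliberately out of scope, and the class name `LocalPartCheck` is made up for illustration.

```java
import java.util.regex.Pattern;

// Sketch: is a generated local part a valid unquoted RFC 5322 dot-atom?
// Quoted strings and RFC 6531 (non-ASCII) addresses are intentionally not handled.
final class LocalPartCheck {
    // atext from RFC 5322 section 3.2.3: ASCII letters, digits and these specials
    private static final String ATEXT = "[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]";
    // dot-atom-text: one or more atext, optionally followed by "." and more atext
    private static final Pattern DOT_ATOM =
            Pattern.compile(ATEXT + "+(?:\\." + ATEXT + "+)*");

    static boolean isDotAtom(String localPart) {
        return DOT_ATOM.matcher(localPart).matches();
    }
}
```

With this check `o'malley` passes, while a local part containing "ö" or "å" fails, which is exactly the case the rest of this post is about.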
The simplistic solution is to drop any character that isn't a US-ASCII letter. This works if the problem is names like "O'Malley", as the "'" really shouldn't be part of the user login and probably not part of an email address either (this can be debated). The solution breaks down when you get to Germany or Scandinavia, where your users called "Örjan Åhs" may not appreciate an email address of rjan.hs@your_company.com.
What you would like to do is convert "Örjan Åhs" to either "Orjan Ahs" or (possibly) "Oerjan Aohs", but I haven't been able to find any Java library that does that conversion for you.
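One JDK-only approach (not something proposed in this thread, and only a sketch) is Unicode decomposition: `java.text.Normalizer` in NFD form splits "Ö" into "O" plus a combining diaeresis, and the combining marks can then be stripped with the regex class `\p{M}`. Letters that have no decomposition, such as "ø", "æ" or "ß", survive that step and are simply dropped here. `AsciiFolder` is a hypothetical name.

```java
import java.text.Normalizer;

// Sketch: fold accented Latin letters to US-ASCII using only the JDK.
final class AsciiFolder {
    static String fold(String name) {
        // NFD splits a precomposed letter into base letter + combining mark(s)
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        // \p{M} matches combining marks such as the diaeresis and the ring above
        String stripped = decomposed.replaceAll("\\p{M}", "");
        // Letters with no decomposition (o-slash, ae ligature, sharp s, ...) are
        // still non-ASCII at this point; this sketch simply drops them
        return stripped.replaceAll("[^\\x00-\\x7F]", "");
    }
}
```

This turns "Örjan Åhs" into "Orjan Ahs" but "Jørgen" into "Jrgen", so an explicit mapping table like the one Martin describes in his follow-up is still needed for some letters. ICU4J's `Transliterator` with a `Latin-ASCII` transform is another option if an extra dependency is acceptable.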
Has anyone run into this problem before and solved it?
I wonder how certain characters in this post will be rendered on computers in different parts of the world :)
/Martin, who long ago converted his (Swedish) last name to be 7-bit ASCII compliant

Thanks Daniel
The code above drops any non-US-ASCII characters, which is fine in some situations but doesn't work for me, as it would result in (amongst other issues) unacceptable email addresses.
Example: The user "Jörgen Åhs" gets the email [email protected] (using the drop strategy); what is needed is [email protected]
The solution to this problem is to write a transform function. As we have about 80 non-US-ASCII characters in the character set we are using, this mapping can quite easily be externalized to a configuration file.
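A sketch of such an externalized transform function, assuming the mapping is stored in a `java.util.Properties` file with one `character=replacement` line per entry (both the file format and the class name `ConfigurableTransliterator` are assumptions). Loading through a `Reader` keeps non-ASCII keys intact; a character with no entry passes through if it is US-ASCII and is dropped otherwise.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch: a transform function driven by a mapping table kept in a config file.
// Assumed format: one "character=replacement" line per entry, e.g. a line mapping
// the Swedish o-umlaut to "oe", or "'=" to map the apostrophe to nothing.
final class ConfigurableTransliterator {
    private final Properties table = new Properties();

    ConfigurableTransliterator(Reader config) {
        try {
            table.load(config); // loading from a Reader keeps non-ASCII keys intact
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String apply(String input) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            String mapped = table.getProperty(ch);
            if (mapped != null) {
                out.append(mapped);   // configured replacement
            } else if (cp < 0x80) {
                out.append(ch);       // plain US-ASCII passes through unchanged
            }                         // unmapped non-ASCII characters are dropped
        });
        return out.toString();
    }
}
```

With entries mapping ö to "oe" and å to "aa" (plus the upper-case forms) and an empty replacement for the apostrophe, "Örjan Åhs" becomes "Oerjan Aahs" and "O'Malley" becomes "OMalley". Which replacement each letter gets stays a policy decision in the config file.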
Good point about the preferred name. I have not seen this specific problem in my current system, but it is very common in certain parts of the world; e.g. people with Chinese heritage in Southeast Asia often have a Chinese legal name and a Western name that they actually use in day-to-day interactions. If you base their email address on their name in HR, much screaming ensues. The same thing should actually happen in the US, as you are supposed to enter the name on your Social Security card into the HR system, but that seems largely to be ignored.

Similar Messages

  • Find non alpha characters in a string/email

    HI,
    I'm on 10g database.
    I want to find a list of email addresses that contain non-alpha characters.
    Example
    KEY1 EMAIL
    1       [email protected]
    2       [email protected]
    3       AXIAP#[email protected]
    How can I find the emails with "#" and "-" characters only?
    Thanks
    Edited by: user527060 on Aug 14, 2009 8:15 AM

    Hey Centinul, you have a DUP_VAL_ON_INDEX ;)
    It's Friday after all.
    SQL> with t as (
      select 1 key1, '[email protected]' email from dual union all
      select 2, '[email protected]' from dual union all
      select 3, 'AXIAP#[email protected]' from dual
      )
      select key1
      ,      email
      from   t
      where  length(regexp_replace(substr(email, 1, instr(email, '@') - 1), '[[:alnum:]]')) is not null;
          KEY1 EMAIL
             1 [email protected]
             3 AXIAP#[email protected]
    2 rows selected.
    http://www.oracle.com/technology/oramag/webcolumns/2003/techarticles/rischert_regexp_pt1.html
    http://www.oracle.com/technology/oramag/webcolumns/2003/techarticles/rischert_regexp_pt2.html
    However, regular expressions consume CPU, so why not simply:
    SQL> with t as (
      select 1 key1, '[email protected]' email from dual union all
      select 2, '[email protected]' from dual union all
      select 3, 'AXIAP#[email protected]' from dual
      )
      select key1
      ,      email
      from   t
      where  email like '%#%'
      or     email like '%-%';
          KEY1 EMAIL
             1 [email protected]
             3 AXIAP#[email protected]
    2 rows selected.
    That is, if you're only interested in finding # or - occurrences in an email address.
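The same local-part test can be sketched in Java, assuming the addresses are already loaded into a `List<String>`. Like the regexp version it only inspects the text before the first '@'; unlike it, `Character.isLetterOrDigit` also accepts non-ASCII letters, and an address without an '@' is checked as a whole. `LocalPartFilter` is a hypothetical name.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: keep the addresses whose local part (text before the first '@')
// contains anything other than letters or digits, like the regexp_replace query.
final class LocalPartFilter {
    static List<String> withNonAlnumLocalPart(List<String> emails) {
        return emails.stream()
                .filter(email -> {
                    int at = email.indexOf('@');
                    // no '@' at all: check the whole string (the SQL version yields NULL here)
                    String local = at >= 0 ? email.substring(0, at) : email;
                    return !local.chars().allMatch(Character::isLetterOrDigit);
                })
                .collect(Collectors.toList());
    }
}
```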

  • Loading Non-English Characters using VBA and BAPI

    Hi Experts,
    I am trying to load Non-English characters (Chinese, Korean, Japanese, etc.) into a SAP Table using BAPI and VBA. I have set the connection language and codepage values but when I run the tool, the non-English characters display as ????? or #####. Do you know how to fix this issue?
    Thanks!

    If your language requires Unicode, you need to change the SAP GUI options to Unicode: on the initial screen, open Customize Local Layout (Alt+F12) > Options > I18N > Encoding.

  • Unicode: non-Latin characters in identifiers and data

    I would like to use Unicode escapes in identifiers, say create
    a variable name that is Japanese. But I can't seem to get this
    to work.
    I have a product (Japanese Partner) that lets me key in latin
    characters then converts these to Japanese (kana or Kanji,
    depending on various options) in Unicode and passes them
    to the input line.
    But I can't get these to compile.
    Also, if I code:
    char Jletter = '\u2f80';
    System.out.println("Jletter = " + Jletter);
    The runtime output is:
    Jletter = ?
    I thought it was supposed to display as a Unicode escape.
    TIA for any help.

    Perhaps, but I'm going on:
    "Programs are written in Unicode (§3.1), but lexical translations are provided (§3.2) so that Unicode escapes (§3.3) can be used to include any Unicode character using only ASCII characters."
    Then ...
    "3.2 Lexical Translations
    A raw Unicode character stream is translated into a sequence of tokens, using the following three lexical translation steps, which are applied in turn:
    1. A translation of Unicode escapes (§3.3) in the raw stream of Unicode characters to the corresponding Unicode character. A Unicode escape of the form \uxxxx, where xxxx is a hexadecimal value, represents the UTF-16 code unit whose encoding is xxxx. This translation step allows any program to be expressed using only ASCII characters.
    2. A translation of the Unicode stream resulting from step 1 into a stream of input characters and line terminators (§3.4).
    3. A translation of the stream of input characters and line terminators resulting from step 2 into a sequence of input elements (§3.5) which, after white space (§3.6) and comments (§3.7) are discarded, comprise the tokens (§3.5) that are the terminal symbols of the syntactic grammar (§2.3)."
    I take this to mean you can use Unicode escapes for any Unicode
    character. But it doesn't seem to work, so maybe my understanding is
    deficient, or maybe the docs need to be clearer.
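A small sketch of both points, assuming a reasonably recent JDK. The escape in the field name is translated by the compiler before tokenizing (JLS §3.3), so `\u5909\u6570` (the Japanese word for "variable") is an ordinary identifier. `println` prints the character itself, not an escape, so the '?' seen at runtime most likely comes from the console's output encoding rather than from the `char`. Source files that contain the characters directly (rather than escapes) also need javac's `-encoding` option to match the file's encoding. `UnicodeEscapeDemo` is a made-up name.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class UnicodeEscapeDemo {
    // The compiler translates the escapes before tokenizing (JLS 3.3), so this
    // field name is simply the two CJK characters meaning "variable".
    static final int \u5909\u6570 = 42;

    // Describe a char by its code point instead of relying on the console font.
    static String describe(char c) {
        return String.format("U+%04X", (int) c);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        char jLetter = '\u2f80';
        // Always readable, whatever the console encoding is
        System.out.println("jLetter = " + describe(jLetter));
        // println writes the character itself; a '?' appears when the output
        // encoding cannot represent it. A UTF-8 stream helps only if the
        // terminal also decodes UTF-8.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println("jLetter = " + jLetter + ", field = " + \u5909\u6570);
    }
}
```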

  • Non English characters conversion issue in LSMW BAPI Inbound IDOCs

    Hi Experts,
    We have some fields in our customer master LSMW data load program which can
    contain non-English characters, and we are facing issues with the conversion
    of these characters in the LSMW BAPI method. The LSMW read and conversion
    steps show the non-English characters properly, without any issue. But
    while creating inbound IDocs, most of the non-English characters are
    replaced with '#', which causes issues when creating customer master data in
    the system. In our scenario, customers have non-English characters in the
    first name, last name and address details. Is there any specific setting
    we need to make on our side? Please suggest how to resolve this issue.
    Thanks
    Rajesh Yadla

    If your language requires Unicode, you need to change the SAP GUI options to Unicode: on the initial screen, open Customize Local Layout (Alt+F12) > Options > I18N > Encoding.

  • RegEx in TSQL - replace non-alphanumeric characters etc

    Hi guys, I have this function in VB that I used in Access to remove all non-alphanumeric characters, including spaces and anything in parentheses.
    Public Function charactersonly(inputString As String) As String
    Dim RE As Object
    Set RE = CreateObject("vbscript.regexp")
    RE.Pattern = "\([^)]+\)|[^\w]|_"
    RE.Global = True
    charactersonly = RE.Replace(inputString, "")
    Set RE = Nothing
    End Function
    Now I have moved to SQL Server and I'm writing scripts to do the same thing.
    How can I use RegEx in T-SQL?
    That function is the only thing I need.

    As an alternative (using an auxiliary numbers table with a column n):
    declare @string varchar(200)
    set @string = 'gg$%^^&is%^& s2342jjk23&&({}e c76l232e+_+a#n/ c][#o''y#e'
    select cast(cast((select substring(@string,n,1)
    from numbers
    where n <= len(@string)
    and substring(@string,n,1) like '[0-9 ]' for xml path('')) as xml)as varchar(max))
    Best Regards,Uri Dimant SQL Server MVP,
    http://sqlblog.com/blogs/uri_dimant/
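For readers porting the same function to Java instead, a sketch: `java.util.regex` accepts the VBScript pattern unchanged apart from doubling the backslashes. Note that Java's `\w` means `[a-zA-Z_0-9]` unless the `UNICODE_CHARACTER_CLASS` flag is set, so accented letters are removed as well. `CharactersOnly` is a hypothetical name.

```java
import java.util.regex.Pattern;

// Sketch: Java port of the VBScript charactersonly() function.
final class CharactersOnly {
    // Same alternatives as the VB pattern: a parenthesised group, any non-word
    // character, or an underscore. Java's \w is ASCII-only by default.
    private static final Pattern JUNK = Pattern.compile("\\([^)]+\\)|[^\\w]|_");

    static String charactersOnly(String input) {
        return JUNK.matcher(input).replaceAll("");
    }
}
```

For example, "abc (x) d_e!" becomes "abcde": the parenthesised group, both spaces, the underscore and the exclamation mark are removed.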

  • Non English characters in BIP email

    Hi, my report contains Japanese characters. When I view the output in HTML format, it is displayed properly. But when I click the Send button, enter the email parameters (To, Cc, Bcc, Subject, etc.) and send it, the Japanese characters are not displayed properly in the mail I receive. The same problem occurs for Spanish and Portuguese text, and in general for all non-English characters. I am using Oracle Business Intelligence Publisher Release 10.1.3.4. If someone has faced a similar issue, kindly help. Thanks in advance.

    Suggestions
    1) Try with NLS_LANG as
    SWEDISH_SWEDEN.WE8DEC
    2) Make a paramform and enter via paramform (unencoded)
    (This is just for testing purpose)
    3) Change machine locale to swedish and try
    4) Which Reports version is this?
    Please see
    BUG 2713695 - NLS CHARACTERS FOR PARAMETERS CHANGE TO QUESTION MARKS WHEN PASSED ON URL BAR
    Get in touch with Support to see if this is the issue and if "yes" get a one-off patch.
    [    All Docs for all versions    ]
    http://otn.oracle.com/documentation/reports.html
    [     Publishing reports to web  - 10G  ]
    http://download.oracle.com/docs/html/B10314_01/toc.htm (html)
    http://download.oracle.com/docs/pdf/B10314_01.pdf (pdf)
    [   Building reports  - 10G ]
    http://download.oracle.com/docs/pdf/B10602_01.pdf (pdf)
    http://download.oracle.com/docs/html/B10602_01/toc.htm (html)
    [   Forms Reports Integration whitepaper  9i ]
    http://otn.oracle.com/products/forms/pdf/frm9isrw9i.pdf

  • Cannot login with password containing non-ascii characters

    Hello,
    I have web application, form based login. UTF-8 is specified "everywhere".
    And it works, except for passwords.
    If a user registers with a password containing non-ASCII characters, it is correctly written to the database, but both programmatic login and normal form-based login fail.
    If the password is ASCII only, it works.
    The username can be ASCII or non-ASCII, it doesn't matter; both work.
    I'm using Sun Java Application Server 9.1,
    with a JDBC realm.
    I'm not hashing passwords; they are stored in clear text (for now).
    As a last resort I tried configuring the realm with Charset: UTF8, but it doesn't work either.
    The problem occurs only with non-ASCII characters in the password.
    Any help is very much appreciated.
    Thanks a lot

    hi,
    I know all that, but that's not the issue. My app uses PreparedStatements, everything is properly configured on all pages, and UTF-8 goes from the user to the DB and back without any problems.
    The only problem is with the password field. As I am using form-based login with a JDBC realm configured (again, working nicely when there are only ASCII characters), I have very little opportunity to do something wrong during the login phase.
    I'm not talking about special characters; I'm talking about non-ASCII characters, say Chinese, Arabic, the Russian alphabet, etc.
    When a user registers (my code), the fields are properly written to the DB. I have checked that, trust me.
    But the Sun app server realm seems to have some problems with the password field.
    (The realm uses a JDBC connection to MySQL, and the URL contains all the extra parameters to make sure of UTF-8. There is nothing more that can be configured...)
    If I use other alphabets in the login and ASCII in the password, it works. But as soon as I use another alphabet in the password as well, it doesn't work anymore.
    My only idea is to try MD5 on the client with JavaScript to create ASCII-only characters (I hope it works that way) and then set Digest to MD5 in the realm configuration. But it still seems very strange: shouldn't clear-text storage work as well? (Digest is currently set to 'none'.)
    Is it a bug in Sun App Server?
    Thanks
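One plausible cause worth ruling out (an assumption, not a diagnosis from the thread): the realm ultimately compares bytes, and a password containing non-ASCII characters encodes to different bytes in UTF-8 than in ISO-8859-1, while an ASCII-only password encodes identically in both, which matches the symptom described. The sketch shows that, plus a hex MD5 helper like the one the poster is considering. MD5 is one of the digests every Java platform must provide, but it is a weak choice for password storage today. `PasswordBytes` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class PasswordBytes {
    // True if the password encodes to the same bytes in UTF-8 and ISO-8859-1,
    // which is the case for ASCII-only passwords.
    static boolean sameBytes(String password) {
        byte[] utf8 = password.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = password.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(utf8, latin1);
    }

    // Lower-case hex MD5 of the UTF-8 bytes: the result is always ASCII, but it
    // only matches the realm's digest if both sides hash the same UTF-8 bytes.
    static String md5Hex(String password) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(password.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 must be available on every Java platform", e);
        }
    }
}
```

`md5Hex("abc")` returns the RFC 1321 test vector `900150983cd24fb0d6963f7d28e17f72`. Hashing only makes the stored value ASCII; the bytes fed into the digest still have to be the same encoding on the client and in the realm.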

  • Odd number of non-english characters get broken in windows-chrome and ff

    I developed a JNLP applet which prints out the user input.
    When I enter an odd number of non-English characters (e.g. Chinese or Korean), the Chrome and Firefox browsers print the last character as a question mark.
    input : 가
    output : 가��
    I checked in the Java console that the character is correct.
    It must be a bug in the communication between the applet and the Chrome browser.
    IE prints it out correctly.
    I can work around the issue by appending a space in the applet and removing it in JavaScript.
    Does anyone have any clue about this issue?
    The code is as follows.
    MainApplet.java
    public class MainApplet extends JApplet implements JSInterface { //, Runnable {
         // OutData field and other members not shown in the original post
         public int stringOut(String sData) {
              OutData = sData;
              return 0;
         }
    }
    JS file
    function TSToolkitRealWrapper() {
         var OutData;
         var OutDataNum;
    }
    var TSToolkit = new TSToolkitRealWrapper();
    var attributes = { id:'TSToolkitReal', code:'tradesign.pkitoolkit.applet.MainApplet', width:100, height:100 };
    var parameters = { jnlp_href: getContextPath() + '/download/pkitoolkit.jnlp',
                       separate_jvm:true, classloader_cache:false };
    TSToolkitRealWrapper.prototype.stringOut = function(str) {
         var nRet = TSToolkitReal.stringOut(str);
         this.OutData = TSToolkitReal.OutData;
         return nRet;
    };
    HTML
    <SCRIPT language=javascript>
    <!--
    function StringOut(form) {
         var data = form.data.value;
         var nRet = 0;
         var base64Data;
         nRet = TSToolkit.stringOut(data);
         if (nRet > 0)
              alert(nRet + " : " + TSToolkit.GetErrorMessage());
         else
              form.data1.value = TSToolkit.OutData;
    }
    -->
    </SCRIPT>
    Edited by: user13496918 on 2013-03-20, 7:29 PM (edited again at 7:39 PM, 9:17 PM and 9:18 PM)

    "I checked on java console that the character is correct."
    So it isn't a Java problem.
    "It must be bug in communication of applet to chrome browser."
    So tell the people who make the Chrome browser.
    "IE prints out correctly."
    That's a change. I've just spent nine days tracking down an IE applet problem and I'm not finished yet.
    Please omit the boldface next time. We can read. Boldface doesn't help; it makes it worse.

  • When I try to send an email I get a message - Non ASCII characters in the local part of the recipient address.

    I am trying to send emails to Italy. When I click Send, I get the message "Non-ASCII characters in the local part of the recipient address". [email protected] is one of the email addresses I am trying to send to. My other email addresses work OK. I have sent emails to these Italian addresses before with no problem.

    Restart the operating system in safe mode with Networking (http://en.wikipedia.org/wiki/Safe_mode). This loads only the very basics needed to start your computer while enabling an Internet connection. For instructions on starting in safe mode, see: Windows 8 (http://windows.microsoft.com/en-us/windows-8/windows-startup-settings-including-safe-mode), Windows 7 (http://windows.microsoft.com/en-us/windows/start-computer-safe-mode#start-computer-safe-mode=windows-7), Windows Vista (http://windows.microsoft.com/en-us/windows/start-computer-safe-mode#start-computer-safe-mode=windows-vista), Windows XP (http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/boot_failsafe.mspx?mfr=true), OS X (http://support.apple.com/kb/ht1564).
    If safe mode fixes the issue, other software on your computer is causing the problem. Possibilities include, but are not limited to, antivirus scanning, viruses/malware, and background downloads such as program updates.

  • Flex, xml, and non-English characters

    Hello! I have a Flex web app with an AdvancedDataGrid, and I use an HTTPService component to load data into the grid. The .xml file contains non-English characters in attributes (Russian in my case), like this:
    <?xml version="1.0" encoding="utf-8" ?>
       <Autoparts>
        <autopart DESCRIPTION="Барабан"/>
    </Autoparts>
    When I run the app, the AdvancedDataGrid displays it as "Ñ&#129;ПÐ". How can I fix it? I tried changing encoding="utf-8" to some other charsets, but without success. Thank you.

    Try changing the XML structure to use CDATA instead of having the Russian text in an attribute, and see if that makes any difference.
    What I meant is use something like this:
    <?xml version="1.0" encoding="utf-8" ?>
       <Autoparts>
        <autopart>
           <description><![CDATA[Барабан]]></description>
      </autopart>
    </Autoparts>
    instead of the current xml.
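Output like the one in the question is typical of UTF-8 bytes being decoded with a single-byte charset somewhere along the way. The sketch reproduces that failure and reverses it, assuming the wrong charset was ISO-8859-1 (the charset actually involved in the Flex pipeline is not known); `Mojibake` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

final class Mojibake {
    // Simulates the bug: UTF-8 bytes decoded as ISO-8859-1.
    static String garble(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses it: ISO-8859-1 maps every byte to exactly one char, so re-encoding
    // with it recovers the original UTF-8 bytes without loss.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }
}
```

Each Cyrillic letter becomes two characters, so the 7-letter "Барабан" turns into 14 characters starting with "Ð". If a round trip like `repair` restores the text, the XML file itself is fine and the decoding step is the one to fix.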

  • PDF generation for Non English Characters from ADF

    Hi
    We are using the piece of code below to generate a PDF from an ADF managed bean. It works fine. However, for non-English characters (e.g. Japanese, Vietnamese, Arabic) it puts '???' in the output.
    I found a few blog posts:
    https://blogs.oracle.com/BIDeveloper/entry/non-english_characters_appears
    However, we are not using the BI Publisher product; we are using its APIs.
    Can anyone tell me where we need to set up fonts: within ADF, WebLogic or the server?
    Input parameters are:
    a) XML data
    b) InputStream, i.e. the RTF template
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.InputStream;

    import oracle.apps.xdo.XDOException;
    import oracle.apps.xdo.template.FOProcessor;
    import oracle.apps.xdo.template.RTFProcessor;

    public class ReportUtil { // hypothetical class name; the original post showed only the method
        public static byte[] genPdfRep(String pOutFileType, byte[] pXmlOut, InputStream pTemplate) {
            byte[] dataBytes = null;
            try {
                // Process the RTF template to convert it to XSL-FO format
                RTFProcessor rtfp = new RTFProcessor(pTemplate);
                ByteArrayOutputStream xslOutStream = new ByteArrayOutputStream();
                rtfp.setOutput(xslOutStream);
                rtfp.process();
                // Use the XSL template and the data from the VO to generate the report
                ByteArrayInputStream xslInStream = new ByteArrayInputStream(xslOutStream.toByteArray());
                FOProcessor processor = new FOProcessor();
                ByteArrayInputStream dataStream = new ByteArrayInputStream(pXmlOut);
                processor.setData(dataStream);
                processor.setTemplate(xslInStream);
                ByteArrayOutputStream pdfOutStream = new ByteArrayOutputStream();
                processor.setOutput(pdfOutStream);
                // pOutFileType is not used: the output format is fixed to PDF here
                byte outFileTypeByte = FOProcessor.FORMAT_PDF;
                processor.setOutputFormat(outFileTypeByte); // FOProcessor.FORMAT_HTML for HTML
                processor.generate();
                dataBytes = pdfOutStream.toByteArray();
            } catch (XDOException e) {
                e.printStackTrace();
            }
            return dataBytes;
        }
    }
    Appreciate your help.
    Thanks,
    Abhijit

    Fonts are defined in the template you use to generate the PDF. Your application adds the data, and both are processed by the FO processor. There are two possible causes of the '???':
    1. the data you send to the template already contains the '???'
    2. the template can't digest the data (the special characters) and puts '???' in the PDF.
    Before going on, you have to find out which one is your problem. If it is the second, you had better ask in a FOP forum, as you have to solve it by changing the template.
    Timo
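One way to tell the two causes apart, sketched in Java under the assumption that you can get at the XML data as a `String` before it reaches the template: check whether the charset used to produce the bytes can represent the text at all. `String.getBytes` does not fail on unmappable characters; it substitutes the charset's replacement byte, which for ISO-8859-1 is '?'. `QuestionMarkCheck` is a hypothetical name.

```java
import java.nio.charset.Charset;

final class QuestionMarkCheck {
    // True if every character of the text is representable in the charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // What a lossy conversion produces: getBytes replaces unmappable characters
    // with the charset's replacement byte ('?' for ISO-8859-1) instead of failing.
    static String lossyRoundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }
}
```

If the XML string already contains '?' where the Japanese text should be, the data was damaged before the template ever saw it (cause 1); if the data is intact, look at the template and its fonts (cause 2).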

  • Search for users and non-ASCII characters

    I am having a little issue with the "Accounts - Find Users" functionality. The search breaks on what I assume are non-ASCII characters (we use the following three up here in Denmark: æ, ø, å). To be precise, I have a user with the first name "Jørgen". Searching for first names starting with "J" works just fine, but "Jø" returns zero matches.
    My setup is with two machines, one (A) holding the MySQL database and one (B) serving Identity Manager on top of Tomcat.
    Both A and B are RHEL boxes, and both have da_DK.UTF-8 as the default locale.
    MySQL's /etc/my.cnf file has the following entry (as recommended in create_waveset_tables.mysql):
    [mysqld]
    default-character-set=utf8
    default-collation=bin
    For clarity, some functionality works just fine in Identity Manager with these non-ASCII characters, such as adding a user whose name contains non-ASCII characters (not only æ, ø and å but also others). At the moment, it appears to be the search functionality that is not working as I would expect. I'm still on the fence as to whether I've missed something in terms of configuration, or whether this is a limitation.
    Does anyone know whether this problem is on my side or the software's side?


  • CMSDK Non-ASCII Characters and WebFolders

    Hi,
    I have the following problems with CMSDK and Microsoft clients.
    Windows Explorer:
    It is impossible to enter a folder that contains non-ASCII characters in its name (the client request is never sent).
    After double-clicking the folder name, I get an error message, and then the URL in the edit box is ISO-8859-1 encoded, but this URL is never sent to the server.
    Other operations like create, rename, ... have no problems with non-ASCII characters.
    With IE, I can enter the folder without a problem.
    MS Word:
    I can't save any file with non-ASCII characters in the name.
    Only these methods are sent:
    PROPFIND
    PROPFIND
    GET
    GET
    But never a PUT. Without a non-ASCII character in the name, the traffic looks like this:
    PROPFIND
    PROPFIND
    GET
    GET
    PROPFIND
    LOCK
    PUT
    It is also impossible to enter a folder containing non-ASCII characters in its name from the Word file-save dialog.
    UTF-8 URL encoding is enabled in the IE options, and other operations (MOVE, COPY) are sent correctly UTF-8 encoded.
    Is there any solution?
    Thanks
    Maik

    Have you set the following DAV Server configuration property:
    IFS.SERVER.PROTOCOL.DAV.Webfolders.DefaultCharset
    for your domain?
    You have to configure it for the character set that you want your clients to use when connecting to iFS via WebDAV.
    (You can use the web admin tool to change this property.)
    The reason for this is that the Microsoft WebFolders client software does not transmit the client's character set to the server, so the server has no way of knowing what to expect.
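To see why the server must be told the charset, here is a sketch of how the same folder name percent-encodes differently in UTF-8 and ISO-8859-1, using `java.net.URLEncoder`. It only approximates what a WebDAV client puts in the URL: `URLEncoder` implements HTML form encoding (a space becomes '+'), and the folder name in the example is made up; `FolderUrl` is also hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class FolderUrl {
    // Percent-encodes a folder name using the given charset name.
    static String encode(String folderName, String charsetName) {
        try {
            return URLEncoder.encode(folderName, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }
}
```

A folder named "Öl" becomes `%C3%96l` in UTF-8 but `%D6l` in ISO-8859-1. Both are valid URLs, so without being told which charset the client uses, the server cannot tell which one it received.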

  • Inserting strings of printable and non printable characters

    I would very much appreciate some help with the following
    To handle an interface with a legacy system, I need to create strings containing both printable and non-printable ASCII characters. By non-printable characters I mean in particular those in the range of ASCII 128 to 159.
    It seems it is not possible to insert a string containing both printable and non-printable characters from the aforementioned range into a VARCHAR2 table column, as the following demonstrates:
    insert into test values(chr(156)); -- this inserts the 'œ' symbol.
    SQL> select test, ascii(test), length(test), substr(test,1,1), ascii(substr(test,1,1))from test;
    TEST       ASCII(TEST) LENGTH(TEST) SUBSTR(TEST,1,1) ASCII(SUBSTR(TEST,1,1))
    ┐                  156            1
    That the character is shown as '┐' and not 'œ' is not really an issue for my application; what is important is that the ASCII value is shown as 156, which is the ASCII code of the character I inserted.
    What is strange, however (actually probably not strange, but due to my lack of understanding of the issue at hand), is that substr returns an empty string...
    Now I try to insert a concatenated string: first the "non-printable" character, then a printable character:
    insert into test values(chr(156)||chr(65));
    SQL> select test, ascii(test), length(test), substr(test,1,1), ascii(substr(test,1,1))from test;
    TEST       ASCII(TEST) LENGTH(TEST) SUBSTR(TEST,1,1) ASCII(SUBSTR(TEST,1,1))
    A                   65            1 A                                     65
    For some reason the non-printable character (chr(156)) is now not inserted, or at least does not appear when I select the data from the table. This effect seems to apply to all characters in the range of ASCII 128 to 159 (I tried some but not all). However, CHR(13), for instance, can be inserted as part of a string as shown above.
    For our application I really don't care much which character is shown or not shown; what is important is that I can retrieve the ASCII value and that this value matches the one I inserted, which for some reason does not seem to work.
    This seems to be, at least to some extent, a character set issue. I have also tested this on a database with the character sets set as follows:
    NLS_CHARACTERSET
    WE8MSWIN1252
    NLS_NCHAR_CHARACTERSET
    AL16UTF16
    With WE8MSWIN1252 the described issue does NOT occur; unfortunately, however, I must use NLS_CHARACTERSET AL32UTF8, which produces the results described above!
    As I said, any insights would be much appreciated, as I am slowly but surely starting to despair.
    For completeness' sake, the character sets are set as follows (changing them is NOT an option):
    NLS_CHARACTERSET
    AL32UTF8
    NLS_NCHAR_CHARACTERSET
    AL16UTF16
    The test table is created as follows:
    CREATE TABLE TEST (
      TEST VARCHAR2(1000 BYTE)
    );
    Database Version 11.2.0.3.0
    Edited by: helios.taraba on Dec 2, 2012 10:18 AM --Added database version
    Edited by: helios.taraba on Dec 2, 2012 10:24 AM Added description of test results using NLS_CHARACTERSET WE8MSWIN1252

    Hello Orafad,
    Thanks for your reply, at least I understand the effects I'm seeing i.e.
    "For multibyte character sets, n must resolve to one entire code point. Invalid code points are not validated, and the result of specifying invalid code points is indeterminate."
    http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions026.htm
    You are absolutely right, I could use chr(50579) to get the ligature symbol. However, as what we are trying to achieve is to implement a legacy interface to a 20+ year old subsystem, we are not so much interested in the symbol itself but rather in its ASCII value (156, as you so rightly point out, in the WIN-1252 character set). This particular field represents the length of the message being sent to the subsystem; it can vary from decimal 68 to 164 and is also included in a checksum calculation which is part of the message.
    As changing the NLS_CHARACTERSET of the database is not an option, I guess I only have one reasonable avenue to resolve this, namely to push the functionality of adding the "encoded" length of the message (and the calculation of the checksum) to the Java driver which is responsible for sending the message (TCP/IP) to the subsystem. There we should not have any issues adding a byte with the value 156 (or any other value, for that matter) to the data stream.
    Thankfully, all other fields have characters with ASCII values below 128 and above 31.
    I'm going to leave my question unanswered a bit longer in the hope of someone coming up with a silver bullet, although I'm not getting my hopes up.
    Thanks, Helios
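A sketch of the approach Helios settles on, moving the length byte into the Java driver. The frame layout ([length][payload][checksum]) and the checksum (sum of bytes modulo 256) are hypothetical stand-ins, since the real protocol is not given in the thread. The point is that a Java `byte` carries the value 156 (0x9C, which is 'œ' in windows-1252) without any character set conversion; in AL32UTF8, 'œ' is the two bytes C5 93, i.e. the 50579 mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Sketch of building the message in the Java driver as raw bytes.
// HYPOTHETICAL layout: [length byte][ASCII payload][checksum byte]; the real
// subsystem's layout and checksum algorithm are not described in the thread.
final class LegacyFrame {
    static byte[] build(String asciiPayload) {
        byte[] payload = asciiPayload.getBytes(StandardCharsets.US_ASCII);
        int length = payload.length; // hypothetical: counts payload bytes only
        if (length > 0xFF) {
            throw new IllegalArgumentException("length does not fit in one byte: " + length);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(length);                     // low 8 bits, so 156 is written as 0x9C
        out.write(payload, 0, payload.length); // no character set conversion involved
        int sum = length;
        for (byte b : payload) {
            sum += b & 0xFF;                   // treat bytes as unsigned 0..255
        }
        out.write(sum & 0xFF);                 // hypothetical checksum: byte sum mod 256
        return out.toByteArray();
    }
}
```

Reading the first byte back with `frame[0] & 0xFF` yields 156 again; the `& 0xFF` is needed because Java's `byte` is signed.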
