Replace non-english characters function

Hi folks,
I have a text that includes non-English characters. Is there a trick for replacing those characters with the "closest" English character?
Examples:
"Hytölä" to become "Hytola"
"Säynatsälo" to become "Saynatsalo"
etc ...
I was thinking about using REGEXP_REPLACE:
select regexp_replace('Hytölä Säynatsälo ', '[^0-9A-Za-z]', '') from dual
but that pattern is not correct: it deletes the accented characters instead of replacing them.
Any suggestions?

There is an approach that smells like a hack to me (source: the thread "replace characters with accent with their base letter").
However, it works:
with data as (
  select 'Hytölä' str from dual
  union all
  select 'Säynatsälo' from dual
)
select
  str
, utl_raw.cast_to_varchar2(nlssort(str, 'NLS_SORT=BINARY_AI')) nstr
, length(utl_raw.cast_to_varchar2(nlssort(str, 'NLS_SORT=BINARY_AI'))) l
from data
STR         NSTR        L
----------  ----------  --
Hytölä      hytola      7
Säynatsälo  saynatsalo  11
Notice the change in length: NLSSORT leaves an extra trailing null byte at the end of each string.
The uppercase is lost as well.
For this kind of question it helps to know the requirements. Why should there be a base-letter conversion? For search purposes, for example?
And don't forget the database character set.
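
For comparison outside the database, the same base-letter conversion can be sketched in Java. This is only a sketch and not the Oracle method above: it assumes the text is already available as a Java string, and the class name BaseLetters and the method strip are made up for illustration. It decomposes each accented letter with java.text.Normalizer and then drops the combining marks, so the original case is kept.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of the original thread.
final class BaseLetters {
    private BaseLetters() {}

    /** Splits accented letters into base letter plus combining mark (NFD), then removes the marks. */
    static String strip(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // The two sample names from the question, written with Unicode escapes
        // so the source compiles regardless of the file encoding.
        System.out.println(strip("Hyt\u00f6l\u00e4"));
        System.out.println(strip("S\u00e4ynats\u00e4lo"));
    }
}
```

Letters that have no canonical decomposition, such as "ø" or "ł", pass through unchanged and would need an explicit mapping.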

Similar Messages

  • Removing non-english characters

    Hi,
    I'm trying to define a regular expression that helps me to replace non-english characters from a string.
    For example:
    BESANÇON
    and I need to get something like: BESANCON, or BESAN*ON.
    Could any one give me some hints?
    Max A.

    You can use the convert function:
    SELECT CONVERT('BESANÇON','US7ASCII')
    FROM dual;
    CONVERT(
    BESANCON
    1 row selected.

  • Word Replacements for Non- English Characters

    Hi
    Does anyone have an idea for implementing word replacements for non-English characters in TCA-DQM 11i?
    We are trying to identify, capture and cleanse common accented characters like à, â, ê.
    However, the default language for replacement is American English, so even if we add these to the existing lists, it does not take any effect.
    Is creating a new word replacement list for every language the solution? Any patch recommendations?
    Thanks in advance

  • Replacing any non english Characters

    How can I replace any non-English characters? I have a lot of characters that look like a block.
    --John

    Probably the easiest way to code would be to convert the string to a byte array and back again using the ASCII character encoding. That should give you ? for any non ASCII characters.
    Something like:
    String newString = new String(oldString.getBytes("ASCII"), "ASCII");
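
The round trip above can be tried out with a small self-contained sketch. The class name AsciiOnly and the helper mask are assumptions made for illustration. toAscii uses the StandardCharsets.US_ASCII constant instead of the charset name, which avoids the checked UnsupportedEncodingException; mask is an alternative that lets you pick the replacement yourself.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;

// Hypothetical helper class for illustration only.
final class AsciiOnly {
    private AsciiOnly() {}

    /** Encodes to US-ASCII and back; every unmappable character comes back as '?'. */
    static String toAscii(String input) {
        byte[] bytes = input.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    /** Replaces every character outside the 7-bit ASCII range with the given replacement. */
    static String mask(String input, String replacement) {
        return input.replaceAll("[^\\x00-\\x7F]", Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String city = "BESAN\u00c7ON"; // C with cedilla, written as an escape
        System.out.println(toAscii(city));
        System.out.println(mask(city, "*"));
    }
}
```

Neither variant picks the closest English letter; both only mark or drop what is not ASCII.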

  • Removing non-English characters from data.

    Ours is a global system, and some of its data contains non-English characters. We want to download a file with these non-English characters removed.
    Any suggestions on how to remove these non-English characters from the file?

    The function module you mentioned:
         Replace non-standard characters with standard characters
       Functionality
         SCP_REPLACE_STRANGE_CHARS processes a text so that it only contains
         simple characters. Special characters and national characters are
         replaced in such a way that the text remains reasonably legible.
         The character set 1146 is used by default. In this case the following
         replacements are made, for example:
          Æ ==> AE        (AE)
          Â ==> A         (Acircumflex)
          Ä ==> Ae        (Adieresis)
          £ ==> L         (sterling)
         Note that the new text can be longer than the old.
    So I don't think it will be useful for eliminating the special characters.
    You have to check each and every character against the 26 standard letters of the alphabet.
    Thanks & Regards
    vinsee

  • Odd number of non-english characters get broken in windows-chrome and ff

    I developed a JNLP applet which prints out the user input.
    When I enter an odd number of non-English characters (e.g. Chinese), the Chrome and Firefox browsers print the last character as a question mark.
    input : 가
    output : 가��
    I checked in the Java console that the character is correct.
    It must be a bug in the communication between the applet and the Chrome browser.
    IE prints it correctly.
    I can work around the issue by appending a white space in the applet and removing it in JavaScript.
    Does anyone have a clue about the issue?
    The code is as follows.
    MainApplet.java
    public class MainApplet extends JApplet implements JSInterface { //, Runnable {
        public int stringOut(String sData) {
            OutData = sData;
            return 0;
        }
    }
    js File
    function TSToolkitRealWrapper() {
        var OutData;
        var OutDataNum;
    }
    var TSToolkit = new TSToolkitRealWrapper();
    var attributes = { id: 'TSToolkitReal', code: 'tradesign.pkitoolkit.applet.MainApplet', width: 100, height: 100 };
    var parameters = { jnlp_href: getContextPath() + '/download/pkitoolkit.jnlp',
                       separate_jvm: true, classloader_cache: false };
    TSToolkitRealWrapper.prototype.stringOut = function (str) {
        var nRet = TSToolkitReal.stringOut(str);
        this.OutData = TSToolkitReal.OutData;
        return nRet;
    };
    HTML
    <SCRIPT language=javascript>
    <!--
    function StringOut(form) {
        var data = form.data.value;
        var nRet = 0;
        var base64Data;
        nRet = TSToolkit.stringOut(data);
        if (nRet > 0)
            alert(nRet + " : " + TSToolkit.GetErrorMessage());
        else
            form.data1.value = TSToolkit.OutData;
    }
    -->
    </SCRIPT>
    Edited by: user13496918 on 2013. 3. 20 7:29 PM
    Edited by: user13496918 on 2013. 3. 20 7:39 PM
    Edited by: user13496918 on 2013. 3. 20 9:17 PM
    Edited by: user13496918 on 2013. 3. 20 9:18 PM

    "I checked on java console that the character is correct."
    So it isn't a Java problem.
    "It must be bug in communication of applet to chrome browser."
    So tell the people who make the Chrome browser.
    "IE prints out correctly."
    That's a change. I've just spent nine days tracking down an IE applet problem and I'm not finished yet.
    Please omit the boldface next time. We can read. Boldface doesn't help; it makes it worse.

  • Only VBA does not recognize non-English characters

    Hello guys,
    I have a new laptop with Windows 8.1, bought in the USA, and I'm having difficulties with Excel VBA (Office 365 University, 64-bit, bought in the Czech Republic in Central Europe). VBA does not recognize non-English characters (particularly "ř" and "ů"), which causes problems when I run code that I wrote earlier on my previous laptop (Windows 7, bought in the Czech Republic, with the same Office).
    The problem with non-English characters has occurred only in VBA so far; otherwise I can use these characters normally in Excel cells, Word, and so on. I tried installing both the English and the Czech version of Office with no change. I also installed the Czech proofing tools and set everything to Czech in Office. The location and language preferences in Windows are also set to Czech, and it is not a font problem. I also noticed that when I look up these characters using Ctrl+F, the original ř changes to r after the search, and again this happens only in VBA.
    Thank you very much for any help.
    Tom

    Hi Tom,
    VBA for Excel can only recognize character codes from 0 to 255; if you use other special characters like "ř" or "ů", it will return 63 (?) to you. To use these characters, you have to use the ChrW function to convert a decimal code to the character.
    http://msdn.microsoft.com/en-us/library/ee177465.aspx
    For example, the hex and decimal codes for these two characters are as follows:
         Hex   Dec
    ř    0159  345
    ů    016F  367
    So to get these two characters in VBA, you could write:
    ChrW(&H159) or ChrW(345)
    ChrW(&H16F) or ChrW(367)
    You can get the hex code of a character by looking it up in the system character map (in the Win8.1 start view, search for "character map"), then convert the hex code to decimal yourself.
    Range("A1").Value = ChrW(&H159) & ChrW(&H16F)
    Range("A1").Value = ChrW(345) & ChrW(367)

  • Why Rpad does not work in non-English characters...????

    Hi,
    I have the classic table of emp in user scott....
    When i insert another row containing non-English characters the rpad function does not work......
    SQL> select ename , rpad(ename,12,'.') from emp;
    ENAME                          RPAD(ENAME,12,'.')
    SMITH                          SMITH.......
    ALLEN                          ALLEN.......
    WARD                           WARD........
    JONES                          JONES.......
    MARTIN                         MARTIN......
    BLAKE                          BLAKE.......
    CLARK                          CLARK.......
    KING                           KING........
    TURNER                         TURNER......
    JAMES                          JAMES.......
    FORD                           FORD........
    MILLER                         MILLER......
    SCOTT                          SCOTT.......
    ADAMS                          ADAMS.......
    ΠΑΝΑΓΙΩΤΟΥ                     ΠΑΝΑΓΙ
    When I convert it to English characters, it works:
    PANAGIOTOU                     PANAGIOTOU..
    How can i make it to work....????
    I use 10g v.2
    Many thanks...
    Sim

    Hi ,
    SQL> select length('ΠΑΝΑΓΙΩΤΟΥ') from dual;
    LENGTH('ΠΑΝΑΓΙΩΤΟΥ')
                      10
    SQL> select vsize('ΠΑΝΑΓΙΩΤΟΥ') from dual;
    VSIZE('ΠΑΝΑΓΙΩΤΟΥ')
                     20
    When I issue the command
    SQL> select ename , rpad(ename,25,'.') from emp;
    ENAME                          RPAD(ENAME,25,'.')
    SMITH                          SMITH....................
    ALLEN                          ALLEN....................
    WARD                           WARD.....................
    JONES                          JONES....................
    MARTIN                         MARTIN...................
    BLAKE                          BLAKE....................
    CLARK                          CLARK....................
    KING                           KING.....................
    TURNER                         TURNER...................
    JAMES                          JAMES....................
    FORD                           FORD.....................
    MILLER                         MILLER...................
    SCOTT                          SCOTT....................
    ADAMS                          ADAMS....................
    ΠΑΝΑΓΙΩΤΟΥ                     ΠΑΝΑΓΙΩΤΟΥ.....
    It worked, setting 25 characters for padding.
    Thanks....for the useful remark
    Sim
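
The LENGTH and VSIZE figures above (10 versus 20) can be reproduced outside the database. The following Java sketch is an illustration only: the class name CharsVersusBytes is made up, and it assumes a UTF-8 encoding, in which each Greek capital letter takes two bytes. Whether RPAD in that release counts bytes or display width is not established by the thread.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class CharsVersusBytes {
    private CharsVersusBytes() {}

    /** Number of characters (code points) in the string. */
    static int charCount(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // The Greek name from the thread, written with Unicode escapes.
        String greek = "\u03a0\u0391\u039d\u0391\u0393\u0399\u03a9\u03a4\u039f\u03a5";
        System.out.println("SMITH: " + charCount("SMITH") + " chars, " + utf8Bytes("SMITH") + " bytes");
        System.out.println("Greek: " + charCount(greek) + " chars, " + utf8Bytes(greek) + " bytes");
    }
}
```

A target width of 12 is therefore enough for 12 ASCII characters, but only for 6 Greek letters if the width is measured in bytes, which matches the truncated output shown in the question.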

  • Non English characters conversion issue in LSMW BAPI Inbound IDOCs

    Hi Experts,
    We have some fields in the customer master LSMW data load program which can contain non-English characters. We are facing issues with the conversion of non-English characters in the LSMW BAPI method. The LSMW read and conversion steps show the non-English characters properly, without any issue. While creating inbound IDocs, most of the non-English characters are replaced with '#', which causes issues in creating customer master data in the system. In our scenario, the customer data has non-English characters in the first name, last name and address details. Does any specific setting need to be done on our side? Please suggest how to resolve this issue.
    Thanks
    Rajesh Yadla

    If your language requires Unicode, then you need to change the SAP GUI options: on the initial screen, choose Customize Local Layout (Alt+F12) --> Options --> I18N --> Encoding ....

  • Display non-english characters in its own corresponding language in excel

    Hello Experts,
    I have description texts in Chinese and other languages which are displayed properly in my internal table in the debugger.
    After downloading the data into an Excel sheet in my file path, the non-English descriptions are displayed as #### when the file is opened.
    Please help me display the non-English descriptions in the Excel sheet in their own corresponding language.
    Note:  Function module used : GUI_DOWNLOAD
                 File type assigned       : 'ASC'
    Edited by: keerthi shanker on Mar 14, 2008 11:02 AM

    Hello Vasanth,
    Please explain what you meant by 'Last Button in SAP screen'.
    Well, to reiterate my problem: I have data retrieved from the SAP database that has values in multiple languages, which is displayed properly in the internal table as checked in the debugger.
    After the execution of FM 'GUI_DOWNLOAD', when I open the file from my desktop, each of the non-English characters, such as the Chinese and Japanese ones, is displayed as a hash symbol.

  • Predictive text non-English characters to be made ...

    I just filed an enhancement request for this feature, please vote for it HERE:
    Predictive text non-English characters to be made optional
    The story is: when using predictive text for non-English languages (Polish in my case), the dictionary words are grammatically correct, which means they include special characters like ą, ę, ć, ś, ż, ź, ó, ł etc. For texting (SMS), operators count each of these as 3 characters, making a message much longer than it looks. Therefore no one uses these characters while texting; people use English-only characters (a, e, c, s, z, o, l ...) instead, which makes predictive text useless for e.g. the Polish language.
    I'd like to have an option to switch these non-English characters off for predictive text. The result is not grammatically correct, but in real life that's how people type.
    So basically, if there were an option to disable language-specific characters, I would get a suggestion of 'Prosze' instead of the grammatically correct 'Proszę'. 'Prosze' is a 6-character word; 'Proszę' counts as a 5+3=8 character word. Considering that a single SMS message is 300 chars, it really makes a difference.
    A simple solution would be to replace every ą with a, ć with c, ó with o etc. in each suggested word for those who have this option enabled.
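
The per-character replacement proposed above can be sketched with a plain lookup table. This is a minimal illustration, not how any phone implements predictive text; the class name PolishAscii and the method simplify are made up. An explicit table is used because "ł" has no Unicode decomposition into "l" plus an accent, so accent-stripping by normalization alone would miss it.

```java
// Hypothetical helper for illustration only.
final class PolishAscii {
    private PolishAscii() {}

    // Polish letters with diacritics (lower case, then upper case), as Unicode escapes.
    private static final String ACCENTED =
            "\u0105\u0107\u0119\u0142\u0144\u00f3\u015b\u017a\u017c"
          + "\u0104\u0106\u0118\u0141\u0143\u00d3\u015a\u0179\u017b";
    // The plain letter at the same index replaces the accented one.
    private static final String PLAIN = "acelnoszz" + "ACELNOSZZ";

    /** Replaces each Polish diacritic letter with its plain counterpart; other characters are kept. */
    static String simplify(String word) {
        StringBuilder out = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            int index = ACCENTED.indexOf(c);
            out.append(index >= 0 ? PLAIN.charAt(index) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(simplify("Prosz\u0119"));
    }
}
```

Applied to the example from the post, the suggestion 'Proszę' comes out as 'Prosze'.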

    Hi,
    You can write code in the PAI of the main screen. There, by using LOOP AT SCREEN, you can make that field editable or disabled.
    Code sample:
    loop at screen.
    ****condition for value check
    if screen-name = 'TEXT_EDIT_NAME'.
    screen-output = 1.
    screen-input = 0.
    modify screen.
    endif.
    endloop.
    Hope this will help you.

  • SetMnemonic for non-english characters

    Does anybody know how to set a JButton's mnemonic for non-English characters?
    My mnemonic is loaded from a resource bundle. According to the documentation, setMnemonic(char) is limited to English letters, and the user should call setMnemonic(int) instead.
    So what value should this int contain in order to display the non-English char which is loaded from the resource bundle?
    Thanks in advance,
    Hanoch

    It seems that this is an issue that has popped up in various forums before, here's one example from last year:
    http://forum.java.sun.com/thread.jspa?forumID=16&threadID=490722
    This entry has some suggestions for handling mnemonics in resource bundles, and they would take care of translated mnemonics - as long as the translated values are restricted to the values contained in the VK_XXX keycodes.
    And since those values are basically the English (ASCII) character set + a bunch of function keys, it doesn't solve the original problem - how to specify mnemonics that are not part of the English character set. The more I look at this I don't really understand the reason for making setMnemonic (char mnemonic) obsolete and making setMnemonic (int mnemonic) the default. If anything this has made the method more difficult to use.
    I also don't understand the statement in the API about setMnemonic (char mnemonic):
    "This method is only designed to handle character values which fall between 'a' and 'z' or 'A' and 'Z'."
    If the type is "char", why would the character values be restricted to values between 'a' and 'z' or 'A' and 'Z'? I understand the need for the value to be restricted to one keystroke (eliminating the possibility of using ideographic characters), but why make it impossible to use all the Latin-1 and Latin-2 characters, for instance? (and is that in fact the case?) It is established practice on other platforms to be able to use accented Latin characters, for instance.
    And if changes were made, why not enable the simple way of specifying a mnemonic that other platforms have implemented, by adding an '&' in front of the character?
    Sorry if this disintegrated into a rant - didn't mean to... :-) I'm sure there must be good reasons for the changes, would love to understand them.

  • Download non-English characters from tables

    Hi guys,
    I use apex 4.1.1 multilingual - with oracle xe 11g R2.. with listener 1.1.3 deployed on Glassfish 3.1.1
    I have Arabic data stored in my tables. if I go to Object Browser and try to download it, I get question marks instead of the letters....
    I have the same problem if I want to download the data of IR...
    Can you help please ??
    Regards,
    Fateh

  • Non-English characters

    Hello, I have read several times that since Java uses Unicode, it solves the problems of non-English characters automatically or something like that.
    But my app is not working as expected. Would someone help please?
    I have a client/server combo written in Java. The server can send messages in English or Japanese. The Japanese messages are hard-coded as String literals in the server source code. On the client side, they are displayed on a JEditorPane. But the Japanese characters are all garbled. The OS on the server side and client side are, of course, different.
    My supposition, which is obviously wrong as it is not working, is that since both ends of communication are Java app, I need not worry about any encoding conversions for String literals.
    Can someone suggest what is wrong here?

    How is the required encoding/decoding supposed to be done?
    When I didn't worry about non-English characters, I did the following, which WORKED.
    // SENDER side
    Socket socket ;
    PrintWriter     out = new PrintWriter(socket.getOutputStream(),true);
    String outMessage = "my message";
    out.println(outMessage);
    // RECEIVER side
    Socket socket ;
    BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
    String inMessage =  in.readLine();
    When non-English characters are involved, I did the following, which DID NOT WORK. Please someone correct me.
    // SENDER side
    Socket socket ;
    PrintWriter     out = new PrintWriter(socket.getOutputStream(),true);
    String outMessage = "my message";
    String utfString = new String(outMessage.getBytes(),"UTF-8");
    out.println(utfString);
    // RECEIVER side
    Socket socket ;
    InputStreamReader ins = new InputStreamReader(clientSocket.getInputStream(),"UTF-8");
    BufferedReader in = new BufferedReader(ins);
    String inMessage =  in.readLine();
    The received message is still garbled.
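
One likely cause in the code above: new PrintWriter(OutputStream, true) encodes with the platform default charset, and new String(outMessage.getBytes(), "UTF-8") only re-decodes default-encoded bytes as UTF-8, which cannot repair anything. A minimal sketch of the usual fix is to name the same charset on both the writer and the reader. The class name Utf8RoundTrip is made up, and in-memory byte streams stand in for socket.getOutputStream() and socket.getInputStream() so that the sketch runs without a network.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the sender and receiver sides.
final class Utf8RoundTrip {
    private Utf8RoundTrip() {}

    /** Sender side: wraps the raw stream in a writer that encodes explicitly as UTF-8. */
    static byte[] send(String message) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stands in for socket.getOutputStream()
        try (PrintWriter out = new PrintWriter(new OutputStreamWriter(wire, StandardCharsets.UTF_8), true)) {
            out.println(message);
        }
        return wire.toByteArray();
    }

    /** Receiver side: decodes the raw bytes with the same charset the sender used. */
    static String receive(byte[] bytes) {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))) { // stands in for socket.getInputStream()
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String japanese = "\u65e5\u672c\u8a9e"; // three kanji, written as escapes
        System.out.println(japanese.equals(receive(send(japanese))));
    }
}
```

If the text is still garbled with matching charsets on both ends, the string literals themselves may have been misread at compile time; compiling the server with javac's -encoding option set to the encoding of the source files is worth checking.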

  • Non english characters in DN cannot be retrieved

    We are using Netscape Directory Server 4, protocol V3. We have a problem related to non-English characters appearing in an RDN.
    We publish entries to LDAP using values from a database. For example, we have published an entry to LDAP that, based on the DB values, should have a DN like: ou=Liege BELGIUM ... LGG1a, <other components of DN>. However, when we call the Netscape search API (searching against the uid attribute, which does not contain non-English characters), the search returns the entry, but a further call to the getDN() method on the returned LDAP entry returns only Li instead of the complete DN value.
    It seems the entry is corrupted in LDAP. I wanted to delete the corrupted entry and re-create a new one to test. I tried many ways, but none of them worked. I think this is because the DN is corrupted, so there is no key value to identify the LDAP entry for any operation (modify, delete).
    Your help and insights are much appreciated.
    Thanks.
    Han Shen

    LDAP uses the UTF8 encoding. You must store data in the directory using the UTF8 encoding. This includes DN values. This also means that if you want to be able to view the values in your native character set and font, you must use an application that can convert the UTF8 LDAP data back to the native character encoding. The directory console by default should work for LATIN-1 (ISO 8859) languages if the LOCALE is set correctly.
