Cyrillic to Latin

Is there any way to convert Cyrillic data into Latin?
For example, I want to convert:
б to b, п to p, у to u ...
Thanks!

Can you be a little more specific about what you want, or what your thinking is?
I presume you don't want to translate Russian into the language of the Romans ;)

Similar Messages

Cyrillic Decoding To Latin

I have to admit, I'm completely at a loss on this. I thought I understood what I was doing, but I don't.
I have a text (htm) file downloaded off the web. When viewed in a browser, it appears with Russian characters. According to the meta-data on the page, it uses Windows-1251 to display.
I also have a list of the characters and what their Latin equivalents are. I had initially thought I could simply parse through the file, and replace the Russian characters with the Latin equivalents, but that does not work. Usually the characters just get replaced with a ?.
I was relying on Java being Unicode compliant for it to interpret the non-ASCII characters correctly, but I've since found that the whole character sets and Unicode stuff is amazingly huge and complex.
I've tried different approaches, but can't seem to get my head around what is actually happening. I keep bouncing between settings in my IDE (Eclipse), the Console, the Input Stream, the Output Stream, etc. I've tried the following charsets: UTF-8, UTF-16, ISO_8859-5, Windows-1251.
1) Can conversion from Cyrillic to Latin be done purely using Character Set encoding and decoding, or does a find and replace methodology still stand?
2) What character set should I be using to pick up the source file? (I viewed the source using Word/Notepad and got Scandinavian characters instead of Russian, so I can't even tell what character set the file itself is actually storing this in).
3) Do I need to specify a particular character set for the output, either to the file or the console?
The code below is the last attempt I made at this. Any help is greatly appreciated.
Rob.
import java.io.*;
import java.util.HashMap;
import java.util.Set;
import java.util.Iterator;

public class textrep
{
     //Snip
     public static void main(String[] args)
     {
          //Snip
          try
          {
               BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(source), "UTF-16"));
               StringBuilder s = new StringBuilder();
               int temp;
               while ((temp = in.read()) != -1)
               {
                    s.append(russian((char) temp));
               }
               in.close();
               BufferedWriter out = new BufferedWriter(new FileWriter(dest));
               out.write(s.toString());
               out.flush();
               out.close();
          }
          catch (Exception e)
          {
               //Snip
          }
     }
     //Snip
     public static char russian(char c)
     {
          update(c);
          switch (c)
          {
               case '�':
               case '�':
               case '�':
               case '�':
                    return 'a';
               //Snip - Carries on for all other known characters.
          }
          return c;
     }
}
Edited by: peridian on Feb 28, 2008 4:24 AM - Stripped out some code that was unnecessary (made it look weird on page)

peridian wrote:
> I have to admit, I'm completely at a loss on this. I thought I understood what I was doing, but I don't.
> I have a text (htm) file downloaded off the web. When viewed in a browser, it appears with Russian characters. According to the meta-data on the page, it uses Windows-1251 to display.
> I also have a list of the characters and what their Latin equivalents are. I had initially thought I could simply parse through the file, and replace the Russian characters with the Latin equivalents, but that does not work. Usually the characters just get replaced with a ?.
That suggests that when you read them in, you're using the wrong encoding. If you know what's in the file, your first step should be to read it in and see if your InputStreamReader is decoding to the correct values.
> I was relying on Java being Unicode compliant for it to interpret the non-ASCII characters correctly, but I've since found that the whole character sets and Unicode stuff is amazingly huge and complex.
There's a lot of detail, but basically it isn't complex at all. A charset converts between Unicode characters (inside your Java code) and bytes (outside your Java code). There's just a large number of charsets that all implement that.
> I've tried different approaches, but can't seem to get my head around what is actually happening. I keep bouncing between settings in my IDE (Eclipse), the Console, the Input Stream, the Output Stream, etc. I've tried the following charsets: UTF-8, UTF-16, ISO_8859-5, Windows-1251.
> 1) Can conversion from Cyrillic to Latin be done purely using Character Set encoding and decoding, or does a find and replace methodology still stand?
You need find-and-replace. Character encodings are strictly for converting Unicode characters to bytes. Expecting to find an encoding that replaces "щ" by "shch" is absurd.
> 2) What character set should I be using to pick up the source file? (I viewed the source using Word/Notepad and got Scandinavian characters instead of Russian, so I can't even tell what character set the file itself is actually storing this in).
It's HTML, right? Does it have a <META> tag that states a charset? If it doesn't, then the browser is getting the charset from one of the response headers. You can use Firefox's Live HTTP Headers add-on to see the headers.
However, if you're seeing lots of Å in your text editor, that suggests to me that the real encoding is UTF-8. That's just a guess though. Another thing you could try is to load your text file into a hex editor and (looking at what the browser shows) see if you can see the equivalent bytes for the Windows-1251 encoding.
> 3) Do I need to specify a particular character set for the output, either to the file or the console?
If your output is all Latin, then any of the Latin-based charsets will do. Almost certainly your system's default encoding will be Latin-based.
> The code below is the last attempt I made at this. Any help is greatly appreciated.
In your switch statement I don't see anything that uses Cyrillic characters. The case you left in considers four possible characters, none of which is a Cyrillic character. What's that supposed to be for?
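The point about charsets in this reply can be tried out in a few lines of Java. What follows is only a sketch and not the original poster's code: the class name CharsetSketch, the helper decode and the sample word are made up for illustration, and it assumes the JDK in use ships the windows-1251 charset. It shows the same bytes decoded with the right and the wrong charset, and where the "?" characters can come from.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSketch {
    // Hypothetical helper: turn raw bytes into a String using the named charset.
    public static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The word "щука", written with Unicode escapes so the encoding of
        // this source file cannot interfere with the demonstration.
        String word = "\u0449\u0443\u043a\u0430";

        // Stand-in for the downloaded file: the word stored as Windows-1251 bytes.
        byte[] fileBytes = word.getBytes(Charset.forName("windows-1251"));

        // Right charset: the original characters come back.
        System.out.println(decode(fileBytes, "windows-1251").equals(word)); // true

        // Wrong charset: every byte turns into an accented Latin letter instead.
        System.out.println(decode(fileBytes, "ISO-8859-1").equals(word)); // false

        // Encoding Cyrillic into a charset without Cyrillic substitutes '?'.
        byte[] ascii = word.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // ????
    }
}
```

Once the text decodes correctly, the replacement itself is still an ordinary character-by-character lookup; no charset does that part.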

Flash chart doesn't display cyrillic characters

Hi !
Please give me your advice on how to solve my problem.
I tried to write X-axis labels on a 2D column chart using Cyrillic characters.
If I set "Labels Rotation" to a value greater than 0, the chart does not display the labels. If the labels consist of both Cyrillic and Latin characters, only the Latin characters are displayed.
Please help me resolve this issue.
Best regards,
Renat.

That is possible.
The rotation functionality within html, if I understand it correctly, actually generates a graphic and rotates that rather than rotates the individual characters. It could be that this functionality just isn't designed for the characters you need.
This may be something you need to research on Adobe's website?
Andy

Safari doesn't display Cyrillic domain name correctly

Hello there,
I wonder why Safari doesn't correctly display Cyrillic domain names with the .укр TLD, which is the official Ukrainian Cyrillic TLD.
For example, if you visit the official website of the President of Ukraine by typing президент.укр in your Safari address bar, you will instead see this: xn--d1abbgf6aiiy.xn--j1amh
On the other hand, if you visit the official website of the Russian president by typing the Cyrillic domain name with the .рф TLD, президент.рф, it is displayed correctly in the Safari address bar.
To me this looks like some form of discrimination, and I think it should be corrected ASAP.
Is there any solution to this?
This is an issue in Safari on iPhones, iPads and iPods running the latest iOS.

To ask Apple to fix this, you need to repost it at
http://www.apple.com/feedback
If I remember right, display of such domains in the original script may depend on whether the domain officials have policies in place to safeguard against potential confusion between names that look the same in cyrillic and latin. But it could also be a simple oversight.
Each browser may be different: Do Firefox and Chrome behave the same?
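For readers wondering where the xn-- string comes from: it is the ASCII (Punycode) form of the same internationalized domain name, and the browser decides which of the two forms to show. A small Java sketch using the standard java.net.IDN class converts between them. The class name IdnSketch and its two wrapper methods are hypothetical, and nothing here reproduces how Safari makes its display decision.

```java
import java.net.IDN;

public class IdnSketch {
    // Hypothetical wrapper: Unicode host name -> ASCII-compatible ("xn--") form.
    public static String toAscii(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    // Hypothetical wrapper: ASCII-compatible form -> Unicode host name.
    public static String toUnicode(String asciiHost) {
        return IDN.toUnicode(asciiHost);
    }

    public static void main(String[] args) {
        // "президент.укр", written with Unicode escapes so the source file
        // encoding does not matter.
        String host = "\u043f\u0440\u0435\u0437\u0438\u0434\u0435\u043d\u0442.\u0443\u043a\u0440";

        String ascii = toAscii(host);
        System.out.println(ascii);                         // each label starts with "xn--"
        System.out.println(toUnicode(ascii).equals(host)); // the conversion round-trips
    }
}
```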

Cyrillic string conversion question

Hello,
First time here...
I would like to know if there is a way to convert a text string from cyrillic to latin. Should I use a function?
Edited by: user12099545 on 2009-10-22 5:38

I have insufficient privileges to create the function so I'm going to look for someone else to create it so that I can use it. I've added additional letters and now it looks like this:
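CREATE FUNCTION cyrillic_to_latin_spelling(p_cyrillic VARCHAR2)
RETURN VARCHAR2 IS
-- Lookup table: one Cyrillic character -> its Latin spelling.
-- The key is sized in bytes so that a multibyte (e.g. UTF-8) character fits.
TYPE v_spelling_type IS TABLE OF VARCHAR2(3) INDEX BY VARCHAR2(4);
v_spelling_tbl v_spelling_type;
v_str VARCHAR2(4000);
-- Exactly one character, regardless of how many bytes it occupies.
v_char VARCHAR2(1 CHAR);
BEGIN
v_spelling_tbl(' ') := ' ';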
v_spelling_tbl('А') := 'A';
v_spelling_tbl('Б') := 'B';
v_spelling_tbl('В') := 'V';
v_spelling_tbl('Г') := 'G';
v_spelling_tbl('Д') := 'D';
v_spelling_tbl('Е') := 'E';
v_spelling_tbl('Ж') := 'Z';
v_spelling_tbl('З') := 'Z';
v_spelling_tbl('И') := 'I';
v_spelling_tbl('Й') := 'I';
v_spelling_tbl('К') := 'K';
v_spelling_tbl('Л') := 'L';
v_spelling_tbl('М') := 'M';
v_spelling_tbl('Н') := 'N';
v_spelling_tbl('О') := 'O';
v_spelling_tbl('П') := 'P';
v_spelling_tbl('Р') := 'R';
v_spelling_tbl('С') := 'S';
v_spelling_tbl('Т') := 'T';
v_spelling_tbl('У') := 'U';
v_spelling_tbl('Ф') := 'F';
v_spelling_tbl('Х') := 'H';
v_spelling_tbl('Ц') := 'CH';
v_spelling_tbl('Ч') := 'CH';
v_spelling_tbl('Ш') := 'SH';
v_spelling_tbl('Щ') := 'SH';
v_spelling_tbl('Ъ') := 'A';
v_spelling_tbl('Ь') := '';
v_spelling_tbl('Ю') := 'IU';
v_spelling_tbl('Я') := 'YA';
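FOR v_i IN 1 .. NVL(LENGTH(p_cyrillic), 0) LOOP
v_char := SUBSTR(p_cyrillic, v_i, 1);
-- Pass unmapped characters (digits, punctuation, lowercase and Latin letters)
-- through unchanged instead of raising NO_DATA_FOUND.
IF v_spelling_tbl.EXISTS(v_char) THEN
v_str := v_str || v_spelling_tbl(v_char);
ELSE
v_str := v_str || v_char;
END IF;
END LOOP;
RETURN v_str;
END;
/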
Edited by: harkon on 2009-10-23 0:07
Edited by: harkon on 2009-10-23 0:08
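For anyone who cannot get a database function created, the same table-lookup idea can be sketched on the client side. The Java version below is only an illustration under stated assumptions: the class name Translit and the method toLatin are made up, the table holds just a handful of entries copied from the function above, and the handling of lowercase letters and unmapped characters is a design choice of this sketch, not something taken from the PL/SQL code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class Translit {
    // Partial table for illustration only; the entries are copied from the
    // PL/SQL function above. A real table would list every letter.
    private static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u0410', "A");   // А
        TABLE.put('\u0411', "B");   // Б
        TABLE.put('\u0428', "SH");  // Ш
        TABLE.put('\u042F', "YA");  // Я
        TABLE.put('\u042C', "");    // Ь (dropped)
    }

    // Hypothetical helper: replace each mapped Cyrillic letter and copy
    // everything else (digits, spaces, Latin letters) through unchanged.
    public static String toLatin(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String latin = TABLE.get(Character.toUpperCase(c));
            if (latin == null) {
                out.append(c);                                // not in the table
            } else if (Character.isLowerCase(c)) {
                out.append(latin.toLowerCase(Locale.ROOT));   // keep lowercase
            } else {
                out.append(latin);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "Башя 1" -> "Bashya 1"
        System.out.println(toLatin("\u0411\u0430\u0448\u044F 1"));
    }
}
```

Looking up the uppercase form of each character means the table only needs one entry per letter, at the cost that an uppercase letter with a two-letter spelling always comes out fully capitalised.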

Cyrillic encoding, help?

Hi, Java Masters! I am sorry to post this topic here, but I cannot find a more appropriate place.
I'm working on a web application and I have to upload a file. Everything is OK, but after I upload the file and try to download it, the text in the file comes back garbled :(
I'm new to Java and this is the first time I have faced the reality of encodings :( . I need to read files in Cyrillic and Latin. I use an Oracle database where every file is encoded with Win1251. This is how I put the file in the DB:
InputStream in = new FileInputStream(file);
stmnt.setBinaryStream(10,in, (int) file.length());
and this is the code in my servlet which I use to download the file:
response.setContentType("text/plain");
request.setCharacterEncoding("UTF-8");
response.setHeader("Content-Disposition", "attachment; filename=\""+ "MyFile" + ".txt\"");
response.getOutputStream().write(File);
response.getOutputStream().flush();

I tried to do it like this.
This is how I put it in the DB:
InputStream in = new FileInputStream(file);
stmnt.setBinaryStream(10,in, (int) file.length());
StringBuffer buff = new StringBuffer("cp1251");
and then in the servlet I do:
response.setContentType("text/plain");
request.setCharacterEncoding("cp1251");
response.setHeader("Content-Disposition", "attachment; filename=\""+ "MyFile" + ".txt\"");
response.getOutputStream().write(File);
response.getOutputStream().flush();
BUT NOTHING HAPPENS! This is a very big problem! Please help, guys!
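Two details in the snippets above may explain why nothing changes. Creating a StringBuffer that contains "cp1251" has no effect on the stored bytes, and request.setCharacterEncoding only controls how the incoming request is decoded; it does not describe the response. Because the file is written to the output stream as raw bytes, one option is to declare the charset of those bytes in the response, for example with response.setContentType("text/plain; charset=windows-1251"). Another is to transcode the bytes before sending them. The sketch below shows only the transcoding step in plain Java; the class name Transcode and the method cp1251ToUtf8 are hypothetical, and it assumes the stored bytes really are Windows-1251.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Transcode {
    // Hypothetical helper: reinterpret Windows-1251 bytes as text, then
    // write that text out again as UTF-8 bytes.
    public static byte[] cp1251ToUtf8(byte[] stored) {
        String text = new String(stored, Charset.forName("windows-1251"));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The three bytes of "При" in Windows-1251.
        byte[] stored = { (byte) 0xCF, (byte) 0xF0, (byte) 0xE8 };
        byte[] utf8 = cp1251ToUtf8(stored);

        // Each of these Cyrillic letters takes two bytes in UTF-8.
        System.out.println(utf8.length); // 6

        // In a servlet, the transcoded bytes would be sent together with a
        // matching declaration, e.g.:
        //   response.setContentType("text/plain; charset=UTF-8");
        //   response.getOutputStream().write(utf8);
    }
}
```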

How to show Turkish characters in Smart Forms

Hi,
I am unable to show Turkish characters in my Smart Form. Please suggest the character set that I should be using. Is there a transaction for adding Turkish to the font set in Smart Styles?
My PC supports Turkish...
Thanks in advance

Hi,
There are a number of SAP Notes about printing special characters. Look in the SAP Notes section and search for "printing turkish" or a similar query. There might be fonts designed for the Turkish language (similarly, I found suitable fonts to print Cyrillic and Latin Serbian).
Once you find the fonts, you have only solved the display part; you still need to find a suitable SAP device type which can actually send the characters to the printer in the right format. Look for that too in the SAP Notes section.
Regards,
SD.

Finder doesn't display Spanish file name correctly ...

Hi all;
I have a Mac mini which accesses a share on a network drive with all my media files.
When I access the share from my daughter's account (she is set up as a user with parental controls), I can see all the files with the correct names, e.g. FERNÁNDEZ.
From my account (I am set up as an admin) it displays the name as FERNæNDEZ (or something like that).
I believe the Unicode settings differ between the two accounts, but I can't find where to change them. Any ideas? How can I change the default setting?
Additional info:
My settings show (from a Terminal window):
_CF_USER_TEXTENCODING=0x1F5:0:0
My daughter's settings show:
_CF_USER_TEXTENCODING=0x1F7:0:0
Thanks, this is driving me crazy.
JCC

Nodemanager doesn't display the domain name

Hi,
I am new to WLST scripting and to the concepts of Node Manager. My aim is to start the server using Node Manager through WLST.
First I invoked the wlst.sh shell script.
Then I started Node Manager using the startNodeManager command.
Then I tried to connect to Node Manager through nmconnect.
Now if I type the command nm() it displays the following:
Currently connected to Node Manager to monitor the domain {0}.
It doesn't display my domain name; it displays {0} instead. I guess that because the domain name is not set properly, I am not able to view the status of the admin server. For this I am using the command nmserverstatus("AdminServer"), and it displays the status as UNKNOWN.
Can you please let me know what I am missing?

Font problem after reinstallation of the OS

Hi
I reinstalled Lion, and since then I get glyphs instead of letters when typing. Does anyone have a solution?
Thx

aleksandar2507981 wrote:
ok progress update, after i switched to serbian which is supposed to be cyrillic all is good i mean i have latin keyboard working properly...
It's working properly because when the option key is always depressed, the Serbian keyboard switches from cyrillic to latin.
Try an external keyboard. If it works ok with the layout set to US, then your internal keyboard may be damaged and need repair or replacement.

Changing default "other language" fonts

Hi,
I am a Thai Mac user. I am pretty OK with Lucida Grande as my default system font. But when I switch to the Thai keyboard layout, the default Thai font for it is quite bad. Is there any way to change this font? (I don't even know what font it is.) This font seems to be the default font for Thai characters in all applications too.
Thanks

> I am a Thai Mac user. I am pretty OK with Lucida Grande as my default system font. But when I switch to the Thai keyboard layout, the default Thai font for it is quite bad. Is there any way to change this font? (I don't even know what font it is.)
I think this font is Lucida Grande (which includes Thai as well as Hebrew, Cyrillic, and Latin). Surely you can tell by opening the Font Panel and switching among them. I think the only way you can choose a different default font is if the particular app you are using has a setting for that. For instance, Nisus Writer Express has preferences that tie keyboard layouts to fonts.

[iPhone] Letter list like in Contacts App

Seeking i18n experts - and if you don't know what 'i18n' is then you aren't one
The Contacts app shows the letters A-Z on the list of contacts whether the contact list uses all of those letters or not. For fun I switched my iPhone's language to Russian and the letters on the contact list changed to an abbreviated list of the Cyrillic and Latin letters. It was abbreviated since the combined length of the two alphabets is too long to fit. Again it showed letters I didn't have corresponding contacts for.
OK, with that background, here's my question. For a given device's language setting / locale, how do I create the appropriate list of letters as shown in the Contacts app?
I've looked at all the NSString methods, NSCharacterSet, NSLocale and none seem to provide this list. There is a way to get something called an 'exemplar' character set from a locale but there is no way to extract the characters from a character set.
Any ideas?
So far I've taken the simple way out of just grabbing the first letter of all the entries in my table but that leaves out the letters that have no corresponding entry. And I don't want to hard code A-Z because then the app ***** for non-Latin alphabetic users.
P.S. Returning my phone back to English was a challenge since I had to pick the right Settings app menus with nothing but Cyrillic looking back at me

Rick,
seems like those indices are present as localized plists in the AddressBook.framework (look for ABContactSections.plist).
I guess you could either duplicate those plists into your app or try to access them directly in the framework (which most likely violates the SDK agreement…).
Andreas

Wrong UserLocale settings for sr-Latn-RS

Hi,
when I setup my CustomSettings.ini and put this:
SkipLocaleSelection=YES
UserLocale=sr-Latn-RS
KeyboardLocale=241a:0000081a
I get this during OSD Wizard:
Time and currency format (Locale) field is empty.
When I change sr-Latn-RS to sr-Latn-BA (Which is for Bosnian language) I get this:
The deployed operating system (Windows 8.1) has Bosnian customization instead of the Serbian customization I initially wanted.
Is this a known bug, or is there something else I can do?
MDT 2013 and latest ADK installed.

Hi Thomas, thank you for your input. I was aware of this problem and had already implemented a workaround, but I think the bug I have is a different one that strangely coincides with the bug you're pointing to.
The problem is that Serbia and the rest of the Balkan countries have gone through a lot of changes during the last decade or so, and with that a lot of international and standard-wide codes have changed, not to mention the two alphabets we use, Cyrillic and Latin :) I'm using the latest available document here.
Hopefully someone from Microsoft will read this and maybe guide me in the right direction.

Tagging woes

Hi,
I bought an album from the iTunes Store recently which has many of its track names transliterated from the original Cyrillic to Latin script. I find it a bit confusing so I'd like to retag them all at once but I'm not sure how to do it; the "Get Track Names" function says it only works with tracks imported into iTunes via CD. Is iTunes capable of retagging whole albums automatically through other means or should I find a third party program? Will a third party program even work with content purchased from the store?
I'm sure this will happen again in the future so if anyone has found a workaround please advise me. Manually re-tagging albums is tedious.

Cp855 encoding

I have an Oracle 8.0 database and the following situation: only one column in a table contains cp855-encoded characters. I need to read that column in my selects. When I use a select statement I get meaningless results. The database should not be changed in any way. May I have your advice, please?
PS
I'm the new guy here. Please be patient if this is something trivial or already solved.

It's me again. I want to describe my solution to this problem. First of all I tried to use the translate function, but without meaningful results, not to mention the bulky syntax of this function. I'm a C# developer, so finally I tried to solve this problem in C#. After I found a table describing the mapping of characters in the cp899 codepage, it seemed it would be trivial to solve. The problem arose when I noticed that the table contains mappings only for uppercase letters. I slowly identified all the needed letters (numerous debugging sessions). Now I have a class library which translates cp899 to Cyrillic or Latin Unicode form. The only remaining problem is that the creators of cp899 used char 45 for the Cyrillic letter G, but char 45 is also - (dash), so in my results - (dash) is translated to Cyrillic G. Greetings
