Read Russian Characters and compare them.

All,
My task is to read an XML file and convert all of the Russian Characters in the file to latin characters. (e.g. д would be e and ж would be h)
When I read the Russian characters I get weird characters that look like ??. I tried reading the file as UTF8, but now I get ???? in place of the weird characters.
The input file looks like this.
<?xml version="1.0"?>
<russian>
     <para>
          <text>testing</text>
          <ru>&#1080;&#1076;&#1092;&#1088; &#1080;&#1076;&#1092;&#1088; &#1080;&#1076;&#1092;&#1088; &#1080;&#1076;&#1092;&#1088;</ru>
     </para>
     <para><ru>&#1092; &#1090; &#1074;&#1086;&#1076;&#1092;&#1083;&#1074; &#1072;&#1086;&#1092;&#1099;&#1074;&#1076;&#1072;&#1083;&#1092;&#1086; &#1099;&#1074; &#1092;&#1099;&#1076;&#1074;&#1078;&#1083;&#1086;&#1072; </ru></para>
     <ru/>
</russian>
When I read it in I get.
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="test.css"?>
<russian>
     <para>
          <text>testing</text>
          <ru>???? ???? ???? ????</ru>
     </para>
     <para><ru>? ? ?????? ?????????? ?? ???????? </ru></para>
     <ru/>
</russian>
So I can't compare the Russian characters to convert them.
Is there any way to read them in and preserve what they are? I don't care what they look like when they come in, just that they are distinguishable so I can compare them and change them to the right Latin character.
Thanks for the help.
Michael

Thanks for the reply.
Ok,
You asked for it so I will give it to you.
I took your advice and made sure that the encoding was correct. I resaved the XML file as a UTF-8 encoded page in Notepad. (It was already UTF8 but I did it anyway.) Dreamweaver confirms that the file is indeed UTF8 encoded. So I am assuming that if I type Russian letters in Notepad and save the file as UTF8, then the characters are UTF8.
So the input file is this.
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/css" href="test.css"?>
<russian>
     <para>
          <text>testing</text>
          <ru>&#1080;&#1076;&#1092;&#1088; &#1080;&#1076;&#1092;&#1088; &#1080;&#1076;&#1092;&#1088; &#1080;&#1076;&#1092;&#1088;</ru>
     </para>
     <para><ru>&#1040; &#1099;&#1092;&#1083;&#1086; &#1092;&#1074;&#1086;&#1072; &#1092;&#1076;&#1083;&#1099; &#1072;&#1086;&#1078;&#1076;&#1072;&#1074; &#1087;&#1083;&#1086;&#1099;&#1076;&#1072;&#1074;&#1086; &#1087;&#1099;&#1072;&#1087; &#1086;</ru></para>
     <ru/>
</russian>
I read in the file using this code.
// needs: import java.io.*;
String str;
String theString;
BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream("test.xml"), "UTF8"));
StringBuffer sb = new StringBuffer();
while ((str = in.readLine()) != null) {
     sb.append(str);
     sb.append(System.getProperty("line.separator"));
}
in.close();
theString = sb.toString();
System.out.println(theString);
The System.out.println(theString); outputs this.
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="test.css"?>
<russian>
     <para>
          <text>testing</text>
          <ru>???? ???? ???? ????</ru>
     </para>
     <para><ru>? ? ?????? ?????????? ?? ???????? </ru></para>
     <ru/>
</russian>
Later on in my program I compare each character I read against an array of Russian characters.
char[] russianLett = new char[66];
russianLett[0] = 'А';
russianLett[1] = 'Ъ';
russianLett[2] = 'В';
ETC.... for all capital and lowercase Russian letters.
So it doesn't find ? in my array and thus returns ?.
Maybe you can see what is going wrong.
You can download the code from http://www.michaelsworld.us/misc/project.zip .
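
One way to narrow this down is to check the characters themselves instead of what the console prints. The sketch below is only an illustration under stated assumptions: the class name RussianToLatin and its helper methods are made up for this example, the table holds only the two pairs mentioned in the question, and an in-memory byte array stands in for test.xml. A possible explanation for the ???? output, which cannot be confirmed from the post alone, is that System.out encodes with the platform default charset and substitutes ? for anything that charset cannot represent, even when the String itself is correct.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class RussianToLatin {
    // Hypothetical table: only the two pairs named in the question are filled in.
    static final Map<Character, Character> TABLE = new HashMap<>();
    static {
        TABLE.put('\u0434', 'e'); // Cyrillic small letter de
        TABLE.put('\u0436', 'h'); // Cyrillic small letter zhe
    }

    // Decode the whole stream as UTF-8 text.
    static String readUtf8(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Replace every character that has an entry in TABLE and keep the rest.
    static String convert(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            Character latin = TABLE.get(c);
            out.append(latin != null ? latin.charValue() : c);
        }
        return out.toString();
    }

    // Write non-ASCII characters as numeric escapes, so the check
    // does not depend on what the console is able to display.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for new FileInputStream("test.xml"): the UTF-8 bytes of a small <ru> element.
        byte[] bytes = "<ru>\u0436\u0434</ru>".getBytes(StandardCharsets.UTF_8);
        String text = readUtf8(new ByteArrayInputStream(bytes));
        System.out.print(escape(text));  // shows which code points were actually read
        System.out.print(convert(text)); // plain ASCII after conversion
    }
}
```

If the escaped output shows values in the 0400 range, the characters were read correctly and only the console display is at fault. Note also that if the file literally contains references such as &#1080;, a plain reader returns them as the text "&#1080;"; only an XML parser resolves them into characters.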

Similar Messages

  • How to read  *.pdf files and store them in a database?

    Dear programmers,
    I have a problem reading *.pdf files and storing them in a database.
    Can anyone help me, please?
    Is it possible to read more than one file from the local system and store them in a database?
    Thanks in advance.
    Bye

    What "problem" are you encountering?
    Depending on your choice of database software, it may or may not support the storage of binary large objects (BLOBs).

  • Need script to read a list and compare it to the contents of a folder...

    I need a script that will read all of the file names within a tab delimited list and then compare them to the files within a directory (named IMAGES) and delete the files of this IMAGES folder that are not included in the list.
    Thanks in advance.

    It won't work because of what you're comparing.
    Consider if files_list does not contain (image_file as string) then.
    Your script defines files_list as the contents of the specified file like:
    "file1.jpg
    file2.jpg
    file3.png"
    and image_file is a list of files which, when coerced to a string, looks like:
    "Cabezon:Users:cochino:Desktop:images:file1.jpgCabezon:Users:cochino:Desktop:images:file2.jpgCabezon:Users:cochino:Desktop:images:file3.png".
    There is no way files_list is going to be in image_file.
    Instead you're going to have to iterate through one or other of the lists and find matches.
    Since the list of files you want to keep is around 1,000 items long, and there are 6,000 files in the directory it would be quicker to iterate through the list of files to keep (a loop of 1000 iterations) than it would be to loop through the files to see which ones to delete (a loop of 6000 iterations).
    Therefore your best bet is something like this (untested):
    tell application "Finder"
        -- define the source directory
        set sourceDir to folder "Cabezon:Users:cochino:Desktop:images:"
        -- create a temp dir for the files to keep
        set filesToKeepFolder to (make new folder at (path to temporary items) with properties {name:"TempImageDir"})
        -- get the list of files to keep
        set files_list to paragraphs of (read file "Cabezon:Users:cochino:Desktop:Workbook2.txt")
        -- iterate through them
        repeat with eachFile in files_list
            try
                -- move the file to the temp folder
                move file (eachFile as text) of sourceDir to filesToKeepFolder
            end try
        end repeat
        -- by the time you get here all the matched files
        -- have been moved to the temp dir
        -- so now you can throw away the rest of the files
        delete every file of sourceDir
        -- and move the files you want back in
        -- (same variable as above; the original used an undefined filesToKeepDir)
        move every file of filesToKeepFolder to sourceDir
        -- and clean up
        delete filesToKeepFolder
    end tell
    The comments should give you some idea as to what it's doing, but in short it walks through the list of files looking for matching files in the specified directory. If it finds a match it moves that file to a temporary directory. At the end of the 1,000 iterations, what's left in the directory are the files that did not match the file read in at the beginning, so these files can be deleted.
    Finally, to clean up, the files you want are copied back into the images directory and the temporary directory is deleted.
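
    For comparison, here is a rough Java sketch of the same clean-up. It deliberately swaps the technique: instead of moving the keepers to a temporary folder, it loads the keep list into a hash set and deletes whatever is not in it. The class name PruneFolder, the method names and the file names are invented for this illustration, and the demo works in a throwaway temp directory, not the IMAGES folder.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PruneFolder {
    // List the names of all regular files directly inside dir.
    static List<String> fileNames(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    names.add(entry.getFileName().toString());
                }
            }
        }
        return names;
    }

    // Delete every regular file in dir whose name is not in keepNames.
    static void prune(Path dir, Set<String> keepNames) throws IOException {
        for (String name : fileNames(dir)) {   // listing is complete before anything is deleted
            if (!keepNames.contains(name)) {
                Files.delete(dir.resolve(name));
            }
        }
    }

    // Self-contained demo in a temp directory; returns the sorted names that survive.
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("images");
            for (String name : new String[] {"file1.jpg", "file2.jpg", "stray.png"}) {
                Files.createFile(dir.resolve(name));
            }
            // In real use the keep list would be read from the tab-delimited file.
            prune(dir, new HashSet<>(Arrays.asList("file1.jpg", "file2.jpg")));
            List<String> left = fileNames(dir);
            Collections.sort(left);
            return left;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

    With a set lookup, the cost is one pass over the directory regardless of whether the list or the folder is larger, so the 1,000 versus 6,000 trade-off above does not arise.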

  • How to Read the String and break them in to TUPULES

    Hi all,
    I have one string like this:
    3122078,12/12/2005
    3122079,12/1/1988
    3122076,12/12/1999
    I want to break each line into STR := 3122078 and STR1 := 12/12/2005.

    SQL> select substr('3122078,12/12/2005',1,instr('3122078,12/12/2005',',')-1) from dual;
    SUBSTR(
    3122078
    SQL> select substr('3122078,12/12/2005',instr('3122078,12/12/2005',',')+1) from dual;
    SUBSTR('31
    12/12/2005
    SQL>

  • Xml reading specific node and putting them in hashtable

    Hi Friends
    I have to create a program to print XML filenames and the id (which is a tag) inside each XML file.
    The filenames I have to print are the names of the XML files themselves. Each file
    has an id tag inside it, and there can be more than one id. I have
    to add all the ids and filenames to a hashtable. I don't know how to do this. Any help would be great.
    The hashtable should look like this:
    id filename
    012125 hbn.xml
    012567 hbn.xml
    345669 xsf.xml
    So far I have written the code to create a DOM for the files and I can read the IDs inside the files, but I do not know how to add the IDs with their file names to the hashtable.
    Part of my code looks like this:
    NodeList idList = currDom.getElementsByTagName("ID");
    // || rather than &&: with && a null list throws and an empty list is never reported
    if (idList == null || idList.getLength() == 0) {
        System.out.print("LIST is empty ,return bad... ");
    } else {
        for (int i = 0; i < idList.getLength(); i++) {
            String strval = idList.item(i).getFirstChild().getNodeValue();
            int nodeval = Integer.parseInt(strval);
            System.out.println("grant id value is " + nodeval);
        }
    }
    Any sample code would be great.
    Regards Preeti

    Use the put method of the Hashtable class:
    hashTable.put(key, value);
    Here key and value must both be objects,
    so you can make the id the key (you have to wrap it in an Integer) and the file name the value.
    -seenu_ch
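
    The put call above could be used along these lines. This is a minimal sketch, not the poster's program: the class name IdIndex and the addIds helper are invented, and the ids are passed in as plain strings where the real code would take them from the DOM loop.

```java
import java.util.Hashtable;

class IdIndex {
    // Store each id of one file as an Integer key mapped to that file's name.
    static void addIds(Hashtable<Integer, String> table, String fileName, String... ids) {
        for (String id : ids) {
            // Integer.valueOf parses decimal text, so "012125" becomes 12125
            table.put(Integer.valueOf(id.trim()), fileName);
        }
    }

    public static void main(String[] args) {
        Hashtable<Integer, String> table = new Hashtable<>();
        addIds(table, "hbn.xml", "012125", "012567");
        addIds(table, "xsf.xml", "345669");
        for (Integer id : table.keySet()) {
            System.out.println(id + " " + table.get(id));
        }
    }
}
```

    One consequence of Integer keys is that leading zeros are lost (012125 is stored as 12125). If the ids must be printed exactly as they appear in the files, a String key would preserve them.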

  • Reading NTFS permissions and changing them with PowerShell

    Hi,
    I have a large folder structure which contains the shares for several sites. I've been asked to change the permissions for a group on each of these folders from 'full control' to 'read and execute' on the top level only. My problem is that the name of the group to change is different on each folder. They follow the same naming convention, however, which I've attempted to show in the example below.
    Folder1 has a group named FOL1-AdminUsers which has full control; there are several other administrative AD groups with permissions to the folder which must remain the same. Similarly there is a Folder2 which has a group named FOL2-AdminUsers which needs to be changed, and so on.
    The part of the script I'm having trouble with is reading the existing permissions from a specific folder and searching for the group I need to change. Everything else has been fairly straightforward but I've just become completely stuck on this. I'd really appreciate any help anybody could give me, or if you could point me in the right direction for further assistance.
    Many thanks,
    Gary.

    Hi Gary,
    You can read access permissions from a folder by using the Get-Acl cmdlet (Get-Acl "C:\ExampleFolder"). This will return a DirectorySecurity object. This comes with an Access CodeProperty that will return all permissions on the folder:
    $Acl = Get-Acl "C:\ExampleFolder"
    $Acl.Access
    It has many useful methods as well, so check out its members:
    $Acl | Get-Member
    Finally, there are useful tools for manipulating Acls, notably the official Set-Acl cmdlet or Rohn's AccessControl Module (Thanks Rohn, it's awesome) in the Gallery.
    If the module is a bit complex for you, there are some simple functions (shameless advertisement incoming) you could use instead: New-AccessRule and Add-AccessRule.
    Cheers,
    Fred
    There's no place like 127.0.0.1

    Thanks for the compliment!
    Gary, Rhys and Fred already mentioned that the info you're looking for is in the Access property when you use the built-in Get-Acl cmdlet. You could also use the Get-AccessControlEntry function from the module Fred mentioned:
    # List all ACEs for a single folder
    Get-AccessControlEntry C:\Folder
    # List all ACEs for specific principals (this example searches for two):
    Get-AccessControlEntry C:\Folder -Principal FOL*AdminUsers, AnotherUserNameHere
    # List ACEs for all subfolders (uses PSv3 syntax):
    dir C:\Folder -Directory -Recurse | Get-AccessControlEntry

  • Read numbers from a .txt file and display them in a graph

    How can I get LabVIEW 7 to read from a .txt file containing a lot of
    columns with different data? Only two of the columns are
    interesting to me: the first, which contains the time of the measurement, and
    one in the middle, which contains the measured temperatures. I want LabVIEW
    to read this data and display it graphically.
    Thanks from Stale

    Here's one way.
    You can also use the help-> find examples and search for "text".
    2006 Ultimate LabVIEW G-eek.
    Attachments:
    Graph.vi (21 KB)

  • Read Date and compare

    Hello All,
    I have a text file on a user's system which has a date in the format mm-dd-yyyy, for example 04-16-2004. I want to read this file and compare the month, day and year to another date, say 04-13-2004, and if the former date is later than the latter, perform an action. Since both of the above are strings, how can I do that? Is there anything to convert a String to a date?
    TIA

    Calculating Java dates: Take the time to learn how to create and use dates
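
    As a concrete illustration of converting the strings before comparing them, here is a small sketch using java.time (available from Java 8; the original thread predates it and would have used SimpleDateFormat). The class name DateCompare and the method isAfter are made up for this example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class DateCompare {
    // Pattern for dates written as mm-dd-yyyy, for example 04-16-2004.
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("MM-dd-uuuu");

    // True when the first date is strictly later than the second.
    static boolean isAfter(String first, String second) {
        LocalDate a = LocalDate.parse(first, FORMAT);
        LocalDate b = LocalDate.parse(second, FORMAT);
        return a.isAfter(b);
    }

    public static void main(String[] args) {
        if (isAfter("04-16-2004", "04-13-2004")) {
            System.out.println("file date is later, do the action");
        }
    }
}
```

    Comparing the raw strings would not work in general with this format, because the month comes first: "01-01-2005" sorts before "12-31-2004" as text even though it is the later date.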

  • Counter and comparator circuit with hc393 and hc85

    I searched for a full schematic for building a comparator circuit and a counter circuit that are connected to each other, using the HC393 and HC85 ICs, but couldn't find any. Can someone send me a schematic of this circuit?

    Hi aruwin,
    Please have a look at this schematic:
    HC393 ICs give us values from 0000 to 1111 (where QD is the most significant bit).
    HC85 ICs have two 4-bit inputs (where A3 and B3 are the most significant bits) and compare them, giving three digital outputs:
    1. A > B
    2. A = B
    3. A < B
    You might ask why there are 3 more inputs (A > B_I, A = B_I, A < B_I). Those are needed if you are working with 8-bit numbers.
    In addition please have a look at this article for more detailed explanation: 
    http://www.electronics-tutorials.ws/combination/comb_8.html
    Kind Regards,
    Max
    Applications Engineer
    National Instruments
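
    The role of the three extra inputs can be modelled in a few lines of software. The following is a simplified logical sketch, not a description of the actual chip's full truth table: each stage lets its own 4 bits decide, and only on a tie passes the cascade input through. The class name CascadedComparator, the result codes and the threshold value are assumptions made for this illustration.

```java
class CascadedComparator {
    // Result codes used in this sketch: +1 means A > B, 0 means A = B, -1 means A < B.

    // One 4-bit stage: the data inputs decide; on a tie the cascade input passes through.
    static int stage(int a, int b, int cascadeIn) {
        int a4 = a & 0xF;
        int b4 = b & 0xF;
        if (a4 > b4) return 1;
        if (a4 < b4) return -1;
        return cascadeIn;
    }

    // Two stages chained for 8-bit numbers: the low-nibble result feeds
    // the cascade input of the high-nibble stage, which gives the final answer.
    static int compare8(int a, int b) {
        int low = stage(a, b, 0); // cascade input of the first stage set to "A = B"
        return stage(a >> 4, b >> 4, low);
    }

    public static void main(String[] args) {
        int threshold = 9; // hypothetical value wired to the B inputs
        for (int count = 0; count < 16; count++) { // a 4-bit counter runs from 0000 to 1111
            System.out.println(count + " vs " + threshold + ": " + stage(count, threshold, 0));
        }
    }
}
```

    This shows why a single chip is enough for one 4-bit counter and why the cascade inputs only matter when two chips are chained for 8-bit values.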

  • Reading lines and then comparing them

    Hi,
    I want to read two lines of a single file and then compare them. Right now, I've implemented:
    String line = null;
    String currentLine = null;
    String prevLine = null;
    BufferedReader bf = new BufferedReader(new FileReader(fileA));
    while ((line = bf.readLine()) != null) {
        // both variables receive the same line here, so they never differ
        currentLine = line;
        prevLine = line;
    }
    I understand that this code returns the same line, but I was just hoping that maybe someone could point me in the right direction. Thanks in advance.

    800343 wrote:
    remember to use .equals() for equality comparisons.
    For comparing objects' states (contents), yes, absolutely. That's what it's for.
    The == tends to act funny, even in situations where it should obviously work.
    No it doesn't. It works exactly as it's supposed to. It compares the value on the left side to the value on the right side. Always. Whether those values are primitives or references, it behaves the same way.
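
    Putting the two points together, one possible shape for the loop is to compare first and shift afterwards, using equals() for the content check. This is only a sketch: the class name AdjacentLines and the choice to count repeated lines are assumptions, since the post does not say what should happen with the result of the comparison.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class AdjacentLines {
    // Count how many lines are identical to the line directly before them.
    static int countRepeats(BufferedReader reader) throws IOException {
        String prevLine = null;
        String currentLine;
        int repeats = 0;
        while ((currentLine = reader.readLine()) != null) {
            // equals() compares contents; it is false for the first line, where prevLine is null
            if (currentLine.equals(prevLine)) {
                repeats++;
            }
            prevLine = currentLine; // shift only after the comparison
        }
        return repeats;
    }

    // Convenience wrapper for in-memory text, used by the demo below.
    static int countRepeats(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return countRepeats(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countRepeats("a\na\nb\nb\nb\nc"));
    }
}
```

    For a real file, the BufferedReader would wrap a FileReader as in the question; the loop itself stays the same.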

  • "Example on reading combined signals from a serial port and separating them for display purposes". I am a beginner in Labview and would appreciate if anyone help with that

    I am working on a wireless vital sign monitor. I have 3 signals; heart rate and temperature. I filter and amplify the signals before converting them into digital form. I then pass them via MAX232 before passing them to RS232 serial cable.
    I am therefore working on a program to receive the combined signal and separate them.
    I have come across serial read and write examples on ni.com but am looking for one where I can actually separate combined signals and display them separately.

    Reading the serial port will give you a string. How you divide the channels depends on how the data was formatted before it was sent over the serial channel.
    If you are designing the instrument, as it seems from your query, then you can set up any format you wish. If your data is always floating point numeric, you could use space or tab characters to separate data words. You could use an XML format. If the instrument is provided by a vendor, contact them for the protocol.
    If tabs are used between words and returns between sets of readings, the resultant string can be interpreted by the Spreadsheet String to Array function in LV.
    Things to avoid are characters often used by serial communications systems as control characters. Carriage returns are often used as command terminators by serial protocols, but may also be used by the port.
    Lynn
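
    Outside LabVIEW, the tab-and-return layout described above can be split in the same way. The sketch below is written in Java purely as an illustration of the parsing step; the class name SerialFrames and the assumed layout (heart rate, then temperature, one set of readings per line) are hypothetical and not taken from the instrument.

```java
class SerialFrames {
    // Split the received text into rows (one set of readings per line)
    // and columns (tab-separated values), parsed as doubles.
    static double[][] parse(String received) {
        String body = received.trim();
        if (body.isEmpty()) {
            return new double[0][];
        }
        String[] rows = body.split("\\R"); // \R matches any line break, including CR LF
        double[][] table = new double[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            String[] words = rows[r].trim().split("\t");
            table[r] = new double[words.length];
            for (int c = 0; c < words.length; c++) {
                table[r][c] = Double.parseDouble(words[c].trim());
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // Assumed layout: heart rate <tab> temperature, one reading set per line.
        double[][] readings = parse("72\t36.6\r\n75\t36.7\r\n");
        for (double[] row : readings) {
            System.out.println("heart rate " + row[0] + ", temperature " + row[1]);
        }
    }
}
```

    Each column can then be sent to its own display, which is the separation the question asks about.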

  • Browsers no longer able to read russian / cyrillic characters

    Ever since I upgraded to Snow Leopard recently I have noticed that all my browsers (Safari, Firefox, Chrome etc.) are unable to display Russian / Cyrillic characters. While I can type Russian letters in the search bar or in Pages / Keynote, for example, I cannot read or write them in the browsers.
    As an example if i go to google.ru i can't read the writing ... I've uploaded a screenshot for you to see here: http://cl.ly/0g2u0e10363I2b1a0i1f/
    The text encoding is set how should be also.
    Please let me know if you can help ...

    As an example if i go to google.ru i can't read the writing ... I've uploaded a screenshot for you to see here: http://cl.ly/0g2u0e10363I2b1a0i1f/
    Very strange! There is of course one word which you can in fact read, namely веб at the upper left, the only one which is not a link of some sort. Could you provide another example where the site is showing text and not links?

  • Japanese and Russian characters are not displayed properly

    I have a PDF file which has both Russian and Japanese characters in it. Before I installed the Japanese font pack, the Russian characters were displayed properly but the Japanese characters were displayed as junk characters. Once I installed the Japanese font pack, the Japanese characters are displayed properly but the Russian characters are displayed as junk. Please let us know if both can be displayed simultaneously in a document, or if there is any workaround for this.

    What is your Reader version?
    If Reader X or earlier, did you also install the extended font pack?

  • MS Notepad unable to display the Chinese characters I type and display them as squares

    MS Notepad is unable to display the Chinese characters I type and display them as squares. But when I copy those squares on notepad to Wordpad or MS Word, they display the Chinese characters just fine. I've no idea why those Chinese characters I type can't display properly on notepad. I check the font of the notepad and it's the default. I've another Windows Vista desktop computer which has notepad of the similar setting and display Chinese characters just fine. Both are using Chinese (Simplified) - Microsoft Pinyin New Experience Input Style to input those characters. But I don't understand why my Windows 7 is facing this problem.

    Hi,
    Notepad is a very simple text editor BUT it will work if you use the SAME language in Windows. Please try:
    1. go to control panel, click "Clock, Language, and Region"
    2. click "Change location" under the "Region" section
    3. go to the "administrative" tab, then click "change system locale...", then select "Chinese".
    Regards.
    BH

  • Working with Chinese characters: How to paste and modify them as sentences; not as individual images?

    From 2004, this is so far the most detailed answer I could find: http://en.allexperts.com/q/Adobe-Illustrator-1027/FONTS-display-problem-Adobe.htm, but is this really still the most up-to date solution?
    Is it possible at all to work with Chinese characters, if I don't have the Chinese version of Illustrator? (http://www.proz.com/forum/dtp_desktop_publishing/221125-chinese_in_indesign.html)

    Thanks for your reply.
    I'm using Version 17.1.0 (64-bit) - basically CS6.
    Operating system is Windows 8.1. The Simplified Chinese language package is installed.
    I have received the text in a Word document. It contains mixed characters, something like this:
    教授资料
    学术背景:
      博士;曾于波恩大学(Rheinische Friedrich-Wilhelms-Universität Bonn)、台湾师范大学、
    I have tried to paste it in the two following ways:
    1) Create a text box and paste the text. The result is that some of the characters are not shown correctly.
    2) Paste it directly without any preparation. This results in a single "image" or group of images. I can modify each character one at a time, or I can dissolve the group, but the result is the same.

Maybe you are looking for

  • Ipod shuffle with permanently attached cable?

    I got this shuffle from a friend (im not sure what gen) and it has a cable attached to it that cannot be pulled out. the end of the cable has the connector to plug it into the bottom of an ipod or iphone... I don't see the point in this cable because

  • Connecting Ipod to PA System Amp

    I am a bit new to PA systems and amps, so I was wondering if the following setups would work a. Using a 1/4in to 1/4 in. MONO cable, connect one end into a channel on the amp and another end into the ipod using a 1/4 to 1/8 adaptor. b. Using a 1/8 to

  • Date Format in BW

    The date format accepted in the BW system is of the form yyyymmdd. But i have the flat file from the user which contain the below format : yymmdd. Kindly let me know how can i change the date format from yymmdd to yyyymmdd. Thanks in advance, Rachu

  • Do not see my iphone in itunes

    i wanted to sync my iphone again. it had connected once when i had recently purchased it but now itunes does not show the iphone. What should i do?

  • DreamWeaver Keeps Closing on opening

    Hi PLEASE can someone help. I have bought a new mac 2.3 notebook. Dreamweaver will not open at all. it just get to the opening page and then closes . the mac is asking do i want to cancle reopen ??? Please can someone help me. thanks for all your tim