Reading in Latin Extended-A character set from a text file

Hello all,
I am writing a small program that reads in a text file containing special characters (beyond the ASCII char set) and converting it into "regular" characters. For example I would read in a uaccent and replace it with a u.
Now I realize that Unicode support is built into Java from ground up but it goes only so far, you actually have to have the relevant character set to read it. My code is as follows:
InputStreamReader inStreamReader = new InputStreamReader(new FileInputStream("input.txt"), "ISO-8859-1");
BufferedReader bufferedReader = new BufferedReader(inStreamReader);
String line = null;
StringBuffer buff = new StringBuffer();
while((line = bufferedReader.readLine()) != null) {
char[] charArray = line.toCharArray();
for(int i = 0; i < charArray.length; i++) {
int x = (int)charArray;
switch(x) {
case 224: // this is agrave .. we need to replace it with a
buff.append('a');
break;
case 230: // this is aelig .. we need to replace it with ae
buff.append("ae");
break;
///////// and so on
Since I am reading in as ISO-8859-1, this works up to unicode 255. For the rest of the characters, apparently I need a Latin Extended-A and Latin Extended-B character set. How can I get that installed on my Windows OS machine? I am using jdk 1.4.1 on Windows XP. Any help is appreciated.
Thanks,
-vk4t

vkat wrote:
Since I am reading in as ISO-8859-1, this works up to unicode 255. For the rest of the characters, apparently I need a Latin Extended-A and Latin Extended-B character set. How can I get that installed on my Windows OS machine? I am using jdk 1.4.1 on Windows XP. Any help is appreciated.If your file has characters outside of 8859-1's range (0 - 255), then it isn't ISO-8859-1 encoded. You need to know what encoding was used to store the file. It sounds like you it actually may be Unicode text, in which case you need to know which encoding (UTF8, UTF16, etc) was used.

Similar Messages

  • Do you have any fonts with the Latin Extended Additional character set?

    I would like to know what fonts in the Adobe catalog support the Latin Extended Additional character set and/or display all of the following diacritics & characters:
    Macrons: ā  ī ū
    Dot below: ṭ ḍ ṇ ḷ ṃ
    Dot above: ṅ
    Tilde: ñ
    Thanks,
    MZ

    All those characters seem to be included in the Adobe Latin 4 character set.
    I believe that currently the only families that support the characters you're looking for are:
    Source Sans Pro
    Hypatia Sans Pro
    Trajan Pro 3
    Trajan Sans Pro
    Adobe Text Pro

  • How to input unicode character set from oralce form 9i

    Hi,
    Can anyone show me how to input unicode character set from form 9i. I have designed a form and run it but when I input unicode charater in TEXT ITEM on form (FONT_NAME of this TEXT ITEM is New Roman, AriaTime l ...), but it display incorrectly nor stored it in Database.
    Thank you !

    Thank Duncan R Mills !
    My setting NLS_CHARACTER in Database as follow :
    SQL> SELECT * FROM NLS_DATABASE_PARAMETERS;
    PARAMETER VALUE
    NLS_LANGUAGE AMERICAN
    NLS_TERRITORY AMERICA
    NLS_CURRENCY $
    NLS_ISO_CURRENCY AMERICA
    NLS_NUMERIC_CHARACTERS .,
    NLS_CHARACTERSET UTF8
    NLS_CALENDAR GREGORIAN
    NLS_DATE_FORMAT DD-MON-RR
    NLS_DATE_LANGUAGE AMERICAN
    NLS_SORT BINARY
    NLS_TIME_FORMAT HH.MI.SSXFF AM
    PARAMETER VALUE
    NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
    NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZH:TZM
    NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZH:TZM
    NLS_DUAL_CURRENCY $
    NLS_COMP BINARY
    NLS_NCHAR_CHARACTERSET UTF8
    NLS_RDBMS_VERSION 8.1.7.0.0
    18 rows selected.
    Even if I can'nt input unicode character on Oracle Forms, It display incorrectly though I set exactly font_name.

  • Changing the character set in RoboHelp 7 file converted from 5

    The character set of the output files for WinHelp defaults to
    utf-8. How can the default be set to windows-1252 per project or
    for all projects in RoboHelp 7?

    This is related to the posting - "RoboHelp 7 (with patch) IE6
    does not display popup field definitions". Somewhere in our
    environment the character set is getting changed to western
    european (ISO) for the pages that do not load initially in IE6/XP
    from the OAS even though the code generated by RoboHelp is utf-8.
    The WebHelp generated in RoboHelp v5 has a characterset of
    windows-1252. None of the pages with the character set of
    windows-1252 have a problem loading (or changing the character
    set). The problem may be on the OAS or the single signon
    layer.

  • Changing Character Set from AL16UTF16 to AL32UTF8

    Hi,
    I am stuck with a requirement to change the Character Set from AL16UTF16 to AL32UTF8.
    I am trying to install the Oracle Content Database & the installer expects the target database to have a Character Set of AL32UTF8. The current Character Set of the Database is AL16UTF16. I am unable to change the Character set as it simply complains that AL32UTF8 is not a super-set of AL16UTF16.The documents that I have consulted mention that such a transitionis not recommended.
    How do I convince the installer to continue the installation with AL16UTF16 ? Or, is it possile at all to change the Character Set from AL16UTF16 to AL32UTF8 ?
    Please do let me know your thoughts on this.
    Regards,
    Sandeep

    Is your data size above GBs? If not this might help before installation I guess ->
    Changing the Database Character Set of an Existing Database
    http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14225/ch11charsetmig.htm#sthref1476
    Kind regards,
    Tonguç

  • Oracle 11gr2 - change character set from utf-8 to weiso8859p15

    Hi guys,
    I've a new and empty oracle 11gr2 database for my test environment, during the installation i've choosed the UTF-8 character's set but now i've see that in the production environment there is the WEISO8859 character's set.
    I have a dump of the prod. env. (done with Exp function).
    How can I change the character set from utf-8 to weiso8859p15 ?
    I found this procedure:
    SHUTDOWN IMMEDIATE or a SHUTDOWN NORMAL;
    STARTUP MOUNT;
    ALTER SYSTEM ENABLE RESTRICTED SESSION;
    ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
    ALTER SYSTEM SET AQ_TM_PROCESSES=0;
    ALTER DATABASE OPEN;
    ALTER DATABASE NATIONAL CHARACTER SET new_character_set;
    SHUTDOWN IMMEDIATE; -- or SHUTDOWN NORMAL;
    STARTUP;
    is it correct ?
    Thx.
    Kind Regards,
    Stefano.

    evil.stefano wrote:
    Hi guys,
    I've a new and empty oracle 11gr2 database for my test environment, during the installation i've choosed the UTF-8 character's set but now i've see that in the production environment there is the WEISO8859 character's set.
    I have a dump of the prod. env. (done with Exp function).
    How can I change the character set from utf-8 to weiso8859p15 ?
    I found this procedure:
    SHUTDOWN IMMEDIATE or a SHUTDOWN NORMAL;
    STARTUP MOUNT;
    ALTER SYSTEM ENABLE RESTRICTED SESSION;
    ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
    ALTER SYSTEM SET AQ_TM_PROCESSES=0;
    ALTER DATABASE OPEN;
    ALTER DATABASE NATIONAL CHARACTER SET new_character_set;
    SHUTDOWN IMMEDIATE; -- or SHUTDOWN NORMAL;
    STARTUP;
    is it correct ?NO
    leave the character set alone.
    UTF-8 will not be a problem.

  • Changing the Database Character Set  From WE8MSWIN1252 to AR8MSWIN1256

    good morning everybody,
    I need your help to know all step which it is necessary to follow to change the Database Character Set From WE8MSWIN1252 to AR8MSWIN1256
    thank you

    If you have not already done so, read up on Character Set Migration and using the CSSCAN tool to verify the database before making any changes.

  • Read a character 1 at a time from a text file

    Hi all,
    I have tried the following to read 1 character at a time from a text file but it did not work;
    int buf_sz = 1;
             try
                BufferedReader br = new BufferedReader( new FileReader( "C:\\bitshift.txt", buf_sz ));
                String rd_line = null;
                while( (rd_line = br.readLine()) != null)
                    System.out.println(rd_line + "\n");
             }I could only get it to read the whole line by removing the buf_sz when creating an instance of the BufferedReader.
    How can I achieve this?
    Thanks

    The Javadoc for Reader.read()
    read
    public int read()
             throws IOException
        Read a single character. This method will block until a character is available, an I/O error occurs, or the end of the stream is reached.
        Subclasses that intend to support efficient single-character input should override this method.
        Returns:
            The character read, as an integer in the range 0 to 65535 (0x00-0xffff), or -1 if the end of the stream has been reached
        Throws:
            IOException - If an I/O error occurs

  • How to read from a text file one character at a time?

    Hello
    I wish to read from a text file in a for loop and that on every iteration i read one charachter and proceed to the next one.
    can anyone help me?
    Lavi 

    lava wrote:
    I wish to read from a text file in a for loop and that on every iteration i read one charachter and proceed to the next one.
    can anyone help me?
    Some additional comments:
    You really don't want to read any file one character at a time, because it is highly inefficient. More typically, you read the entire file into memory with one operation (or at least a large chunk, if the file is gigantic) and then do the rest of the operations in memory.  One easy way to analyze it one byte at a time would be to use "string to byte array" and then autoindex into a FOR loop, for example.
    Of course you could also read the file directly as a U8 array instead of a string.
    Message Edited by altenbach on 06-10-2008 08:57 AM
    LabVIEW Champion . Do more with less code and in less time .

  • Unable to migrate table, character set from WE8MSWIN1252 to AL32UTF8

    Hi,
    On our source db the character set is AL32UTF8
    On our own db, we used the default character set of WE8MSWIN1252 .
    When migrating one of the table, we get an error of this: ORA-29275: partial multibyte character
    So in to alter our character set from WE8MSWIN1252 to AL32UTF8, we get this error:
    ALTER DATABASE CHARACTER SET AL32UTF8
    ERROR at line 1:
    ORA-12712: new character set must be a superset of old character set
    I would sure not like to reinstall the db and migrate the tables again. Thanks.

    See this related thread - Re: Want to change characterset of DB
    You can use the ALTER DATABASE CHARACTER SET command in very few cases. You will most likely have to recreate the database and re-migrate the data.
    HTH
    Srini

  • Reading rule sets from an XML file

    Hi all,
    How can I read rule sets from an XML file? I have been given some rules in XML
    format and using those I have to query some content. I am using WLP4.0
    Also how can I code rules in java?
    Thanks in advance.

    You can have the following classes:
    Players class deriving from Vector (or containing a vector), and then
    Player class with attribute 'name'.
    class Players
               Vector myVector = new Vector();
                void addPlayer(Player p)
                      myVector.add(p);
                Player getPlayer(int index)
                      myVector.get(index);
    class Player
             private String myName = null;
             Player(String name)
                    this.myName = name;
             String getName()
                    return myName;
    }Then while handling the SAX events you can do the following:
    class MySAXHandler implements ContentHandler (or whatever the itnerface is)
                 public void startElement(String name,....)
                          Players p = null;
                          if(name.equals("Players"))
                                 p = new Players();
                         else if (name.equals("Name"))
                                p.add(new Player(value));
    }HTH,
    Kalyan.

  • Approach to converting database character set from Western European to Unicode

    Hi All,
    EBS:12.2.4 upgraded
    O/S: Red Hat Linux
    I am looking for the below information. If anyone could help provide would be great!
    INFORMATION NEEDED: Approach to converting database character set from Western European to Unicode for source systems with large data exceptions
    DETAIL: We are looking to convert Oracle EBS database character set from Western European to Unicode to support Kanji characters. Our scan results show
    both “lossy (110K approx.)” and “truncation (26K approx.)” exceptions in the database which needs to be fixed before the database is converted to Unicode.
    Oracle Support has suggested to fix all open and closed transactions in the source Production instance using forms and scripts.
    We’re looking for information/creative approaches  who have performed similar exercises without having to manipulate data in the source instance.
    Any help in this regard would be greatly appreciated!
    Thanks for yourn time!
    Regards,

    There are two aspects here:
    1. Why do you have such large number of lossy characters? Is this data coming from some very old eBS release, i.e. from before the times of the Java applet interface to Oracle Forms?  Have you analyzed the nature of this lossy data?
    2. There is no easy way around truncation issues as you cannot modify eBS metadata (make columns wider). You must shorten or remove the data manually through the documented eBS interfaces. eBS does not support direct manipulation of data in the database due to complex consistency rules enforced by the application itself (e.g. forms).
    Thanks,
    Sergiusz

  • Unable to alter character set from WE8MSWIN1252 to AL32UTF8

    Hi,
    I am trying to migrate some tables to my database.
    On the source DB, the character set is AL32UTF8
    On My database, the character set is WE8MSWIN1252.
    When I tried to alter the character set, I get this error message:
    ALTER DATABASE CHARACTER SET AL32UTF8
    ERROR at line 1:
    ORA-12712: new character set must be a superset of old character set
    Any help? Thanks.

    Pl refrain from posting duplicate threads :-)
    Unable to migrate table, character set from WE8MSWIN1252 to AL32UTF8
    Srini

  • How can I read a specific character from a text file?

    Hi All!
    I would like to read a specific character from a text file, e.g. the 2012th character in a text file with 7034 characters.
    How can I do this?
    Thanks
    Johannes

    just use the skip(long) method of the input stream that reads the text file and skip over the desired number of bytes

  • Reading a Random Line from a Text File

    Hello,
    I have a program that reads from a text file words. I currently have a text file around 800KB of words. The problem is, if I try to load this into an arraylist so I can use it in my application, it takes wayy long to load. I was wondering if there was a way to just read a random line from the text file.
    Here is my code, and the text file that the program reads from is called 'wordFile'
    import java.awt.*;
    import java.awt.geom.*;
    import javax.swing.*;
    import java.awt.event.*;
    import java.io.*;
    import java.util.*;
    public class WordColor extends JFrame{
         public WordColor(){
              super("WordColor");
              setSize(1000,500);
              setVisible(true);
              add(new WordPanel());
         public static void main(String[]r){
              JFrame f = new WordColor();
    class WordPanel extends JPanel implements KeyListener{
         private Graphics2D pane;
         private Image img;
         private char[]characterList;
         private CharacterPosition[]positions;
         private int charcounter = 0;
         private String initialWord;
         private File wordFile = new File("C:\\Documents and Settings\\My Documents\\Java\\projects\\WordColorWords.txt");
         private FontMetrics fm;
         private javax.swing.Timer timer;
         public final static int START = 20;
         public final static int delay = 10;
         public final static int BOTTOMLINE = 375;
         public final static int buffer = 15;
         public final static int distance = 4;
         public final static Color[] colors = new Color[]{Color.red,Color.blue,Color.green,Color.yellow,Color.cyan,
                                                                          Color.magenta,Color.orange,Color.pink};
         public static String[] words;
         public static int descent;
         public static int YAXIS = 75;
         public static int SIZE = 72;
         public WordPanel(){
              words = readWords();
              setLayout(new BorderLayout());
              initialWord = getWord();
              characterList = new char[initialWord.length()];
              for (int i=0; i<initialWord.length();i++){
                   characterList[i] = initialWord.charAt(i);
              setFocusable(true);
              addKeyListener(this);
              timer = new javax.swing.Timer(delay,new ActionListener(){
                   public void actionPerformed(ActionEvent evt){
                        YAXIS += 1;
                        drawWords();
                        if (YAXIS + descent - buffer >= BOTTOMLINE) lose();
                        if (allColorsOn()) win();
         public void paintComponent(Graphics g){
              super.paintComponent(g);
              if (img == null){
                   img = createImage(getWidth(),getHeight());
                   pane = (Graphics2D)img.getGraphics();
                   pane.setColor(Color.white);
                   pane.fillRect(0,0,getWidth(),getHeight());
                   pane.setFont(new Font("Arial",Font.BOLD,SIZE));
                   pane.setColor(Color.black);
                   drawThickLine(pane,getWidth(),5);
                   fm = g.getFontMetrics(new Font("Arial",Font.BOLD,SIZE));
                   descent = fm.getDescent();
                   distributePositions();
                   drawWords();
                   timer.start();
              g.drawImage(img,0,0,this);
         private void distributePositions(){
              int xaxis = START;
              positions = new CharacterPosition[characterList.length];
              int counter = 0;
              for (char c: characterList){
                   CharacterPosition cp = new CharacterPosition(c,xaxis, Color.black);
                   positions[counter] = cp;
                   counter++;
                   xaxis += fm.charWidth(c)+distance;
         private void drawThickLine(Graphics2D pane, int width, int thickness){
              pane.setColor(Color.black);
              for (int j = BOTTOMLINE;j<BOTTOMLINE+1+thickness;j++){
                   pane.drawLine(0,j,width,j);
         private void drawWords(){
              pane.setColor(Color.white);
              pane.fillRect(0,0,getWidth(),getHeight());
              drawThickLine(pane,getWidth(),5);
              for (CharacterPosition cp: positions){
                   int x = cp.getX();
                   char print = cp.getChar();
                   pane.setColor(cp.getColor());
                   pane.drawString(""+print,x,YAXIS);
              repaint();
         private boolean allColorsOn(){
              for (CharacterPosition cp: positions){
                   if (cp.getColor() == Color.black) return false;
              return true;
         private Color randomColor(){
              int rand = (int)(Math.random()*colors.length);
              return colors[rand];
         private void restart(){
              charcounter = 0;
              for (CharacterPosition cp: positions){
                   cp.setColor(Color.black);
         private void win(){
              timer.stop();
              newWord();
         private void newWord(){
              pane.setColor(Color.white);
              pane.fillRect(0,0,getWidth(),getHeight());
              repaint();
              drawThickLine(pane,getWidth(),5);
              YAXIS = 75;
              initialWord = getWord();
              characterList = new char[initialWord.length()];
              for (int i=0; i<initialWord.length();i++){
                   characterList[i] = initialWord.charAt(i);
              distributePositions();
              charcounter = 0;
              drawWords();
              timer.start();
         private void lose(){
              timer.stop();
              pane.setColor(Color.white);
              pane.fillRect(0,0,getWidth(),getHeight());
              pane.setColor(Color.red);
              pane.drawString("Sorry, You Lose!",50,150);
              repaint();
              removeKeyListener(this);
              final JPanel p1 = new JPanel();
              JButton again = new JButton("Play Again?");
              p1.add(again);
              add(p1,"South");
              p1.setBackground(Color.white);
              validate();
              again.addActionListener(new ActionListener(){
                   public void actionPerformed(ActionEvent evt){
                        remove(p1);
                        addKeyListener(WordPanel.this);
                        newWord();
         private String getWord(){
              int rand = (int)(Math.random()*words.length);
              return words[rand];
         private String[] readWords(){
              ArrayList<String> arr = new ArrayList<String>();
              try{
                   BufferedReader buff = new BufferedReader(new FileReader(wordFile));
                   try{
                        String line = null;
                        while (( line = buff.readLine()) != null){
                             line = line.toUpperCase();
                             arr.add(line);
                   finally{
                        buff.close();
              catch(Exception e){e.printStackTrace();}
              Object[] objects = arr.toArray();
              String[] words = new String[objects.length];
              int count = 0;
              for (Object o: objects){
                   words[count] = (String)o;
                   count++;
              return words;
         public void keyPressed(KeyEvent evt){
              char tempchar = evt.getKeyChar();
              String character = ""+tempchar;
              if (character.equalsIgnoreCase(""+positions[charcounter].getChar())){
                   positions[charcounter].setColor(randomColor());
                   charcounter++;
              else if (evt.isShiftDown()){
                   evt.consume();
              else{
                   restart();
              drawWords();
         public void keyTyped(KeyEvent evt){}
         public void keyReleased(KeyEvent evt){}
    class CharacterPosition{
         private int xaxis;
         private char character;
         private Color color;
         public CharacterPosition(char c, int x, Color col){
              xaxis = x;
              character = c;
              color = col;
         public int getX(){
              return xaxis;
         public char getChar(){
              return character;
         public Color getColor(){
              return color;
         public void setColor(Color c){
              color = c;
    }

    I thought that maybe serializing the ArrayList might be faster than creating the ArrayList by iterating over each line in the text file. But alas, I was wrong. Here's my code anyway:
    class WordList extends ArrayList<String>{
      long updated;
    WordList readWordList(File file) throws Exception{
      WordList list = new WordList();
      BufferedReader in = new BufferedReader(new FileReader(file));
      String line = null;
      while ((line = in.readLine()) != null){
        list.add(line);
      in.close();
      list.updated = file.lastModified();
      return list;
    WordList wordList;
    File datFile = new File("words.dat");
    File txtFile = new File("input.txt");
    if (datFile.exists()){
      ObjectInputStream input = new ObjectInputStream(new FileInputStream(datFile));
      wordList = (WordList)input.readObject();
      if (wordList.updated < txtFile.lastModified()){
        //if the text file has been updated, re-read it
        wordList = readWordList(txtFile);
        ObjectOutputStream output = new ObjectOutputStream(new FileOutputStream(datFile));
        output.writeObject(wordList);
        output.close();
    } else {
      //serialized list does not exist--create it
      wordList = readWordList(txtFile);
      ObjectOutputStream output = new ObjectOutputStream(new FileOutputStream(datFile));
      output.writeObject(wordList);
      output.close();
    }The text file contained one random sequence of letters per line. For example:
    hwnuu
    nhpgaucah
    zfbylzt
    hwnc
    gicgwkhStats:
    Text file size: 892K
    Serialized file size: 1.1MB
    Time to read from text file: 795ms
    Time to read from serialized file: 1216ms

Maybe you are looking for