UTF-8 reading and converting to proper characters.

Hi,
How do I read a UTF-8 encoded document (an XML document, to be precise) and display it as human-readable characters?
I have a URL stream, and when I read it, I am reading bytes.
Those bytes make up UTF-8 variable-length encoded characters.
I would like to create a string containing readable, printable, searchable, manipulatable (Unicode) characters.
How is this achieved?
At the moment, I am building a string where every byte of the UTF-8 stream becomes one Unicode character, which gives me a big old load of garbage.
cheers

My mistake:
When opening a URL object and reading the stream in Java, the character conversion happens automatically, so everything is fine.
My problem was that I was reading a URL whose content is GZIP-compressed.
So now I have to work out how to read a URL that is GZIPped,
i.e. the website content was written as XML, encoded to UTF-8, and then GZIPped before being placed on the server.
This is the URL:
http://feeds.wsjonline.com/wsj/podcast_wall_street_journal_weekend_edition
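For reading such a feed, a minimal sketch (the helper names and the in-memory check below are my own, and the Content-Encoding test assumes the server honours Accept-Encoding): wrap the raw stream in a GZIPInputStream to undo the compression, then in an InputStreamReader with an explicit UTF-8 charset to undo the encoding.

```java
import java.io.*;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class GzipUtf8Reader {

    // Drain a Reader into a String.
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Decode a gzipped UTF-8 byte stream into a String:
    // GZIPInputStream removes the compression, InputStreamReader
    // with an explicit charset removes the UTF-8 encoding.
    static String readGzippedUtf8(InputStream raw) throws IOException {
        return readAll(new InputStreamReader(
                new GZIPInputStream(raw), StandardCharsets.UTF_8));
    }

    // For the live feed: only gunzip when the server says it gzipped.
    static String readFeed(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");
        InputStream raw = conn.getInputStream();
        if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
            return readGzippedUtf8(raw);
        }
        return readAll(new InputStreamReader(raw, StandardCharsets.UTF_8));
    }
}
```

readGzippedUtf8 can be exercised without the network by compressing some bytes in memory with GZIPOutputStream first.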

Similar Messages

  • How to read and convert a acsm file to a pdf file?

    Using Adobe Digital Editions 2.0?

    A .acsm file is just a token for the purchase/loan of a book, which will come as a real DRM file, either .pdf or .epub (the publisher's choice, not yours).
    The .pdf or .epub will be limited to reading in ADE or other compatible Adobe DRM software or hardware readers,
    and these must be authorized with the same ID as the one used when the book was first accessed.
    To open the .acsm file in ADE and get ADE to convert it to .epub or .pdf, either:
    drag/drop it onto ADE from Windows Explorer, or
    make sure .acsm files are associated with ADE, then double-click it in Windows Explorer or open it from your browser.
    A bug in Windows 8 sometimes makes it very awkward to create that file association.  Post back if that is your issue.

  • After reinstalling CS6 the bridge photo downloader isn't able to read raw files and fails to convert the raw files to DNG. Previously downloaded raw files, now DNG, open up successfully in Camera Raw 7. How do I get the photo downloader to read and convert raw files?

    After reinstalling CS6, the Bridge photo downloader isn't able to read raw files and fails to convert the raw files to DNG. Previously downloaded raw files, now DNG, open up successfully in Camera Raw 7. How do I get the photo downloader to read and convert raw files? MacBook Pro with Snow Leopard. No such problem before this reinstallation.

    You should install Camera Raw 4.6.
    Visit this page and follow the instructions carefully:
    PC:    http://www.adobe.com/support/downloads/detail.jsp?ftpID=4040
    Mac:  http://www.adobe.com/support/downloads/detail.jsp?ftpID=4039
    -Noel

  • Reading files and converting into xml structure

    Hi,
    In my application a client requests the folder structure information from a server through RMI. The server needs to read the files and folders on the local machine, convert them into some structure (I am thinking of using XML), and send it back. For example, I am planning to have the server send back something like:
    <directory name = "parentdirectory">
    <file name = "abc.jpg"/>
    <file name = "def.bmp"/>
    <directory name = "subdirectory">
    <file name = "hij.jpg"/>
    <file name = "klm.bmp"/>
    </directory>
    </directory>
    It is just the names of the files I am interested in, not the contents. Is this a good approach, sending back the data as a string containing an XML definition? Is there any better approach in terms of performance, memory, etc.? I am currently planning on using DOM for the construction of this structure. Is there source code for reading and converting the folder structure into XML? Just for your information, the client gets this information and shows it as a tree structure in the GUI.
    Thanks!!!!

    Is this a good approach, sending back the data as a string containing an XML definition? It'll work.
    An alternative, more direct approach is to build a memory representation and send this as argument/return value of an RMI call. You'd need to write classes MyDirectory and MyFile; MyFile has just a name; MyDirectory has a name and a collection of MyDirectory and one of MyFile. Make these classes implement Serializable and you can send them over RMI.
    The effort to write those trivial classes would be less than to implement XML encoding/decoding, and also in terms of runtime performance and memory it will be hard to beat Java's serialization with anything XML-based. In this case I doubt performance/memory are relevant considerations though.
    If for some reason I'd go for sending XML Strings anyway, I wouldn't do the encoding/decoding myself; I'd use XStream to convert Java classes to/from XML and still end up writing the above two classes and be done.
    Sorry if you wanted a simple yes or no :-)
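    A minimal sketch of the suggested approach (the class and field names are my own illustration, not from the thread): two Serializable classes plus a recursive builder over java.io.File.

```java
import java.io.File;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// A file is just a name.
class MyFile implements Serializable {
    final String name;
    MyFile(String name) { this.name = name; }
}

// A directory has a name plus collections of child directories and files.
class MyDirectory implements Serializable {
    final String name;
    final List<MyDirectory> directories = new ArrayList<>();
    final List<MyFile> files = new ArrayList<>();

    MyDirectory(String name) { this.name = name; }

    // Recursively mirror an on-disk folder structure (names only, no contents).
    static MyDirectory scan(File dir) {
        MyDirectory result = new MyDirectory(dir.getName());
        File[] children = dir.listFiles();
        if (children != null) {
            for (File child : children) {
                if (child.isDirectory()) {
                    result.directories.add(scan(child));
                } else {
                    result.files.add(new MyFile(child.getName()));
                }
            }
        }
        return result;
    }
}
```

    Because both classes implement Serializable, an instance can be returned directly from an RMI call, with no XML encoding or decoding on either side.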

  • Reading and printing from a file

    I have a method that reads temperatures from a data file and prints them to the monitor. I use a while loop to read each line of the file, but for some reason it doesn't exit the loop when readLine returns null. My question is: how do I get the method to stop reading before it prints null...
    public static void printFile(String textFile) throws IOException {
        String inputString = "";
        BufferedReader inputDataFile =
            new BufferedReader(new FileReader(CENTIGRADE_DATA_FILE));
        while (inputString != null) {
            inputString = inputDataFile.readLine();
            System.out.println(inputString);
        }
        inputDataFile.close();
    }

    Ok, cool, thanks for pointing that out, guys, I appreciate it... Now I need to take those temps and convert them from Strings to doubles so they can be converted. I tried this, but I'm getting a NumberFormatException...
    public static void buildReport() throws IOException {
        double centigradeTemp = 0.00;
        String inputTemp = "";
        //Open centigrade temps to be read and converted
        BufferedReader inputDataFile =
            new BufferedReader(new FileReader(CENTIGRADE_DATA_FILE));
        //Open file writer to write report
        PrintWriter outFile = new PrintWriter(new FileWriter("TempReport.txt"));
        //While loop to read a temp, convert it, and write it in the report
        while ((inputTemp = inputDataFile.readLine()) != null) {
            centigradeTemp = Integer.parseInt(inputTemp);
            outFile.println(fahrenheit(centigradeTemp));
        }
        outFile.close();
    }

    public static double fahrenheit(double centigrade) {
        double fahrenTemp = 0.00;
        fahrenTemp = (9/5)*centigrade + 32;
        return fahrenTemp;
    }
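    For what it's worth, a sketch of the two likely culprits (my own diagnosis, not stated in the thread): Integer.parseInt fails on any line that is not a plain integer (blank lines, decimals, stray whitespace), so Double.parseDouble on a trimmed line is safer; and (9/5) is integer division, which evaluates to 1, so the ratio should be written in floating point.

```java
public class TempConversion {

    // Parse a temperature line defensively: trim whitespace and
    // accept decimal values, not just integers.
    static double parseTemp(String line) {
        return Double.parseDouble(line.trim());
    }

    // (9/5) in Java is integer division and equals 1;
    // 9.0/5.0 keeps the arithmetic in floating point.
    static double fahrenheit(double centigrade) {
        return (9.0 / 5.0) * centigrade + 32;
    }
}
```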

  • ITunes no longer recognizes midi files how can I play and convert them?

    iTunes 10.3 recognised midi files, I could "move" them to the iTunes library and select them. then click on "Advanced" and "Convert to AAC" (or to MP3 etc).
    iTunes 10.5 will not recognise midi files.
    I think an older version of GarageBand could also read and convert midi files, but this no longer appears to be the case either in the latest version.
    How can I now convert midi files? Has Apple decided to simply ignore them?

    Thanks to Limnos, I now have some further things to try, but in the meantime I found I had been maligning GarageBand 11. Indeed its built-in Help system doesn't seem to mention Midi files at all, but I found out how to insert them from Apple's on-line support system at http://support.apple.com/kb/PH2009 and it works superbly well - one can even change the instruments on individual channels etc and then save the lot as an AAC (m4a) or mp3 etc file.
    I also loaded up an old version of iTunes, version 10.3, which still recognises Midi files and can convert them to AAC or mp3 but in so doing some volume seems to be lost.
    But the winner so far for me has been Audacity, free software. One needs to add a LAME encoder for mp3 output, I installed Soundflower and routed the playback of the midi file available in the Finder (in fact for the whole machine, while playing the midi file) and declared Soundflower as the input for Audacity. I could then use Audacity's effects menu to increase the volume throughout the track and save it. I then, using the Sound preferences of the machine, switched back to the inbuilt speakers and was able to listen to a higher volume version, playable in iTunes 10.5. More faithful to the original midi file than the GarageBand version.
    See instructions at http://ask.brothersoft.com/tags/convert-midi-to-mp3/ and in particular one of the links, http://ask.brothersoft.com/how-to-convert-a-midi-to-a-mp3-using-audacity-26125.html
    This is for Windows, but for Mac it's very similar - still Audacity. BUT the instructions in the last link don't seem to quite work, which is why I had to use Soundflower. I can import the midi file, I can select it, but then the export item on the file menu remains greyed out and unusable, so I must presumably be missing some plug-in. Also, Audacity keeps telling me it can't find various ffmpeg files even though I thought I had installed them correctly. I'm obviously missing something here, but at least I can now get my midi files into AAC or mp3 format again at last.
    For Soundflower see http://cycling74.com/soundflower-landing-page/ and http://kineme.net/forum/General/soundflowerforlion
    and there is a useful tutorial by Nowjobless on YouTube at
    http://www.youtube.com/watch?v=r3FGOIW08gA&feature=related
    Good luck to anyone else facing the same problems, it remains to be seen how to "convert" midi files to mp3 in Audacity without recourse to Soundflower.

  • Object reading and writing

    I have to read a very large binary file that consists of a header (5000 bytes), a main section (10,000 bytes), and a footer (4000 bytes). The current code is in C++, and the logic is implemented as follows: there are three classes, and the data structure defined for these three classes is the same as the information stored in the binary file. The header data structure is 5000 bytes, the footer is 4000 bytes, and then the main section. So whenever the large file is read, the process invokes a read method and passes the object of the header and a pointer for reading 5000 bytes. As the structure is the same, all the values read are assigned to the header in a single shot. Here is an example:
    class header {
        char name[2000];
        char address[2000];
    };
    In the read method I pass the object of header, the binary file name, and a pointer to read the first 4000 bytes,
    so it automatically assigns the first 2000 bytes to name and the rest to address.
    I have to implement the same functionality in Java, and I am not able to do it.
    Is it possible, or do I have to read a byte array and then call set methods for each field? Or can I simulate this C++ functionality in Java?
    Please help.

    there are three classes and the data structure defined for these classes is the same as the information stored in binary.
    So, if I understand what you want to do, you need to read C++ binary data into a Java class, but don't want to go through the process of translating that data and calling individual getters and setters?
    First thing: be aware that the byte order of C++ data will differ, depending on the machine that wrote the file. If you're reading files from different machines, you may not get back what you expect (which is one reason that XML exists, to store everything in text rather than binary).
    Second: do you have a Java class that exactly mimics the C++ class? You need to be aware of field size and type (eg, a 16-bit C++ int corresponds to a Java short), and recognize that the same types may have different sizes depending on the C++ implementation.
    If you have considered these two points, then look at java.lang.reflect.
    1) Define a static array containing the names of the fields in your object, in the order they appear in the data. This is necessary because Class.getFields() does not guarantee order of the fields returned.
    2) Call getClass() on the passed object.
    3) For every field in your list from step 1, call getField() on the Class object that you got from step 2.
    4) Read the appropriate amount of data from the file (see below), and use the appropriate set method on the Field object that you got in step 3.
    How do you know how much data to read from the file? There are a couple of ways to do this. You can use the getType() method on the Field object (from step 3), then use this to find an appropriate "read and convert" routine. Or, if you want to drive from the C++ code, the table that you define in step 1 could identify the C++ type.
    This is, of course, only a skeleton description of what you need to do. But as outlined, it should take only a day or two to implement. The real problem is if your C++ structures change frequently, or you'll be processing data from multiple sources (so byte order and field size may change). In that case, you may (1) find it best to implement a parser to build your field table, and (2) drive everything off the field table, which would then have to contain additional information about the appropriate conversion routine.
    Also, you could check SourceForge or other open-source sites, to see if someone has already done this.
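    The four steps above can be sketched as follows (a minimal illustration: the Header class, its field names, and the fixed 4-byte field width are my own toy assumptions, not the poster's real 2000-byte layout):

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.lang.reflect.Field;

public class BinaryRecordReader {

    // Step 1: a static array fixing field order, since
    // Class.getFields() does not guarantee any ordering.
    static final String[] HEADER_FIELDS = { "name", "address" };

    // Fixed on-disk width of each field in this toy layout.
    static final int FIELD_WIDTH = 4;

    public static class Header {
        public String name;    // 4 bytes in this toy layout
        public String address; // 4 bytes in this toy layout
    }

    // Steps 2-4: reflect over the target object and populate each
    // field from the stream in the declared order.
    public static void read(Object target, DataInputStream in)
            throws IOException, ReflectiveOperationException {
        Class<?> cls = target.getClass();       // step 2: getClass()
        for (String fieldName : HEADER_FIELDS) {
            Field f = cls.getField(fieldName);  // step 3: getField()
            byte[] buf = new byte[FIELD_WIDTH]; // step 4: read the field's bytes
            in.readFully(buf);
            f.set(target, new String(buf, "US-ASCII").trim());
        }
    }
}
```

    In a real version, step 4 would switch on Field.getType() (or on a C++ type recorded in the step-1 table) to pick the right width and conversion for each field.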

  • Convert smart quotes and other high ascii characters to HTML

    I'd like to set up Dreamweaver CS4 Mac to automatically convert smart quotes and other high-ASCII characters (em dashes, accent marks, etc.) pasted from MS Word into HTML code. Dreamweaver 8 used to do this by default, but I can't find a way to set up a similar auto-conversion in CS4. Is this possible? If not, it really should be a preference option. I code a lot of HTML emails and it is very time-consuming to convert every curly quote and dash.
    Thanks,
    Robert
    Digital Arts

    I too am having a related problem with Dreamweaver CS5 (running under Windows XP), having just upgraded from CS4 (which works fine for me) this week.
    In my case, I like to convert to typographic quotes etc. in my text editor, where I can use macros I've written to speed the conversion process. So my preferred method is to key in typographic letters & symbols by hand (using ALT + ASCII key codes typed in on the numeric keypad) in my text editor, and then I copy and paste my *plain* ASCII text (no formatting other than line feeds & carriage returns) into DW's DESIGN view. DW displays my high-ASCII characters just fine in DESIGN view, and writes the proper HTML code for the character into the source code (which is where I mostly work in DW).
    I've been doing it this way for years (first with GoLive, and then with DW CS4) and never encountered any problems until this week, when I upgraded to DW CS5.
    But the problem I'm having may be somewhat different than what others have complained of here.
    In my case, some high-ASCII (above 128) characters convert to HTML just fine, while others do not.
    E.g., en and em dashes in my cut-and-paste text show as such in DESIGN mode, and the right entries
        &ndash;
        &mdash;
    turn up in the source code. Same is true for the ampersand
        &amp;
    and the copyright symbol
        &copy;
    and for such foreign letters as the e with acute accent (ALT+0233)
        &eacute;
    What does NOT display or code correctly are the typographic quotes. E.g., when I paste in (or special paste; it doesn't seem to make any difference which I use for this) text with typographic double quotes (ALT+0147 for open quote mark and ALT+0148 for close quote mark), which should appear in source code as
        &ldquo;[...]&rdquo;
    DW strips out the ASCII encoding, displaying the inch marks in DESIGN mode, and putting this
        &quot;[...]&quot;
    in my source code.
    The typographic apostrophe (ALT+0146) is treated differently still. The text I copy & paste into DW should appear as
        [...]&rsquo;[...]
    in the source code, but instead I get the foot mark (both in DESIGN and CODE views):
    I've tried adjusting the various DW settings for "encoding"
        MODIFY > PAGE PROPERTIES > TITLE/ENCODING > Encoding:
    and for fonts
        EDIT > PREFERENCES > FONTS
    but switching from "Unicode (UTF-8)" to "Western European" hasn't solved the problem (probably because in my case many of the higher ASCII characters convert just fine). So I don't think it's the encoding scheme I use that's the problem.
    Whatever the problem is, it's caused me enough headaches and time lost troubleshooting that I'm planning to revert to CS4 as soon as I post this.
    Deborah

  • After scanning my document and converting to Microsoft Word, the sizes of the characters are different

    After scanning my document and converting to Microsoft Word, the sizes of the characters are different and things like punctuation are distorted. How do I get uniformity like the original?

    Of course what lands in the Word file will differ from the viewed picture/image of text created by the scanner.
    (The output of all scanners is always an image file. For an image of textual content the best output file format is TIFF.)
    So you scan the hardcopy of text.
    The scanner output image (picture) is brought into PDF.
    At this point the only PDF page content that you can export to Word is the image (nope, no "text" just the image).
    Consequently you use Acrobat's OCR feature to do OCR of the image of text.
    With a decent paper source, proper resolution and a black and white image you'll get acceptable accuracy of recognition of the pictures of the characters.
    (the Optical Character Recognition)
    Acrobat's Searchable Image and Searchable Image (Exact) provide output that uses text rendering mode 3 (no fill, no stroke for the glyphs).
    So, invisible / hidden text.
    The third OCR method is ClearScan.
    You could play with each of the three to see what goes into a Word file.
    Might try export to RTF, DOC and DOCX as well.
    Anyway -- What is exported is the OCR output; Not the image of text.
    And, of course, the image of the text is not the imprint on the paper that was scanned.
    Each step of the way you have some deviation.
    Once you have the exported PDF content in a Word file you can use Word to cleanup as desired / needed.
    OR
    Prop up the hardcopy and transcribe to a Word file.
    Be well...

  • Reading and writing Special Characters to Oracle DB

    Hi All,
    I need to insert data from CSV to Oracle DB and then use the same data for creating XML file in UTF-8 format.
    I have a few fields in the CSV file which contain the � and � special characters. I'm able to read � and write it in UTF-8, but the same procedure results in some other ASCII character for �.
    While reading data from CSV file :
    Reader l_fileReader = new InputStreamReader(p_in,"ISO-8859-1");
    Can anyone help me.
    Thanks,
    Ramki.

    Does anyone have some pointers or clues?
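    Since the thread went unanswered, here is a minimal transcoding sketch under the assumption that the CSV really is ISO-8859-1 throughout (the class and method names are my own): read with an ISO-8859-1 reader, write with a UTF-8 writer, and the special characters are converted in between.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class CsvTranscoder {

    // Read ISO-8859-1 bytes and re-encode them as UTF-8.
    // The characters in the middle are plain Unicode, which is what
    // should go into the DB and then into the UTF-8 XML file.
    static byte[] latin1ToUtf8(byte[] latin1Bytes) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(latin1Bytes), StandardCharsets.ISO_8859_1));
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        Writer out = new OutputStreamWriter(bos, StandardCharsets.UTF_8);
        int c;
        while ((c = in.read()) != -1) {
            out.write(c);
        }
        out.flush();
        return bos.toByteArray();
    }
}
```

    If one of the special characters still comes out wrong, the usual cause is that the CSV is not actually ISO-8859-1 for that byte, so the reader charset, not the writer, is the thing to change.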

  • How to convert muti-byte characters from US7ASCII database to UTF-8

    Hi Guys,
    We have a source database with a character set of US7ASCII and our target database has a character set of UTF-8. We have the "©" symbol in the source database, and when we insert this value into our target database it is converted to "¿".
    How can I make sure that the "©" symbol is inserted correctly in the target database? Both databases are on version 10.2 but have different character sets. The Oracle documentation mentions that this can happen if the target database character set is not a superset of the source database character set, but in our case UTF-8 is a superset of US7ASCII.
    Thanks,
    Ramu Kalvakuntla
    Edited by: user11905624 on Sep 15, 2009 2:58 PM

    user11905624 wrote:
    When I tried DUMP('COLUMN', 1016), this is what I got:
    Typ=96 Len=1 CharacterSet=US7ASCII: a9
    Considering the 7-bit ASCII standard character set, the code 0xA9 is invalid.
    This has likely happened due to a pass-through scenario. See [NLS_LANG FAQ|http://www.oracle.com/technology/tech/globalization/htdocs/nls_lang%20faq.htm] (example of wrong setup). E.g. Windows 125x code pages all defines a character 'copyright sign' with encoding A9.
    If proper character set conversion takes place, I would expect the (illegal) codes 0x80-FF to be caught and converted to the replacement character (like U+FFFD).
    Going back to the issue, how exactly are you transferring data, or retrieving and inserting it, from the source to the target database?
    Edited by: orafad on Sep 17, 2009 10:56 PM
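    The point about 0xA9 being invalid 7-bit ASCII is easy to demonstrate in Java (an illustration of the principle only, not of Oracle's conversion path): a conforming ASCII decoder replaces any byte above 0x7F with the replacement character U+FFFD, whereas a windows-125x/Latin-1 decoder maps 0xA9 to the copyright sign.

```java
import java.nio.charset.StandardCharsets;

public class AsciiDemo {
    // 0xA9 is the copyright sign in the windows-125x code pages and
    // ISO-8859-1, but it is out of range for 7-bit US-ASCII, so a
    // conforming decoder substitutes the replacement character U+FFFD.
    static String decodeAscii(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```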

  • Read and write issue with a .CSV file containing Cyrillic characters

    Hi guys,
    I am a developer of a web application project which uses Oracle Fusion Middleware technologies. We use JDeveloper 11.1.1.4.0 as development IDE.
    I have a requirement to get a .csv file from WLS to the machine where the application runs. I used a downloadActionListener in the front-end .jspx in order to do that.
    I use OpenCSV library to read and write .csv files.
    Here is my code for read and write the .csv file,
    public void dwdFile(FacesContext facesContext, OutputStream out) {
        System.out.println("started");
        String[] nextLine;
        try {
            FileInputStream fstream1 = new FileInputStream("Downloads/filetoberead.CSV");
            DataInputStream in = new DataInputStream(fstream1);
            BufferedReader br = new BufferedReader(new InputStreamReader(in, "UTF-8"));
            CSVReader reader = new CSVReader(br, '\n');
            //CSVReader reader = new CSVReader(new FileReader("Downloads/ACTIVITY_LOG_22-JAN-13.csv"),'\n');
            List<String> list = new ArrayList();
            while ((nextLine = reader.readNext()) != null) {
                if (nextLine != null) {
                    for (String s : nextLine) {
                        list.add(s);
                    }
                }
            }
            System.out.println("list size ; " + list.size());
            OutputStreamWriter w = new OutputStreamWriter(out, "UTF-8");
            CSVWriter writer = new CSVWriter(w, ',', '\u0000');
            for (int i = 0; i < list.size(); i++) {
                System.out.println("list items" + list.get(i));
                String[] entries = list.get(i).split(",");
                writer.writeNext(entries);
                //System.out.println("list items : "+list.get(i));
            }
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    Say filetoberead.CSV contains the following data:
    0,22012013,E,E,ASG,,O-0000,O,0000,100
    1,111211,LI,0,TABO,B,M002500003593,,,К /БЭ60072715/,КАРТЕНБАЙ
    2,07,Balance Free,3
    1,383708,LI,0,BDSC,B,НЭ63041374,,,Т /НЭ63041374/,ОТГОНБААТАР
    2,07,Balance Free,161
    It reads and writes the numbers and English characters correctly, but for all Cyrillic characters it prints "?", as follows:
    0,22012013,E,E,ASG,,O-0000,O,0000,100
    1,111211,LI,0,TABO,B,M002500003593,,,? /??60072715/,?????????
    2,07,Balance Free,3
    1,383708,LI,0,BDSC,B,??63041374,,,? /??63041374/,???????????
    2,07,Balance Free,161
    Can someone please help me resolve this problem?
    Regards !
    Sameera

    Are you sure that the input file (e.g. "Downloads/filetoberead.CSV") is in UTF-8 character set? You can also check it using some text editor having a view in hex mode. If each Cyrillic character in your input file occupies a single byte (instead of two), then the file is not in UTF-8. Most probably it is in Cyrillic for Windows (CP1251).
    If this is the case, you should modify the line
    BufferedReader br = new BufferedReader(new InputStreamReader(in,"UTF-8"));
    to
    BufferedReader br = new BufferedReader(new InputStreamReader(in,"windows-1251"));
    Dimitar
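    Dimitar's point can be checked with a one-byte example (my own illustration, assuming the JRE ships the windows-1251 charset, as standard desktop JREs do): in windows-1251 the single byte 0xCA is the Cyrillic letter К, while the same lone byte is malformed UTF-8 and decodes to the replacement character.

```java
import java.nio.charset.Charset;

public class CyrillicDecodeDemo {
    // Decode the same bytes under two different charset assumptions;
    // single-byte Cyrillic only survives the windows-1251 reading.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }
}
```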

  • Reading in any file and converting to a byte array

    Okay, what I am trying to do is write a program that will read in any file and convert it into an int array so that I can manipulate the values of the int array and then re-write the file. Once I get the file into an int array, I want to try to compress the data with my own algorithm, as well as write my own encryption algorithm.
    What I have been looking for is code samples that read in the file as a byte array; I have then been trying to convert that byte array into an int array which I can manipulate. So does anyone have any sample code that takes a file, converts it into an int array, then converts it back into a byte array and writes the file? I have found code that is close, but I guess I am just too new to this. Any help would be appreciated.

    You can read a whole file into a byte array like this:

    File f = new File("somefile");
    int size = (int) f.length();
    byte[] contents = new byte[size];
    DataInputStream in = new DataInputStream(
                              new BufferedInputStream(new FileInputStream(f)));
    in.readFully(contents);
    in.close();

    Note that you need to add the proper exception handling code. You could also use RandomAccessFile instead of the DataInputStream.
    Writing a byte array to a file is easier; just construct the FileOutputStream, call write on it with the byte array, and close the stream.
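    To go from that byte array to the int array the question asks about (a sketch; widening each byte to an unsigned 0-255 int is one reasonable choice, not the only one), mask each byte on the way up and cast back down on the way out:

```java
public class ByteIntConversion {

    // Widen each byte to an int in the range 0-255 so the values
    // can be manipulated without sign-extension surprises.
    static int[] toIntArray(byte[] bytes) {
        int[] ints = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            ints[i] = bytes[i] & 0xFF;
        }
        return ints;
    }

    // Narrow back to bytes before writing the file out again.
    static byte[] toByteArray(int[] ints) {
        byte[] bytes = new byte[ints.length];
        for (int i = 0; i < ints.length; i++) {
            bytes[i] = (byte) ints[i];
        }
        return bytes;
    }
}
```

    The round trip is lossless as long as the manipulated values stay in 0-255.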

  • Need help to read and write using UTF-16LE

    Hello,
    I am in need of your help.
    In my application I am using UTF-16LE to export and import data when running immediately.
    Sometimes I need to do the import on a schedule, i.e. the export and import happen at a specified time.
    But in my application, when doing a scheduled import, the URL class is used to build the URL for the file and copy the data to a temp file for the event to be processed later.
    The file being imported is in UTF-16LE format, and I need to write the code for that encoding.
    The problem is that for a scheduled import I need to copy the data of the file into a temp location before doing the import.
    When copying the data from the file to the temp file, I can't apply the UTF-16LE encoding through the URL. And if I get the path from the URL and create the reader and writer directly, it throws a FileNotFoundException.
    Here is the excisting code,
    protected void copyFile(String rootURL, String fileName) {
        URL url = null;
        try {
            url = new URL(rootURL);
        } catch (java.net.MalformedURLException ex) {
        }
        if (url != null) {
            BufferedWriter out = null;
            BufferedReader in = null;
            try {
                out = new BufferedWriter(new FileWriter(fileName));
                in = new BufferedReader(new InputStreamReader(url.openStream()));
                String line;
                do {
                    line = in.readLine();
                    if (line != null) {
                        out.write(line, 0, line.length());
                        out.newLine();
                    }
                } while (line != null);
                in.close();
                out.close();
            } catch (Exception ex) {
            }
        }
    }
    Here, String rootURL is the real file name from which I have to get the data, and it is in UTF-16LE format. String fileName is the temp file name, and it is a logical one.
    I think I have described the problem.
    Please, can anyone help me?
    Thanks in advance.

    Hello,
    thanks for your reply...
    I did as you said, using a stream writer, but the problem is that I need a temp file name to create the writer to write into.
    It is a logical name, not a real file, so if I create the stream writer on it, it throws a FileNotFoundException.
    The other problem is that the existing code is built using URL, and I can't change all the lines; that is very difficult because there is a vast amount of data.
    Is there any other way to solve this issue?
    Once again, thanks.
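    For reference, a charset-aware copy looks like the sketch below (a minimal illustration; the class and method names are my own). The key point is that both the reader and the writer must be given "UTF-16LE" explicitly, since FileWriter in the code above always uses the platform default encoding and will mangle UTF-16LE data.

```java
import java.io.*;

public class Utf16Copy {

    // Copy a UTF-16LE character stream to a UTF-16LE destination,
    // decoding and re-encoding explicitly so no bytes are mangled.
    static void copyUtf16le(InputStream src, OutputStream dst) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(src, "UTF-16LE"));
        BufferedWriter out = new BufferedWriter(new OutputStreamWriter(dst, "UTF-16LE"));
        String line;
        while ((line = in.readLine()) != null) {
            out.write(line);
            out.newLine();
        }
        out.flush();
    }
}
```

    The same shape works with url.openStream() as the source and a FileOutputStream over the temp file as the destination.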

  • Have Windows XP and Adobe 9 Reader and need to send a series of large documents to clients as a matter of urgency. When I convert a 10-page MS Word file to PDF this results in a file of 6.7 MB which can't be emailed. Do I combine them and then copy to JPEG 2000?

    I have Windows XP and Adobe 9 Reader and need to send a series of large documents to clients as a matter of urgency. When I convert a 10-page MS Word file to PDF, this results in a file of 6.7 MB, which can't be emailed. Do I combine them and then copy to JPEG 2000, or do I have to save each page separately, which is very time-consuming? Please advise me how to reduce the size and send 10-plus pages quickly with Adobe, without the huge hassles I am enduring.

    What kind of software do you use for the conversion to pdf? Adobe Reader can't create pdf files.
