File encoding question

I'm just trying to open a file and read it to the screen with BufferedReader, and no encoding is handled anywhere...
I saved an HTML file as .txt to use as a test file, but the encoding was different from ??? (the default encoding, I guess) and reading stopped at some point in the text file.
I opened it in Notepad2, which says the encoding is ANSI, and from reading a bit on the forums I see UTF-16 mentioned fairly often as well... So my question is simply: what encoding does a normal, standard text file use?

Hi,
rugae wrote:
So if I change my regional settings to Chinese or something, create a document with Chinese characters, and attempt to read the file, say with BufferedReader or just Scanner, line by line with nothing at all to handle different encodings, to the console or another file, is that going to give me correct output? (yes/no would be sufficient to clear it up for me...)

If you create the file with an editor, you save it with the encoding that is set in your editor. If you save it with Java, use OutputStreamWriter with a defined charset (see http://www.exampledepot.com/egs/java.io/WriteToUTF8.html).
Maybe ;-) In the end you have to know the encoding of the file, or you have to guess it. If it is an HTML document, there is an HTML header that declares the encoding. If it is transported via HTTP, there is also an HTTP Content-Type header with an encoding declaration. Firefox shows this to the user under [Tools]-[Page Info]. http://www.beijing.gov.cn/ for example is encoded in GB2312; some Taiwanese pages may be encoded in Big5. Encodings, a.k.a. charsets, are defined by IANA: http://www.iana.org/assignments/character-sets.
If you know the encoding (charset) of the file, read it with InputStreamReader and the right charset, as shown in http://www.exampledepot.com/egs/java.io/ReadFromUTF8.html.
You can feel confident if you have saved the file in UTF-8 or UTF-16 and opened it with the same encoding. The UTF encodings are recommended.
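A minimal sketch of that read/write-with-an-explicit-charset approach (the file name and sample text are only placeholders):

    import java.io.*;

    public class CharsetIO {
        public static void main(String[] args) throws IOException {
            // Write the file as UTF-8, independent of the platform default encoding.
            Writer out = new OutputStreamWriter(new FileOutputStream("out.txt"), "UTF-8");
            out.write("Some text with non-ASCII characters: äöü 北京\n");
            out.close();

            // Read it back with the same charset; BufferedReader on top gives readLine().
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(new FileInputStream("out.txt"), "UTF-8"));
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            in.close();
        }
    }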
greetings
Axel
ps: In Java, Charset.defaultCharset() shows you the default charset of the Java virtual machine.
Edited by: Axel_Richter on Oct 9, 2007 8:12 PM
Edited by: Axel_Richter on Oct 9, 2007 8:17 PM

Similar Messages

  • AAC encoding, using Shuffle-saved iTunes for a Nano, file fidelity question

    Love the little shuffle but don't understand possible limitations. I only have a Shuffle but may add a Nano. I'm trying to understand file encoding, decoding, etc..
    When I upload a CD into iTunes (iTunes having been loaded upon initial Shuffle set-up) are the files automatically encoded as AAC files or are they saved in iTunes as the full-sized files (I don't recall the correct encoding for full CD-quality fidelity), or are they saved already encoded as AAC files?
    While the fidelity of the AAC file is acceptable in a noisy car or while running/skiing, I would like to know whether adding a Nano to my quiver would allow me to utilize the 1000+ CDs I've already uploaded into iTunes, and whether the files in my current iTunes would load into the Nano as the higher-fidelity files. I would like to use the Nano to play tunes through my high-end stereo system and get CD-like fidelity, not compromised AAC-level fidelity. Is anyone familiar with all this? I would appreciate any help.

    ExAlfa wrote:
    When I upload a CD into iTunes (iTunes having been loaded upon initial Shuffle set-up) are the files automatically encoded as AAC files or are they saved in iTunes as the full-sized files (I don't recall the correct encoding for full CD-quality fidelity), or are they saved already encoded as AAC files?
    Probably currently set for 128kbps AAC... go to Preferences>Advanced>Importing
    where you can set the encoder to be whatever you want as well as the bitrate.
    While the fidelity of the AAC file is acceptable in a noisy car or while running/skiing, I would like to know whether adding a Nano to my quiver would allow me to utilize the 1000+ CDs I've already uploaded into iTunes, and whether the files in my current iTunes would load into the Nano as the higher-fidelity files.
    That's dependent on how they were imported...(see above)
    I would like to use the Nano to play tunes through my high-end stereo system and realize CD-like fidelity, not compromised AAC-level fidelity. Anyone familiar with all this? I would appreciate any help.
    You can keep your music in the library as AIFF or Apple Lossless and
    set iTunes to convert the higher-rate files to 128kbps AAC when loading to the Shuffle.

  • File encoding cp1252 problem

    Hi there,
    I have a problem concerning the file encoding in a web application.
    I'll sketch the problem for you.
    I'm working on adjustments and bug fixes for an e-mail archive at the company I work for. With this archive, users can search e-mails using a Struts 1.0 / JSP web application, read them, and send them back to their mail inbox.
    Recently a bug has appeared, concerning character sets.
    We have mails with french characters or other uncommon characters in it.
    Like the following mail:
    Subject: Test E-mail archief coördinatie Els
    Content: Test coördinatie r�d�marrage ... test weird characters � � �
    In the web application itself, everything is fine... but when I send this mail back to my inbox, the subject gets all messed up:
    =?ANSI_X3.4-1968?Q?EMAILARCHIVE_*20060419007419*_Tes?=
    =?ANSI_X3.4-1968?Q?t_E-maill_archief_co=3Frdinatie_Els?=
    The content appears to be fine.
    We discovered this problem recently, and a lot of effort and searching has been done to solve it.
    Our solution was to put the following line in catalina.sh, which our Tomcat 4.1 web server starts with.
    CATALINA_OPTS="-server -Dfile.encoding=cp1252"
    On my local Win2K computer, the encoding didn't pose a problem, so catalina.sh wasn't changed. It was only a problem (during testing) on our Linux test server ... a VMware server which is a copy of our production environment.
    On the VMware server, I added the line to the catalina.sh file. And it worked fine.
    Problem Solved !
    Yesterday, we were putting the archive in production. On our production server ... BANG --> NullPointerException.
    We thought it had something to do with JARs it couldn't find, older JARs, the Tomcat cache ... but none of this solved the problem.
    We put the old version back into production, but the same NullPointerException occurred.
    We then commented out the CATALINA_OPTS="-server -Dfile.encoding=cp1252" line ... and then it worked again.
    We put the new version into production (without the file encoding line), and it worked perfectly, except for those weird ANSI characters.
    Anyone have any experience with this?
    I use that same file encoding to start a batch job, but there I call it Cp1252 (with a capital C) ... might that be the problem? I have to be sure, because the problem doesn't occur in the test environment, and I can't just test in production ... and switch off the server whenever I'd like to.
    Does anyone see whether changing cp1252 --> Cp1252 might be a solution, or does anyone have another solution?
    Thanks in advance.
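    Regarding cp1252 vs. Cp1252: charset names in Java are case-insensitive, so both spellings resolve to the same charset; whether that explains the production NullPointerException is a separate question. A quick standalone check (not tied to Tomcat or catalina.sh):

        import java.nio.charset.Charset;

        public class CharsetNameCheck {
            public static void main(String[] args) {
                Charset lower = Charset.forName("cp1252");
                Charset mixed = Charset.forName("Cp1252");
                System.out.println(lower.name());        // windows-1252
                System.out.println(mixed.name());        // windows-1252
                System.out.println(lower.equals(mixed)); // true
            }
        }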


  • Jinitiator 1.3.1.2.6 on win 7 64 and win xp (different file.encoding)

    Hello,
    our customer has moved from Windows XP to Windows 7, and he uses JInitiator 1.3.1.2.6...
    In some "Forms" I have implemented a PJC to save data from a CLOB to the local file system.
    But there is a problem....
    If I run the application on Windows XP I get file.encoding=Cp1250, which is fine...
    If I run the same application on Windows 7 (64-bit) I get file.encoding=CP1252, and here is the problem...
    Is there any way to run JInitiator with file.encoding set to Cp1250?
    Or is this maybe a locale problem with Windows?
    thank you..

    First, I will start by saying that JInitiator was not intended to run on Win7, especially 64bit. So, it may be time to think about moving to the Java Plugin. Preferably one which is certified with your Forms version.
    To your issue, I suspect you need to change the "Region and Language" settings on the client machine. This can be found on the Control Panel. If that doesn't help, take a look at this:
    http://stackoverflow.com/questions/4850557/convert-string-from-codepage-1252-to-1250
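    If the data itself has already been decoded with the wrong codepage, one common repair (along the lines of the Stack Overflow link above) is to re-encode with the charset that was actually used and decode again with the intended one; a rough sketch, lossy if the text contains characters outside the target codepage:

        import java.io.UnsupportedEncodingException;

        public class CodepageFix {
            // Re-interpret a string that was decoded as windows-1252 but was really windows-1250.
            static String cp1252ToCp1250(String misdecoded) throws UnsupportedEncodingException {
                byte[] raw = misdecoded.getBytes("Cp1252");
                return new String(raw, "Cp1250");
            }

            public static void main(String[] args) throws UnsupportedEncodingException {
                System.out.println(cp1252ToCp1250("text that was read with the wrong codepage"));
            }
        }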

  • File encoding with UTF-8

    Hello all,
    My scenario is IDoc -> XI -> File (txt).
    Everything was working fine until I had to handle an Eastern European language with unusual characters.
    So in my receiver file adapter I'm using the file encoding UTF-8, and when I look at the fields in the output, everything is fine.
    BUT when I look at the file in binary, the length of these fields is no longer fixed, because a special character takes 2 bytes instead of one.
    I would like to know if it's possible to handle those characters with UTF-8 file encoding in a fixed-length field of 40 characters, for example; I don't want a variable length for my fields...
    Thanks by advance,
    JP

    I agree with you. In XI I don't have this problem; I have it in my output file when I edit the text file in binary mode!
    My fields should be 40 characters long, but the special symbol which takes 2 bytes instead of 1 makes the length of my output fields variable!!!
    My question was whether there is a way to have a fixed length in my output file..
    Sorry if I wasn't clear in my first post.
    JP
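    If the receiving system really needs a fixed number of bytes per field, the padding has to happen on the encoded bytes rather than on the characters. A standalone Java sketch, not an XI adapter setting; the 40-byte width and the space padding are just assumptions taken from the post:

        import java.io.UnsupportedEncodingException;
        import java.util.Arrays;

        public class FixedWidthField {
            // Pad (or truncate) the UTF-8 encoding of a value to a fixed byte width.
            static byte[] toFixedWidthUtf8(String value, int widthInBytes) throws UnsupportedEncodingException {
                byte[] encoded = value.getBytes("UTF-8");
                byte[] field = new byte[widthInBytes];
                Arrays.fill(field, (byte) ' ');  // pad with spaces
                // Note: truncating here could split a multi-byte character.
                System.arraycopy(encoded, 0, field, 0, Math.min(encoded.length, widthInBytes));
                return field;
            }

            public static void main(String[] args) throws UnsupportedEncodingException {
                System.out.println(toFixedWidthUtf8("Łódź", 40).length);  // always 40 bytes
            }
        }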

  • Is it possible to change the default file encoding?

    I have just learned that the "file.encoding" system property should be treated as read-only.
    (http://developer.java.sun.com/developer/bugParade/bugs/4163515.html)
    I am using this property to tell javac that the command arguments file has some other encoding than the system default, like this:
    javac -J-Dfile.encoding=UTF-8 @files-to-compile.lst
    On Windows XP with a US English locale it worked for all the SDK releases I checked, but on the Japanese edition of Windows 2000 only one of the J2SDK 1.4.1 releases worked.
    My question is: is there an acceptable way to tell the JVM what the default encoding is? Or to inform javac about the encoding of the argument file?
    The reason for having a UTF-8 encoded javac argument list file is that our application generates Java source files that can have Unicode characters in their names. Windows seems to support Unicode file names, so I did not want to restrict file names to those supported by the system encoding.

    Use javac's "-encoding" option.
    $ javac 
    Usage: javac <options> <source files>
    where possible options include:
      -g                        Generate all debugging info
      -g:none                   Generate no debugging info
      -g:{lines,vars,source}    Generate only some debugging info
      -nowarn                   Generate no warnings
      -verbose                  Output messages about what the compiler is doing
      -deprecation              Output source locations where deprecated APIs are used
      -classpath <path>         Specify where to find user class files
      -sourcepath <path>        Specify where to find input source files
      -bootclasspath <path>     Override location of bootstrap class files
      -extdirs <dirs>           Override location of installed extensions
      -d <directory>            Specify where to place generated class files
      -encoding <encoding>      Specify character encoding used by source files
      -source <release>         Provide source compatibility with specified release
      -target <release>         Generate class files for specific VM version
      -help                     Print a synopsis of standard options
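    For example (an illustrative invocation; -encoding governs how javac reads the source files themselves):
    $ javac -encoding UTF-8 @files-to-compile.lst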

  • Encoding questions

    Hello,
    I have some encoding questions, my projects are PAL-AVCHD-Full HD 1080i 25 5.1 channel. I export the videos in Windows Media, here come the questions.
    What’s the difference between one and two encoding passes?
    What does “Allow interlaced processing” do?
    Thanks,
    Enrique

    Enrique,
    Let's look at the 1-pass vs 2-pass Encoding.
    When one sets the general parameters for Encoding and chooses 1-pass, the parameters are applied over the entire Timeline. This is the quickest method for Encoding and will be pretty good, based on the chosen parameters.
    With 2-pass, the Encoding program will first look at the footage, in the scheme of the chosen parameters, but while doing so, will look for faster motion (both camera and subject) in the Timeline. Where it finds the motion, it will mark it for the highest setting in the parameters. The highest settings will be applied to the sections with the most motion, and the "average" will be applied to the rest. This yields a better Encoded file, especially where there is motion. This Encoding method takes longer, as the Encoder must look at all of the footage first to decide what to do, and then to apply the Encoding parameters.
    The Encoders used by Hollywood are usually at least 9-pass, and often quite a bit more. They are also run by experts, who do nothing but Encode all day and are highly paid for their work and their expertise. This is why the high-motion footage on a commercial DVD will look smoother than what we can accomplish, and the file will likely even be smaller. Those Encoding programs cost hundreds of thousands of dollars, so it's not like we could download one and use it, even if we were experts and could figure out the settings.
    So to wrap up, a good Encoding engine will likely do a fair job when set to 1-pass, but, especially with higher-motion footage, a better job with 2-pass, and the only cost is time. Unless I am trying to Encode a quick reference file, I will always use multi-pass schemes to get the best possible output, and just have a cup, or two, of coffee while I wait. Also, there are better Encoders than the ones included in our NLEs and authoring programs. Grass Valley's ProCoder is one that gets really high marks - this side of Hollywood.
    Good luck,
    Hunt

  • I got a new hard drive for my MacBook. I don't have the CD, but I do have a flash drive with the software I need. When I turn on my laptop it shows me a file with a question mark. How can I install the software from the flash drive?

    I got a new hard drive for my MacBook. I don't have the CD, but I do have a flash drive with the software I need. When I turn on my laptop it shows me a file with a question mark. How can I install the software from the flash drive?

    Hold down the Option key while you boot your Mac. Then, it should show you a selection of devices. Click your flash drive and it will boot from that.

  • How to set File Encoding to UTF-8 On Save action in JDeveloper 11G R2?

    Hello,
    I am facing an issue when I modify a file using JDeveloper 11G R2. JDeveloper is changing the encoding of the file to the system default encoding (ANSI) instead of UTF-8. I have updated the encoding to UTF-8 in the "Tools | Preferences | Environment | Encoding" option and restarted JDeveloper. I have also updated the "Project Properties | Compiler | Character Encoding" option to UTF-8. Neither of them is working.
    I am using below version of JDeveloper,
    Oracle JDeveloper 11g Release 2 11.1.2.3.0
    Studio Edition Version 11.1.2.3.0
    Product Version: 11.1.2.3.39.62.76.1
    I created a file in UTF-8 encoding. I opened it, made some changes, and saved it.
    When I open the "Properties" tab using "Help | About" Menu, I can see that the Properties of JDeveloper are showing encoding as Cp1252. Is it related?
    Properties
    sun.jnu.encoding
    Cp1252
    file.encoding
    Cp1252
    Any idea how to make sure JDeveloper saves the File in UTF-8 always?
    - Sujay

    I have already done that. That is the first thing I did as mentioned in my Thread. I have also added below 2 options in jdev.conf and restarted JDeveloper, but that also did not work.
    AddVMOption -Dfile.encoding=UTF-8
    AddVMOption -Dsun.jnu.encoding=UTF-8
    - Sujay

  • How to set the file.encoding in jvm?

    I have an error showing Chinese characters from a servlet.
    Someone told me that I can change the file.encoding in the JVM to zh_CN. How can I do that?

    Add the Java argument in your servlet engine,
    e.g.
    java -Dfile.encoding=ISO8859-1
    garycafe
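    An alternative that does not depend on file.encoding at all is to declare the charset on the servlet response itself; a minimal sketch assuming the standard Servlet API (the class name and output text are only illustrative):

        import java.io.IOException;
        import java.io.PrintWriter;
        import javax.servlet.http.HttpServlet;
        import javax.servlet.http.HttpServletRequest;
        import javax.servlet.http.HttpServletResponse;

        public class ChineseServlet extends HttpServlet {
            protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
                // Declare the output encoding explicitly instead of relying on file.encoding.
                resp.setContentType("text/html; charset=UTF-8");
                PrintWriter out = resp.getWriter();
                out.println("<html><body>\u4F60\u597D</body></html>");  // "ni hao"
                out.close();
            }
        }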

  • File encoding in sender file communication channel

    Hello everybody,
    I have a strange situation.
    Two PI 7.0 installations, development and production. Identical. Same SP level, Java VM, etc. etc.
    I have a file-to-IDoc interface.
    The sender file communication channels use FTP with content conversion.
    They are identical!!!!!!
    but....
    In production I added the parameter File Encoding = ISO-8859-1 because when strange characters are present... it works better.
    The same files in the development installation work without this parameter.
    Why?
    Is there maybe a place in the Config Tool or the J2EE admin tool where this parameter is set?
    thanks in advance
    Edited by: apederiva on Mar 12, 2010 3:55 PM

    Hi,
    Make sure both of your systems are Unicode so that you will not have any issues. Also please see this document for how to work with character encodings in PI. Also, we don't have any special configuration in the J2EE admin tool.
    http://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/uuid/502991a2-45d9-2910-d99f-8aba5d79fb42?quicklink=index&overridelayout=true
    Regards,
    ---Satish

  • How to specify file.encoding=ANSI format in the J2SE adapter

    Hi All,
    We are using the J2SE plain adapter; we need the output data in ANSI format.
    The default is file.encoding=UTF-8.
    How can we achieve this?
    thanks in advance.
    Regards,
    Mohamed Asif KP

    The file adapter would behave in a similar fashion on J2EE. Here is a link to the ongoing discussion:
    is ANSI ENCODING possible using file/j2see adapter
    Regards,
    Prateek

  • Text File Encoding used by TextEdit/OS X

    Hi all folks,
    does someone know which code page is used by the text file encoding "Western (EBCDIC US)"
    available from the "Customize Encodings List" in the TextEdit "Plain Text File Encoding" preferences?
    The text file encoding "Western (EBCDIC Latin 1)" works well, but "EBCDIC US" does not;
    the character set is very limited.
    Thanks for any help,
    Lutz

    Yeah unfortunately they're all listed as 0kb files. I guess that means the faulty hard drive didn't transfer them properly, even though the Mac did the copy confirmation sound.
    Hundreds of folio files... all gone. ;___;

  • XI File Adapter Custom File Encoding for  issues between SJIS and CP932

    Dear SAP Forum,
    Has anybody found a solution for the difference between the JVM (IANA) SJIS and MS SJIS implementation ?
    When users enter characters in SAPGUI, the MS SJIS implementation is used, but when the XI file adapter writes SJIS, the JVM SJIS implementation is used, which causes issues for 7 characters:
    1. FULLWIDTH TILDE (UTF-8 EFBD9E), Shift-JIS 8160: ~ vs. 〜
    2. PARALLEL TO (UTF-8 E288A5), Shift-JIS 8161: ∥ vs. ‖
    3. FULLWIDTH HYPHEN-MINUS (UTF-8 EFBC8D), Shift-JIS 817C: - vs. −
    4. FULLWIDTH CENT SIGN (UTF-8 EFBFA0), Shift-JIS 8191: ¢ vs. \u00A2
    5. FULLWIDTH POUND SIGN (UTF-8 EFBFA1), Shift-JIS 8192: £ vs. \u00A3
    6. FULLWIDTH NOT SIGN (UTF-8 EFBFA2), Shift-JIS 81CA: ¬ vs. \u00AC
    7. REVERSE SOLIDUS, Shift-JIS 815F: \ vs. \u005C
    The following line of code can solve the problem (either in an individual mapping or in a module)
    String sOUT = myString.replace('~', '〜').replace('∥', '‖').replace('-', '−').replace('¢', '\u00A2').replace('£', '\u00A3').replace('¬', '\u00AC');
    But I would prefer to add a custom character set to the file encoding. Has anybody tried this?


  • File.encoding - Help

    Hi,
    We are converting German and English data to XML. The
    German data (UTF-8) runs through the system fine when I log in
    and test it myself. But strangely it gives ?? characters when I run as a
    different user. (Sun Solaris / JDK 1.4)
    When I look at the properties,
    for me:
    file.encoding = ISO8859-1
    for the other user it's
    file.encoding=ISO646-US
    Where is this property being set, and how can I change it? This is
    killing me.. I could not find anything on my system that enforces the
    charset. I am not using any special writers or readers..
    Thanks in advance
    - Ravi

    I do not think that the Latin-1 Supplement characters
    \u00C4\u00E4\u00DF\u00DC\u00FC\u00D6\u00F6
    (used in German),
    when encoded in UTF-8, can be displayed correctly with an encoding other than UTF-8, to my knowledge.
    I suppose your browser automatically sets the encoding to UTF-8 when reading that data, while other users' browsers do not.
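    Those ?? characters are what you get when UTF-8 bytes are decoded with an ASCII-only default charset such as ISO646-US. A small standalone demonstration (the sample string is only illustrative):

        import java.io.UnsupportedEncodingException;
        import java.nio.charset.Charset;

        public class DefaultCharsetDemo {
            public static void main(String[] args) throws UnsupportedEncodingException {
                String german = "Grüße";                  // contains ü and ß
                byte[] utf8 = german.getBytes("UTF-8");   // the umlauts take two bytes each
                System.out.println(new String(utf8, "UTF-8"));     // decoded correctly: Grüße
                System.out.println(new String(utf8, "US-ASCII"));  // non-ASCII bytes become replacement characters
                System.out.println("Default charset: " + Charset.defaultCharset());
            }
        }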
