Garbage characters when retrieving HTML via Java

I wanted to use Java to extract my character's profile from http://www.magelo.com. The results returned from that URL are basically garbled characters; however, when I retrieve cnn.com or yahoo.com the results are fine. So the only thing I can think of is that the Magelo web server detects that the client is Java and, since it may not want people to use this approach because of data-mining concerns, changes the output to garbage. I tried setting the connection's "User-Agent" property and others to mimic a browser, but nothing worked... the only way I can view non-garbled HTML is by using a web browser. I'm not exactly sure that this is what is going on, but I can't come up with another explanation.

code & garbage.
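
One common cause of this symptom, offered here only as an assumption since the thread does not confirm it, is that the server sends a gzip-compressed body and the client prints the compressed bytes as if they were text. The sketch below shows the idea with in-memory data; the class and method names (GzipBodyDecoder, decodeBody, gzip) are hypothetical and are not taken from the original post.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not code from the original post.
final class GzipBodyDecoder {

    // Reads the whole body, un-gzipping it first when the declared
    // Content-Encoding is "gzip" (a null encoding means "read as-is").
    static String decodeBody(InputStream raw, String contentEncoding, Charset charset)
            throws IOException {
        InputStream in = "gzip".equalsIgnoreCase(contentEncoding)
                ? new GZIPInputStream(raw)
                : raw;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), charset);
    }

    // Gzip-compresses a byte array in memory so the demo needs no network.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("<html>profile</html>".getBytes(StandardCharsets.UTF_8));
        // Printing the compressed bytes as text produces the "garbage" ...
        System.out.println(decodeBody(new ByteArrayInputStream(compressed), null,
                StandardCharsets.ISO_8859_1));
        // ... while decompressing first yields the page.
        System.out.println(decodeBody(new ByteArrayInputStream(compressed), "gzip",
                StandardCharsets.UTF_8));
    }
}
```

With a real connection, the stream and encoding would come from URLConnection's getInputStream() and getContentEncoding(); whether the Magelo server actually compresses its responses is not something this page establishes.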

Similar Messages

  • Multibyte characters garbled in multipart requests on WebLogic 12c (12.1.2.0)

    When using the file upload functionality of the Servlet 3 specification on WebLogic 12c (12.1.2.0), the values of the form items other than the uploaded file (<input type="text">) arrive as garbage characters.
    Is some special setting needed?
    [Note]
    With a normal request (application/x-www-form-urlencoded, that is, without enctype="multipart/form-data"), the submitted values are not garbled.
    The file name and the content of the uploaded file are not garbled.
    I confirmed with a debugger that the values stored in the temporary file written during the upload are not garbled at that stage.
    HttpServletRequest#setCharacterEncoding("UTF-8") is called.
    [Environment Information]
    OS : MacOS X 10.8.5
    JVM : Oracle Java7
    VM Encoding : UTF-8 (-Dfile.encoding=UTF-8)
    WebPage Encoding : UTF-8
    OS LANG : LANG=ja_JP.UTF-8
    IDE : STS(Spring Tool Suite)
    Boot Platform : WTP for Weblogic12.1.2.0
    Framework : Spring MVC(3.2.4)
    I would like to know how to solve this behavior. If anyone knows anything about it, please tell me the solution.
    Message was edited by: user11123661 (changed main language from Japanese to English).

    The basic problem is not obscure; it has come up countless times since Tiger was released. See this note and try Fix C (dingbat) to see if it will help:
    http://homepage.mac.com/thgewecke/woutlook.html

  • Leading garbage characters when using CipherInputStream

    So, after receiving an encrypted message, I can decrypt it perfectly except that I get a random amount of leading garbage characters. Using the same plaintext, here are examples of the beginning of the output file for two runs (using od -c to look at the files):
    0000000 315 7 004 371 242 \0 w ` t h e L L S E
    and
    0000000 1 " 246 317 0 j 321 V t h e L L S E
    The part beginning with "The LLSE..." is correct.
    The fact that the leading garbage appears to be random leads me to believe that there is something wrong with the crypto rather than with the file I/O.
    Here's the relevant code chunk:
    Cipher sharedCypher = Cipher.getInstance("DES/CFB8/PKCS5Padding");
    SecretKeySpec DESKeySpec = new SecretKeySpec(clientDESKey.getEncoded(), "DES");
    IvParameterSpec iv = new IvParameterSpec(clientDESKey.getEncoded());
    sharedCypher.init(Cipher.DECRYPT_MODE, DESKeySpec, iv);
    CipherInputStream cis = new CipherInputStream(new FileInputStream(WD), sharedCypher);     
    String time = ""+System.currentTimeMillis();
    File outputFileW = new File("/tmp/new_wireless_data."+time);
    System.out.println(System.currentTimeMillis()+": Output file is" + outputFileW.getAbsolutePath());
    outputFileW.createNewFile();
    FileOutputStream fos = new FileOutputStream(outputFileW);
    byte[] putCypherBytes = new byte[8];
    int i=0;
    while((i=cis.read(putCypherBytes)) !=  -1) {
       fos.write(putCypherBytes, 0, i);
    }
    Any thoughts on cleaning this up would be greatly appreciated.
    Best,
    Glenn

    Well, I seriously doubt it's the JDK or JCE. I don't have time right now to
    download that exact version and test it, but instead I'll give you the code
    I pieced together from your posts. Put your text to be encrypted in a file
    called plaintext.txt in the directory you run the class in. The decrypted
    text should appear in new_wireless_data...
    If you can run this without error, then your problem most likely lies
    on your server side.
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.util.Base64;
    import javax.crypto.Cipher;
    import javax.crypto.CipherInputStream;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;
    public class CryptoTest {
        public static String encryptedFile = "encrypted.txt";
        public static String plainTextFile = "plaintext.txt";
        public static SecretKey clientDESKey;
        public static String keyString = "emePXfjmLNw=";
        public static void main(String [] args) {
            try {
                // java.util.Base64 (Java 8+) stands in for sun.misc.BASE64Decoder,
                // which newer JDKs no longer ship.
                byte[] keyBytes = Base64.getDecoder().decode(keyString);
                clientDESKey = new SecretKeySpec(keyBytes, "DES");
                _main(args);
            } catch (Exception e) {
                System.out.println("ERROR! : " + e.getMessage());
                e.printStackTrace();
            }
        }
        public static void _main(String [] args) throws Exception {
            // Encrypt plaintext.txt into encrypted.txt (stands in for the server side).
            Cipher serverCypher = Cipher.getInstance("DES/CFB8/PKCS5Padding");
            serverCypher.init(Cipher.ENCRYPT_MODE, clientDESKey,
                new IvParameterSpec(clientDESKey.getEncoded()));
            File inputFileW = new File(plainTextFile);
            CipherInputStream cis1 =
                new CipherInputStream(
                    new FileInputStream(inputFileW),serverCypher);
            File outputFileW1 = new File(encryptedFile);
            FileOutputStream fos1 = new FileOutputStream(outputFileW1);
            byte[] putCypherBytes1 = new byte[8];
            int i1=0;
            while((i1=cis1.read(putCypherBytes1)) !=  -1) {
               fos1.write(putCypherBytes1, 0, i1);
            }
            // Close both streams before deleting the input and reading the output back.
            cis1.close();
            fos1.close();
            inputFileW.delete();
            //=========================================================
            // Decrypt with the same key and IV (the chunk from the question).
            Cipher sharedCypher = Cipher.getInstance("DES/CFB8/PKCS5Padding");
            SecretKeySpec DESKeySpec = new SecretKeySpec(clientDESKey.getEncoded(), "DES");
            IvParameterSpec iv = new IvParameterSpec(clientDESKey.getEncoded());
            sharedCypher.init(Cipher.DECRYPT_MODE, DESKeySpec, iv);
            CipherInputStream cis = new CipherInputStream(new FileInputStream(encryptedFile), sharedCypher);
            String time = ""+System.currentTimeMillis();
            File outputFileW = new File("new_wireless_data."+time);
            System.out.println(System.currentTimeMillis()+": Output file is" + outputFileW.getAbsolutePath());
            outputFileW.createNewFile();
            FileOutputStream fos = new FileOutputStream(outputFileW);
            byte[] putCypherBytes = new byte[8];
            int i=0;
            while((i=cis.read(putCypherBytes)) !=  -1) {
               fos.write(putCypherBytes, 0, i);
            }
            cis.close();
            fos.close();
        }
    }
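
For what it is worth, both od dumps in the question show exactly eight bad bytes before the correct text, and eight bytes is the DES block size. In CFB8 mode, decrypting with an IV that differs from the one used for encryption can corrupt only the first eight output bytes; everything after that decrypts normally. Whether the two sides in this thread really used different IVs is an assumption, not something the posts confirm. A minimal sketch of the effect, with a made-up key, made-up IVs and a hypothetical class name:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class; the key, IVs and text are invented for illustration.
final class Cfb8IvMismatchDemo {

    // Runs DES/CFB8 in the given mode (Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE).
    static byte[] crypt(int mode, byte[] key, byte[] iv, byte[] input) throws Exception {
        Cipher cipher = Cipher.getInstance("DES/CFB8/NoPadding");
        cipher.init(mode, new SecretKeySpec(key, "DES"), new IvParameterSpec(iv));
        return cipher.doFinal(input);
    }

    public static void main(String[] args) throws Exception {
        byte[] key = "8bytekey".getBytes(StandardCharsets.US_ASCII);
        byte[] senderIv = "ivSender".getBytes(StandardCharsets.US_ASCII);
        byte[] receiverIv = "ivOther!".getBytes(StandardCharsets.US_ASCII);
        byte[] plain = "The LLSE sample text, long enough to span several blocks"
                .getBytes(StandardCharsets.US_ASCII);

        byte[] encrypted = crypt(Cipher.ENCRYPT_MODE, key, senderIv, plain);
        // Decrypt with the wrong IV: only the first 8 bytes can come out wrong.
        byte[] decrypted = crypt(Cipher.DECRYPT_MODE, key, receiverIv, encrypted);

        System.out.println(new String(decrypted, StandardCharsets.ISO_8859_1));
        // Everything from offset 8 onward matches the original plaintext.
        System.out.println(Arrays.equals(
                Arrays.copyOfRange(decrypted, 8, decrypted.length),
                Arrays.copyOfRange(plain, 8, plain.length)));
    }
}
```

If the garbage in the real system were caused this way, comparing the IV bytes used on the server with those used on the client would be the first thing to check.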

  • Garbage characters when using CR XI R1 with Sharp M350 printer PCL5e driver

    My application uses CR XI R1. We have a customer who has a Sharp M350 printer. When printing or previewing a report from my application with this printer set as the default, the text is garbled - the characters look like gibberish.
    If they change the default printer to another printer, it works fine, but they need to be able to print to the Sharp printer, so I need to find a solution to the problem.
    Changing the printer driver to a PCL6 has no effect. Does anyone have any suggestions for fixing this?

    A few questions:
    1) What development language are you using?
    2) Have you ever applied any CR Service Packs?
    3) Have you checked for any updates to the Sharp M350 printer driver?
    4) Can you duplicate the issue on your development system?
    5) Are you able to print correctly to that printer using the CR designer? (even if you have to as a test, install it on one of the client machines?)
    6) What Crystal Reports SDK are you using? RDC? If so, what is the CR dlls referenced in your app?
    Ludek

  • Garbage characters displayed when saving InfoView Page Layout in Chinese

    Hi,
    I'm new to BO, so I don't know whether I'm giving the required information or not. We integrated BO with our product. The configuration details are as follows.
    Server: win2k3 sp2
    DataBase: Embedded Sybase SQL
    Client: winxp sp3 (Chinese) + FF3.6.8
    I logged in to BO InfoView with Chinese as the "Product Locale", went to the InfoView Page Layout page, and selected the Save As option. Now I can see some garbage characters along with some Chinese characters in the Title text field. I want to eliminate those garbage characters.
    I want to know the following.
    1. What could be the reason for such garbage characters appearing there?
    2. How can I eliminate them?
    3. Are we missing any required data while integrating BO with our product that causes this problem?
    Thanks in Advance,
    Vasu

  • Garbage characters in iTunes when copying songs in chinese language

    I copied some English and Chinese songs from my external hard disk to iTunes. Instead of displaying the artist and song names in Simplified/Traditional Chinese for the Chinese songs, it shows garbage characters. The English songs display correctly. What should I do?

    I tried converting the ID3 tags (all ID3 versions and reverse Unicode), but it still doesn't work.
    They may need to be converted to Unicode from a legacy Chinese encoding. Try ConvertZ
    http://www.bumpersoft.com/Educationand_Science/Languages/ConvertZ12649.htm

  • Garbage Characters in Netscape

    Hi,
    I open a 'jsp' in a new popup (window.open). Netscape shows some garbage characters at the top of the page. (Works fine with IE). I tried deleting ALL the code from the popup jsp to trace out the problem (even removed the HTML tags). But Netscape still shows the garbage characters. Where are they coming from??? Any help is appreciated!
    Ashish Bhave

  • Calling EJB with HTML via SERVLET

    Hi,
    I used a written example that calls an EJB from HTML via a servlet. The example is named Bonus. The problem I have is that the HTML page throws an error when calling the servlet. I can't figure out what the problem seems to be. Does anyone know?
    I wonder if the problem is in the servlet? The EJB is fine!
    christian
    HTML CODE:(bonus.html)
    <HTML>
    <HEAD>
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1250"/>
    <TITLE>untitled1</TITLE>
    </HEAD>
    <BODY BGCOLOR = "WHITE">
    <BLOCKQUOTE>
    <H3>Bonus Calculation</H3>
    <FORM METHOD="GET" ACTION="BonusAlias">
    <P>Enter social security Number:<P>
    <INPUT TYPE="TEXT" NAME="SOCSEC"></INPUT>
    </P>
    Enter Multiplier:
    <P>
    <INPUT TYPE="TEXT" NAME="MULTIPLIER"></INPUT>
    </P>
    <INPUT TYPE="SUBMIT" VALUE="Submit">
    <INPUT TYPE="RESET">
    </FORM>
    </BLOCKQUOTE>
    </BODY>
    </HTML>
    SERVLET CODE:(BonusServlet.java)
    package mypackage5;
    import mypackage5.Calc;
    import mypackage5.CalcHome;
    import mypackage5.impl.CalcBean;
    import javax.servlet.*;
    import javax.servlet.http.*;
    import java.io.*;
    import javax.naming.*;
    import javax.rmi.PortableRemoteObject;
    import java.beans.*;
    public class BonusServlet extends HttpServlet {
        CalcHome homecalc;
        public void init(ServletConfig config) throws ServletException {
            super.init(config);
            // Look up the home interface
            try {
                Context context = new InitialContext();
                // Store the result in the homecalc field; the original assigned a
                // local variable, which left homecalc null when doGet used it.
                homecalc = (CalcHome) PortableRemoteObject.narrow(context.lookup("Calc"), CalcHome.class);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        public void doGet (HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            String socsec = null;
            int multiplier = 0;
            double calc = 0.0;
            PrintWriter out;
            response.setContentType("text/html");
            String title = "EJB Example";
            out = response.getWriter();
            out.println("<HTML><HEAD><TITLE>");
            out.println(title);
            out.println("</TITLE></HEAD><BODY>");
            try {
                Calc theCalculation;
                // Get multiplier and social security information
                String strMult = request.getParameter("MULTIPLIER");
                multiplier = Integer.parseInt(strMult);
                socsec = request.getParameter("SOCSEC");
                // Calculate bonus
                double bonus = 100.00;
                theCalculation = homecalc.create();
                calc = theCalculation.calcBonus(multiplier, bonus);
            } catch (Exception e) {
                e.printStackTrace();
            }
            // Display data
            out.println("<H1>Bonus Calculation</H1>");
            out.println("<P>Soc Sec: " + socsec + "<P>");
            out.println("<P>Multiplier: " +
                multiplier + "<P>");
            out.println("<P>Bonus Amount: " + calc + "<P>");
            out.println("</BODY></HTML>");
            out.close();
        }
        public void destroy() {
            System.out.println("Destroy");
        }
    }

    The error is that the page cannot be found! When I run only the servlet it works; when I run the HTML page and fill in the fields, it throws an error that the page cannot be found!
    thanks
    Christian

  • Removing numerals/garbage characters from search in Section 508 build WebHelp

    I need to remove (prevent inclusion of) numerals and garbage characters from search results in WebHelp when compiled with Section 508 Output enabled. I need to have Section 508 enabled. Can that be done?
    Thanks for your time!

    Thanks, Jeff. I tried including the characters in the Stop list and recompiling. (I even closed the project and reopened it.) The characters still appear. The characters I'm trying to remove are: !, #, ', (, ), -, /, :, ;, and ,. I am also trying to exclude some numbers (100.00usd, 1999.99, 2999.99, 3999.99, 4999.99, and 6pm).
    The Stop list consists of those characters and the default text. I forgot to mention in the first post that I'm using RH9.
    The Section 508 flag does create 508-compliant HTML output and that is one of the requirements for this help system. This is the first time I am enabling this flag.

  • AIX + Korean : garbage characters

    I have Tomcat installed on an AIX machine. When the user enters some Korean characters, I get garbage characters on the Tomcat side. This issue occurs only when Tomcat is on an AIX machine. Is anybody aware of this issue?

    and again misunderstood ;-)
    begin owa_util.print_cgi_env; end;
    =
    PLSQL_GATEWAY = WebDb
    GATEWAY_IVERSION = 3
    SERVER_SOFTWARE = Oracle-Application-Server-10g/9.0.4.0.0 Oracle-HTTP-Server
    GATEWAY_INTERFACE = CGI/1.1
    SERVER_PORT = 7780
    SERVER_NAME = los-bd4.intranet.l-os.de
    REQUEST_METHOD = POST
    PATH_INFO = /wwv_flow.show
    SCRIPT_NAME = /pls/htmldb
    REMOTE_ADDR = 10.220.110.200
    SERVER_PROTOCOL = HTTP/1.1
    REQUEST_PROTOCOL = HTTP
    REMOTE_USER = HTMLDB_PUBLIC_USER
    HTTP_CONTENT_LENGTH = 297
    HTTP_CONTENT_TYPE = application/x-www-form-urlencoded
    HTTP_USER_AGENT = Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3
    HTTP_HOST = 10.220.110.22:7780
    HTTP_ACCEPT = text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
    HTTP_ACCEPT_ENCODING = gzip,deflate
    HTTP_ACCEPT_LANGUAGE = de-de,de;q=0.8,en;q=0.5,en-us;q=0.3
    HTTP_ACCEPT_CHARSET = ISO-8859-1,utf-8;q=0.7,*;q=0.7
    HTTP_REFERER = http://10.220.110.22:7780/pls/htmldb/f?p=4500:1003:2050165973690504::NO:::
    HTTP_ORACLE_ECID = 1177512000:10.220.110.22:2296:0:624,0
    WEB_AUTHENT_PREFIX =
    DAD_NAME = htmldb
    DOC_ACCESS_PATH = docs
    DOCUMENT_TABLE = wwv_flow_file_objects$
    PATH_ALIAS =
    REQUEST_CHARSET = AL32UTF8
    REQUEST_IANA_CHARSET = UTF-8
    SCRIPT_PREFIX = /pls
    HTTP_COOKIE = ISCOOKIE=true; ORACLE_PLATFORM_REMEMBER_UN=MERKER:zentral; WWV_FLOW_USER2=5352414403960629
    Statement processed.

  • Garbage Characters in CDE Window Title Bar

    I recently patched our Solaris 8 Sun workstations (a mixture of Blade 150, Ultra10, Blade 1500, and Blade 2000) with the recommended patch cluster from December 19th, 2005. At the same time, I updated the systems' Java to 1.4.2_10, and installed Update 5 to StarOffice7, which is still on the systems along with the newer StarOffice8.
    When initially installed before these recent patches, StarOffice8 behaved correctly. After updating the systems, when I open StarOffice8 in a CDE session, I get garbage characters in the title bar of the window. The letters and fonts in the application menus are normal. StarOffice8 windows under Gnome sessions are titled correctly.
    I've read some emails about issues with LC_CTYPE and setting it to en_US.ISO8859-15, using a wrapper script to start soffice. The current setting on my machines is en_US.ISO8859-1. That solution doesn't work consistently on different machines (or even on the same machine).
    In fact, even without attempting to solve the problem, there is inconsistent behavior. On one Ultra10 which I have as a testbed machine, StarOffice8 behaves correctly, whether I'm logged in as a normal user, or as root. On a Blade 1500, it behaves correctly when logged in as root, but not as a normal user. I've also patched StarOffice8 with Update1 which was released today, but it doesn't fix the problem.
    Anybody else having similar problems after recent patching? Or have any suggestions for a solution?
    Jeff Bailey

    More Info on Unsolved Problem:
    If I use Mozilla to open a document file, with soffice set as the helper application, the CDE window title displays properly. If I leave that instance of StarOffice open, and open new documents using the menus within StarOffice, subsequent CDE titles also display normally. Also with that original Mozilla-driven StarOffice application still open, if I use "soffice whatever.doc" on a command line, those CDE windows appear normal.
    Mozilla is set to use "en_US" as the language for webpages, with Western ISO-8859-1 as the default character coding. The Solaris 8 workstation's /etc/default/init file is configured as:
    TZ=US/Eastern
    CMASK=022
    LC_COLLATE=en_US.ISO8859-1
    LC_CTYPE=en_US.ISO8859-1
    LC_MESSAGES=C
    LC_MONETARY=en_US.ISO8859-1
    LC_NUMERIC=en_US.ISO8859-1
    LC_TIME=en_US.ISO8859-1
    While the Mozilla "wrapper" is a workaround for now, I don't see it as a final solution.
    Jeff Bailey

  • Garbage characters in excel file opened by jsp

    I am storing an Excel file as a BLOB in the database. When I retrieve it from the database and open it in the JSP page, it shows a lot of garbage characters and all formatting is lost. I am using content type "application/vnd.ms-excel". I am also setting the MIME type in web.xml as application/excel. It is WebLogic 6.1 SP2 with Oracle 8.1.6. The PDF and Word docs are working well. Please help soon.
    Thanks
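
One frequent cause of this kind of damage, stated here as an assumption because the thread never confirms it, is that binary data gets pushed through a character-oriented writer (for example a JSP's out) instead of being copied byte for byte to the response's output stream. The sketch below only illustrates the difference in plain Java; BinaryCopyDemo, copy and throughText are hypothetical names, and the eight bytes are the signature that legacy .xls (OLE2) files start with.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class for illustration; not code from the original post.
final class BinaryCopyDemo {

    // Copies bytes unchanged, which is what a binary download needs.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    // Imitates sending binary data through text: decode to chars, encode again.
    static byte[] throughText(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Signature bytes found at the start of legacy .xls (OLE2) files.
        byte[] blob = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                       (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(blob), sink);

        // The byte-for-byte copy is identical to the source ...
        System.out.println(Arrays.equals(blob, sink.toByteArray()));
        // ... while the round trip through text is not.
        System.out.println(Arrays.equals(blob, throughText(blob)));
    }
}
```

In a servlet, the byte-for-byte path would mean writing the BLOB's stream to the response's output stream rather than to a writer.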
              

    Download the Open G Toolkit from www.openg.org. There is a VI called Quit Application.vi that works great. I have used it with the very stupid Brooks 0154 SmartDDE Controller program to reset the application.
    Be sure to save the document and close the DDE communication first.
    Michael
    www.abcdefirm.com
    Michael Munroe, ABCDEF
    Certified LabVIEW Developer, MCP
    Find and fix bad VI Properties with Property Inspector

  • Garbage characters

    I uploaded my first podcast test <http://www.sooline.org/podcasts> and the iWeb pages display garbage characters from my ISP's Unix Web server. It looks fine when viewed from the folder on my hard drive. Is there anything I can do short of manually editing the HTML of every file every time I update the site? Any suggestions or recommendations would be much appreciated. -- Rick
    Dual G5   Mac OS X (10.4.4)  

    See this note for fixes:
    http://homepage.mac.com/thgewecke/iwebchars.html

  • Garbage characters in web pages

    Produced a page in iWeb then uploaded it to my webspace using Secure FTP. Now I have garbage characters in it with  wherever there's a return and a bunch of garbage for every apostrophe.
    Obviously I have some sort of problem with the character set but how to get rid of it?
    I've tried copying text over from a TXT file without the returns, adding them later in iWeb, but it made no difference.

    When I open iWeb
    and open this project, I have no idea which set iWeb
    will end up using. I have to check the file
    attributes afterward to figure out which one was the
    set "du jour".
    You are aware that iWeb does not open published files, right? It only opens the Domain.sites file where its data is kept:
    http://homepage.mac.com/thgewecke/iwebdata.html
    Also iWeb cannot register any changes you make in a published site with an editor, so that needs to be redone every time the site is republished.
    Normally if you have multiple projects it would be a good idea to keep their data in separate Domain files.
    PageSpinner file open function cannot see any folder
    or file created by iWeb except the one called "Sites"
    and that one is empty.
    I have no problem using PageSpinner to open the files created by iWeb. Since iWeb does not create anything called "Sites," I'm wondering if the Open function was going instead to Home/Sites or some other location rather than Home/Documents/HTML or wherever you published your pages.
    As far as my ISP goes, I'll take your word for it,
    but I can't imagine an ISP of this size getting this
    wrong and not receiving/responding to complaints.
    None of the major companies devoted to hosting web pages, which most people use, have this problem as far as I know. ISPs for whom hosting is a sideline sometimes do not pay attention to the issue and leave their server set to force all browsers to use ISO-8859-1 encoding. They will not likely get any complaints if their clients are all using Roman script and traditional web editors which default to that setting as well.
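
The effect of a server forcing ISO-8859-1 onto pages that were saved as UTF-8 can be reproduced in a few lines. This is only an illustrative sketch; MojibakeDemo and its method names are hypothetical, and the sample text is made up. A curly apostrophe (U+2019) takes three bytes in UTF-8, so reading those bytes as ISO-8859-1 turns one character into three, which is consistent with the "bunch of garbage for every apostrophe" reported above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration.
final class MojibakeDemo {

    // What a browser shows when UTF-8 bytes are interpreted as ISO-8859-1.
    static String misread(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses the misreading; lossless because ISO-8859-1 maps every byte value.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "It\u2019s";          // "It's" with a curly apostrophe
        String garbled = misread(original);     // the apostrophe becomes three characters
        System.out.println(garbled.length());   // 6 characters instead of 4
        System.out.println(repair(garbled).equals(original));
    }
}
```

The durable fix is on the serving side: the page's declared encoding and the charset the server announces have to agree.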

  • Garbage characters in dialogs

    Something strange started happening this morning...
    I am getting garbage characters in some, but not all of the dialog windows.
    The attached screenshot shows what occurred when reloading a VI after changing its location. It's repeatable and happens again after a reboot.
    Has anyone seen this kind of behavior? Any solutions?
    Something which may or may not be related: some weeks ago, when building a VI into an executable, the resulting front panel had similar characters in the menu bar. The only way I found to solve that one was to select "support all languages" under run-time languages in the build specifications (previously only English was ticked).
    LabVIEW 2010, TestStand 2010
    Attachments:
    dialog_error.jpg ‏115 KB

    Hi,
    That is very strange. I've seen this only once before, when reporting an error into an error cluster indicator over a real-time target, but that was a one-time event. Does the PC you're on have any issues other than this, i.e. occasional Blue Screens or crashes? The only thing I can think of is a memory location on your PC that's having issues, with LabVIEW occasionally using this space.
    It may be worth calling your local branch or e-mailing directly via www.ni.com/support; they may recommend a re-installation of the NI software and provide you with a tool to ease this process. But this is certainly the first time I've heard of this on the LabVIEW dialog boxes! Have any changes been made to the machine itself in terms of language additions or software addition/removal?
    Kind Regards,
    Applications Engineer
