Can we download a web page using SOA?

Hi,
We have a requirement to download a web page, whether in HTML or XML format, using SOA middleware. Is this possible? Has anyone tried this before? Any suggestions would be of great help.
Thanks,
SOA Team
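In principle, yes: any middleware that can issue an HTTP GET can retrieve a page, whether the body is HTML or XML. A minimal plain-Java sketch of the fetch-and-read step (the URL in the comment is a placeholder; the main method exercises the read logic on an in-memory stream instead of a live connection, and UTF-8 is an assumption):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PageDownload {
    // Open an HTTP connection and return its body; works for HTML or XML alike.
    public static String download(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        return readBody(conn.getInputStream());
    }

    // Read a response stream into a String (UTF-8 assumed for simplicity).
    static String readBody(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a live response; a real call would be download("http://example.com/page.xml")
        InputStream fake = new ByteArrayInputStream("<page>hello</page>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readBody(fake).trim());
    }
}
```

Whether your particular SOA stack can wrap such a call as a service operation depends on the product, so check its adapter documentation.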

Hello Iyanu_IO,
Thank you for answering my question "can I download a web page as a PDF".
I downloaded and installed the add-on and it works really well.
Regards
Paul_Abrahams

Similar Messages

  • How can I download a web page's text only?

    Hello All
    I have some code (below) which can download the HTML of a web page, and this is usually good enough for me. However, sometimes it would just be easier to work with the text only of a given page. And sometimes the text that appears on the given page is not even available in the HTML source, I guess because it is the result of some script or other in the HTML rather than being defined by the HTML itself.
    What I would like to know is: is there any way I can download the "final text" of a web page rather than its HTML source, even if that text is generated by a script in the HTML? As far as I can see, the only way to do this is to load the page in a browser control and then get the document text from that - but that's far from ideal.
    And I dare say somebody out there knows better!
    Anyway, here's my existing code:
    ' Requires: Imports System.Net and Imports System.IO
    Public Function downloadWebPage(ByVal url As String) As String
        Dim txt, ua As String
        Try
            Dim myWebClient As New WebClient()
            ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)"
            myWebClient.Headers.Add(HttpRequestHeader.UserAgent, ua)
            myWebClient.Headers("Accept") = "*/*"
            Dim myStream As Stream = myWebClient.OpenRead(url)
            Dim sr As New StreamReader(myStream)
            txt = sr.ReadToEnd()
            sr.Close()
            myStream.Close()
            Return txt
        Catch ex As Exception
            ' Swallowing the exception hides failures; consider logging ex.Message
        End Try
        Return Nothing
    End Function

    A WebBrowser's DocumentText looks like the sample below. The OuterText of each HTML element does not necessarily contain all of the text displayed in a WebBrowser document, nor does the InnerText or anything else in the document's code, although the code does tell the browser where to display content from. Therefore, to collect every character actually displayed, you would have to screen-scrape the image rendered in the WebBrowser control.
    <!DOCTYPE html>
    <html>
    <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>What&#39;s My User Agent?</title>
    <meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta name="author" content="Bruce Horst" />
    <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" />
    <link href='http://fonts.googleapis.com/css?family=Raleway:400,700' rel='stylesheet' type='text/css'>
    <link href="//maxcdn.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.min.css" rel="stylesheet">
    <script src="/js/mainjs?v=Ux43t_hse2TjAozL2sHVsnZmOvskdsIEYFzyzeubMjE1"></script>
    <link href="/css/stylecss?v=XWSktfeyFOlaSsdgigf1JDf3zbthc_eTQFU5lbKu2Rs1" rel="stylesheet"/>
    </head>
    <body>
    Code for OuterText using a WebBrowser:
    Option Strict On
    Public Class Form1
        WithEvents WebBrowser1 As New WebBrowser
        Dim WBDocText As New List(Of String)

        Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            Me.Location = New Point(CInt((Screen.PrimaryScreen.WorkingArea.Width / 2) - (Me.Width / 2)), CInt((Screen.PrimaryScreen.WorkingArea.Height / 2) - (Me.Height / 2)))
            Button1.Anchor = AnchorStyles.Top
            RichTextBox1.Anchor = CType(AnchorStyles.Bottom + AnchorStyles.Left + AnchorStyles.Right + AnchorStyles.Top, AnchorStyles)
        End Sub

        Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
            WebBrowser1.Navigate("http://www.whatsmyuseragent.com/", Nothing, Nothing, "User-Agent: Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko)")
        End Sub

        Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
            RichTextBox1.Clear()
            WBDocText.Clear()
            Dim First As Boolean = True
            If WebBrowser1.IsBusy = False Then
                For Each Item As HtmlElement In WebBrowser1.Document.All
                    Try
                        If First = True Then
                            First = False
                            WBDocText.Add(Item.OuterText)
                        Else
                            Dim Contains As Boolean = False
                            For i = WBDocText.Count - 1 To 0 Step -1
                                If WBDocText(i) = Item.OuterText Then
                                    Contains = True
                                End If
                            Next
                            If Contains = False Then
                                WBDocText.Add(Item.OuterText)
                            End If
                        End If
                    Catch ex As Exception
                    End Try
                Next
                If WBDocText.Count > 0 Then
                    For Each Item In WBDocText
                        Try
                            RichTextBox1.AppendText(Item)
                        Catch ex As Exception
                        End Try
                    Next
                End If
            End If
        End Sub
    End Class
    Some text returned
    What's My User Agent?
    Your User Agent String is: Analyze
    Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko)
    (adsbygoogle = window.adsbygoogle || []).push({}); Your IP Address is:
    207.109.140.1
    Client Information:
    JavaScript Enabled:Yes
    Cookies Enabled:Yes
    Device Pixel Ratio:1DevicePixelRation.com
    Screen Resolution:1366 px x 768 pxWhatsMyScreenResolution.com
    Browser Window:250 px x 250 px
    Local Time:9:48 am
    Time Zone:-7 hours
    Recent User Agents Visiting this Page:
    You!! Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko)
    Mozilla/5.0 (compatible; MSIE 10.0; Windows Phone 8.0; Trident/6.0; IEMobile/10.0; ARM; Touch; NOKIA; Lumia 520)
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36
    Mozilla/5.0 (compatible; MSIE 10.0; Windows Phone 8.0; Trident/6.0; IEMobile/10.0; ARM; Touch; NOKIA; Lumia 630)
    Mozilla/5.0 (compatible; MSIE 10.0; Windows Phone 8.0; Trident/6.0; IEMobile/10.0; ARM; Touch; NOKIA; Lumia 630)
    curl/7.35.0
    curl/7.35.0
    curl/7.35.0
    curl/7.35.0
    <!-- google_ad_client = "ca-pub-0399362207612216"; /* WMUA LargeRect */ google_ad_slot = "5054480278"; google_ad_width = 336; google_ad_height = 280; //-->
    La vida loca
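If script-generated text is not required, a middle ground between raw HTML and full screen-scraping is to parse the HTML and extract only its text nodes. A sketch in Java using the standard library's HTMLEditorKit (the .NET equivalent would be the WebBrowser's Document.Body.InnerText); note this does not execute scripts, so it only recovers text present in the markup:

```java
import java.io.StringReader;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class HtmlText {
    // Parse HTML markup and return only the visible text content.
    public static String textOnly(String html) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Avoid an exception when the markup declares its own charset
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(new StringReader(html), doc, 0);
        return doc.getText(0, doc.getLength()).trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textOnly("<html><body><p>Hello <b>world</b></p></body></html>"));
    }
}
```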

  • Can't fully download long web pages using HttpURLConnection

    I'm using these two methods:
    /** Keeps the document's encoding, or uses Cp1251 as the default.
      * @param uri the URI of the document
      * @return an {@link InputStreamReader} with the detected encoding
      */
    public static InputStreamReader getInputStreamReader(String uri) throws IOException {
        URL url = new URL(uri);
        HttpURLConnection urlConn = (HttpURLConnection) url.openConnection();
        urlConn.setConnectTimeout(timeout);
        urlConn.setDefaultUseCaches(false);
        urlConn.setUseCaches(false);
        urlConn.setRequestProperty("Cache-Control", "max-age=0,no-cache");
        urlConn.setRequestProperty("Pragma", "no-cache");
        String encoding = urlConn.getContentEncoding();
        if (encoding == null || encoding.equals("")) {
            String contentType = urlConn.getHeaderField("Content-Type");
            if (contentType != null && contentType.contains("charset=")) {
                String[] t = contentType.split("charset=");
                if (t.length > 0) {
                    encoding = t[t.length - 1];
                }
            }
        }
        if (encoding == null || encoding.equals("")) {
            encoding = "windows-1251";
        }
        InputStreamReader isr = new InputStreamReader(urlConn.getInputStream(), encoding);
        return isr;
    }
    /** Returns content in the default charset.
      * @param uri the URI of the document you want to get
      * @return a String with encoding equal to {@link Charset#defaultCharset()}
      */
    public static String fetchEncodedURI(String uri) throws IOException {
        InputStreamReader isr = getInputStreamReader(uri);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        OutputStreamWriter osw = new OutputStreamWriter(baos, Charset.defaultCharset());
        char[] buffer = new char[256];
        int n = -1;
        while ((n = isr.read(buffer)) != -1) {
            osw.write(buffer, 0, n);
        }
        String outputString = new String(baos.toByteArray());
        osw.flush();
        osw.close();
        baos.flush();
        baos.close();
        return outputString;
    }
    The problem is that some pages are cut! I print the result string (outputString) and see, for example, just 2/3 of the real page content.
    Is it possible to overcome this difficulty?
    Edited by: Holod on 01.10.2008 11:31

    ejp wrote:
    Personally, I'd suggest reading the whole page as bytes into baos and converting to a String with one call to baos.toString(encoding). Or use a StringWriter. The way you're doing it is the worst of both worlds.

    Can you help me please with a sample link?
    Or please tell me, is this the right solution?
    /** Returns content in the default charset.
      * @param uri the URI of the document you want to get
      * @return a String with encoding equal to {@link Charset#defaultCharset()}
      */
    public static String fetchEncodedURI(String uri) throws IOException {
        InputStreamReader isr = getInputStreamReader(uri);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        //OutputStreamWriter osw = new OutputStreamWriter(baos, Charset.defaultCharset());
        StringWriter sw = new StringWriter();
        char[] buffer = new char[256];
        int n = -1;
        int previousPosition = 0;
        while ((n = isr.read(buffer)) != -1) {
            sw.write(buffer, previousPosition, n);
            previousPosition += n;
            //osw.write(buffer, 0, n);
        }
        //osw.flush();
        //String outputString = new String(baos.toByteArray());
        String outputString = sw.toString();
        //osw.close();
        baos.flush();
        baos.close();
        return outputString;
    }

    Edited by: Holod on 03.10.2008 5:41
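For reference, ejp's suggestion (accumulate the raw bytes first, then decode once with a single toString(encoding) call) can be sketched as follows. Decoding in one call avoids losing data to unflushed writer buffers, which is the likely cause of the truncated pages. The main method exercises the logic on an in-memory stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BytesFirst {
    // Read an entire stream as raw bytes, then decode once with the known charset.
    // A single decode also avoids splitting multi-byte characters across buffers.
    public static String readAll(InputStream in, String encoding) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            baos.write(buffer, 0, n);
        }
        return baos.toString(encoding);   // one decode over the complete byte stream
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for urlConn.getInputStream(); a real call would pass the connection stream
        byte[] data = "example page content".getBytes("windows-1251");
        System.out.println(readAll(new ByteArrayInputStream(data), "windows-1251"));
    }
}
```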

  • How can I download my web pages back into IWeb

    I have created a site with about 10 pages photos everything it is still on web. problem is the local files are not on my MacPro, lost them somehow. Is there a way to get them back from the web?

    No, unfortunately not, as iWeb has no import facility and so is unable to open any html or css files or an already published site.
    If you really don't have access to your domain.sites file, which is what you need and can be found under User/Library/Application Support/iWeb/domain.sites, then you are looking at re-building your site from scratch.
    You can use copy and paste to re-create it, but you are still looking at re-building if you don't have that all-important domain.sites file.
    Also, if you have to re-build, then consider using different software instead of iWeb, as Apple no longer actively supports or develops it.
    There are various alternatives such as RapidWeaver, Sandvox, Freeway Express/Pro, Flux 4, WebAcappella 4 and Quick n Easy Website Builder as well as EasyWeb that is in development by Rage Software and Adobe Muse.
    You can download free trials of most of these so that you can try before you buy.

  • How can I capture long web pages using RoboScreen Capture?

    Hi,
    I want to capture long web pages or scrolling windows using RoboScreen Capture. There is no such mode available in the menu.
    Help would be appreciated.
    Thanks,
    Deepti

    Hi,
    Thank you for the welcome.
    Please have a look at this post
    http://help.adobe.com/en_US/FrameMaker/8.0/help.html?content=Chap-23-TechCommSuite_2.html
                                                          OR
    http://help.adobe.com/en_US/framemaker/using/WS4279000F-ACA5-403b-B46F-BD80A744B03B.html
    Under "You can use RoboScreen Capture to perform the following tasks:", check out the fourth bulleted point.
    I am using RoboHelp 8 and RoboScreen Capture is part of it.
    Regards,
    Deepti

  • Change web page using iweb

    How can I change my web page using iweb?

    Then you would need to publish to one of the 3 options that iWeb provides. Now, MobileMe ceases to exist come June 30th. If you do not already have a 3rd-party host you will need to get one.
    I use ix webhosting. Their plans start as low as $3.95 USD a month and they have excellent customer service.

  • Since the last 3 upgrades (thru Firefox 7.0) I can't print a web page. This is true using 2 different printers. I have an iMac 10.6.8.

    Since the last 3 upgrades (thru Firefox 7.0) I can't print a web page. This is true using 2 different printers. I have an iMac 10.6.8.

    Just read that the default memory storage in Firefox is set to 5 MB. You can raise that by clicking Tools > Advanced, overriding the default, and setting it to 50 MB (if you use lots of tabs) or 10 MB (if you use just a few).
    I open a blank tab and clear the cache periodically while I am working. I can have several tabs open and watch videos on YouTube by doing this. Have you cleared your cache?
    Since I have an older computer, I've also used CCleaner for years. I use it often, but always before signing off. You can get it free: search for CCleaner and download it from Piriform. I didn't change any of the settings, and it clears crap left behind from uninstalling programs and clears all browsers at once of cookies, history, passwords, etc. I love it!
    I had freezing before using the "open blank tab, clear the cache" trick. Now I have no problems. Plus, as I said, you can change the default memory usage for Firefox and that should help too.
    Hope this helps! Good luck! :)

  • How can I add a podcast episode to an existing web page using iWeb?

    How can I add an episode to an existing web page using iWeb?
    I could probably figure this out but I am afraid if I make changes to the site and re-upload it to the podcast area I will have just doubled it. I see them repeated from time to time.
    What is the proper protocol? Thanks
    Mac G4   Mac OS X (10.4.3)  

    Hi apple-owner,
    Method 1.
    To create this scatter-plot, I selected the whole of Columns A and B. (Shift click on the Column reference tabs).
    The plot ignores blank rows, but they are "ready" for new data:
    Method 2
    If you did not select whole columns, you can extend the "active" rows. Rows 1-5 selected:
    Add more data rows and drag the fill handle (small white circle, bottom right) down
    Regards,
    Ian.

  • TS3274 I can't access some web pages with Safari, like Forever 21 and Express, even though I used to access them previously. Why?

    I can't access some web pages with Safari, like Forever 21 and Express.com, even though I used to access them previously.

    Ashlee...
    Since the apps you mentioned require an internet connection, try running the connectivity tests > iTunes for Windows: Network Connectivity Tests

  • Why is the speed of the MacBook Air downloading a web page lower than the iPad 2? Even worse, sometimes the iPad 2 can load web pages but the MBA can't.

    Why is the speed of the MacBook Air downloading a web page lower than the iPad 2?
    Even worse, why can the iPad 2 sometimes load web pages when the MBA can't?


  • I recently downloaded a web page, but then I deleted my history, which deleted the web page. There is still a shortcut to it, but it says the file does not exist. Can I retrieve this, as the web page no longer exists?

    As the question states, I downloaded a web page, but before I put it on a memory stick I changed the options in my Firefox and the file is no longer there.
    There is a shortcut in the 'Recently Changed' folder in Windows Explorer, but when I click on it, it says the file has moved or no longer exists.
    Is there any way to retrieve this, as the web page no longer exists?

    Try:
    *Extended Copy Menu (fix version): https://addons.mozilla.org/firefox/addon/extended-copy-menu-fix-vers/

  • Can't export a web page

    I've just upgraded to iLife '08. When I try to save a web page the same way as I had with the previous version of iPhoto, it doesn't work. The following steps are taken:
    Highlight eight photos in a folder
    Select File>Export
    In the Export Photos window, select Web Page tab
    Make other selections on the page and click Export
    Then I've tried various file destination folders and the Desktop
    When I click OK, an Exporting sub-window appears for a moment, then it changes to say
    "Unable to generate web page" and only gives me the option to Cancel.
    It gives me no clues as to why it is unable to generate a web page. It has created the folders for the Images, Pages and Thumbnails, but there isn't anything in those folders. I believe these are the same steps I used many times with iPhoto '06 to create web pages.
    Does anyone have an idea on how to save web pages with the new iPhoto? How can I revert to the old version of iPhoto?
    Thank you for your help.
    Bill H

    Bill:
    Welcome to the Apple Discussions. Try the following: download and run BatChmod on the iPhoto Library folder with the settings shown here, putting your administrator login name, long or short, in the owner and group sections. You can either type in the path to the folder or just drag the folder into that field. See if this helps.
    Next, log into another account or boot into Safe Mode and see if you can export to a web page. If so then there's something amiss with your account. If not, then a reinstall of iPhoto seems appropriate.
    Do you Twango?
    TIP: For insurance against the iPhoto database corruption that many users have experienced I recommend making a backup copy of the Library6.iPhoto database file and keep it current. If problems crop up where iPhoto suddenly can't see any photos or thinks there are no photos in the library, replacing the working Library6.iPhoto file with the backup will often get the library back. By keeping it current I mean backup after each import and/or any serious editing or work on books, slideshows, calendars, cards, etc. That insures that if a problem pops up and you do need to replace the database file, you'll retain all those efforts. It doesn't take long to make the backup and it's good insurance.
    I've written an Automator workflow application (requires Tiger), iPhoto dB File Backup, that will copy the selected Library6.iPhoto file from your iPhoto Library folder to the Pictures folder, replacing any previous version of it. It's compatible with iPhoto 08 libraries. iPhoto does not have to be closed to run the application, just idle. You can download it at Toad's Cellar. Be sure to read the Read Me pdf file.

  • Download a web page, how to ?

    Can anyone help me with code for downloading a web page given the URL address? I can download the page, but the problem is it doesn't download the associated images, JavaScript, etc., nor does it create an associated folder as one might expect when saving a page using a browser.
    Below is the code snippet -
    URL url = new URL(address);
    out = new BufferedOutputStream(new FileOutputStream(localFileName));
    conn = url.openConnection();
    in = conn.getInputStream();
    byte[] buffer = new byte[1024];
    int numRead;
    long numWritten = 0;
    while ((numRead = in.read(buffer)) != -1) {
        out.write(buffer, 0, numRead);
        numWritten += numRead;
    }
    System.out.println(localFileName + "\t" + numWritten);

    javaflex wrote:
    I don't think a web crawler would work. A web crawler simply takes every link or URL on the given address and digs into it. Would it work for JavaScript? Given a URL like xyz.com/a.html:
    1. the code above would download the plain HTML.
    2. parse the HTML to find JavaScript and images (anything else I need to look at?)
    3. download those
    4. put everything in one folder (but the question is, do I then need to rename the pointers in the downloaded HTML to point at the other contents on the disk?)
    This is a naive approach - anything better?
    Thanks.

    More advanced web crawlers parse the JavaScript source files (or embedded JS inside HTML files) and (try to) execute the script in order to find new links. So the answer is: yes, some crawlers do. I know for a fact that Heritrix can do this quite well, but it is a rather "large" crawler and can take a while to get working. But it really is one of the best (if not the best) open source Java web crawlers around.
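Steps 2 and 4 of the list above (find the asset references, then rewrite them to local names so the saved page loads from its own folder) can be sketched like this. The regex is a deliberate simplification and the xyz.com URLs are illustrative; for real pages an HTML parser is the robust choice:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AssetRewriter {
    // Crude pattern for src="..."/href="..." attributes pointing at common asset types.
    private static final Pattern ASSET =
            Pattern.compile("(src|href)\\s*=\\s*\"([^\"]+\\.(?:js|png|jpg|gif|css))\"",
                    Pattern.CASE_INSENSITIVE);

    // Step 2: collect the asset URLs referenced by the page.
    public static List<String> findAssets(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = ASSET.matcher(html);
        while (m.find()) {
            urls.add(m.group(2));
        }
        return urls;
    }

    // Step 4: point each reference at a local file name (the last path segment),
    // so the saved HTML loads its assets from the same folder on disk.
    public static String rewriteLocal(String html) {
        Matcher m = ASSET.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String url = m.group(2);
            String local = url.substring(url.lastIndexOf('/') + 1);
            m.appendReplacement(sb, m.group(1) + "=\"" + local + "\"");
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<img src=\"http://xyz.com/img/logo.png\">"
                    + "<script src=\"http://xyz.com/js/a.js\"></script>";
        System.out.println(findAssets(html));
        System.out.println(rewriteLocal(html));
    }
}
```

Step 3 is then just the byte-copy loop already shown in the original post, run once per collected URL.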

  • Where to Download WPC [ Web Page Composer ] and How to install it ?

    Hi Experts,
    I need to download Web Page Composer and install it for use in my company. Can anyone help me on where to get it and how to install it?
    thanks
    Suresh

    Hi,
    Check the SAP Note Number: [1080110 |https://www.sdn.sap.com/irj/servlet/prt/portal/prtroot/com.sap.km.cm.docs/oss_notes/sdn_oss_ep_km/~form/handler]
    Also some links that may help you:
    https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/d07b5354-c058-2a10-a98d-a23f775808a6
    There are also lots of documents available on SDN, so just use SDN search.
    Regards,
    Praveen Gudapati

  • Can not open any web page in Safari

    I cannot open any web page in Safari on the iPad. All other wireless features work fine.

    If this does not work, then try a reset:
    Reset your iPad by pressing the 'Sleep' and 'Home' buttons at the same time for about 15 seconds. Your iPad will then go through a reset/reboot procedure and will be ready for use within about a minute.
    Don't worry about doing this as you will not lose data or settings.
    Good luck and do report back.
