Can we download a web page using SOA?

Hi,
We have a requirement to download a web page, whether in HTML or XML format, using SOA middleware. Is this possible? Has anyone tried this before? Any suggestions would be of great help.
Thanks,
SOA Team
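In principle, yes: any middleware that can issue an HTTP GET can retrieve a page, whether the body is HTML or XML. A minimal plain-Java sketch of the fetch-and-read step (the URL in the comment is a placeholder; the main method exercises the read logic on an in-memory stream instead of a live connection, and UTF-8 is an assumption):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PageDownload {
    // Open an HTTP connection and return its body; works for HTML or XML alike.
    public static String download(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        return readBody(conn.getInputStream());
    }

    // Read a response stream into a String (UTF-8 assumed for simplicity).
    static String readBody(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a live response; a real call would be download("http://example.com/page.xml")
        InputStream fake = new ByteArrayInputStream("<page>hello</page>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readBody(fake).trim());
    }
}
```

Whether your particular SOA stack can wrap such a call as a service operation depends on the product, so check its adapter documentation.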

Hello Iyanu_IO,
Thank you for answering my question "can I download a web page as a PDF".
I downloaded and installed the add-on and it works really well.
Regards
Paul_Abrahams

Similar Messages

  • How can I download a web page's text only?

    Hello All
    I have some code (below) which can download the HTML of a web page, and this is usually good enough for me. However, sometimes it would just be easier to work with the text only of a given page. And sometimes the text that appears on the given page is not even available in the HTML source, I guess because it is the result of some script or other in the HTML rather than being defined by the HTML itself.
    What I would like to know is: is there any way I can download the "final text" of a web page rather than its HTML source, even if that text is generated by a script in the HTML? As far as I can see, the only way to do this is to load the page in a browser control and then get the document text from that - but that's far from ideal.
    And I dare say somebody out there knows better!
    Anyway, here's my existing code:
    ' Requires: Imports System.Net and Imports System.IO
    Public Function downloadWebPage(ByVal url As String) As String
        Dim txt, ua As String
        Try
            Dim myWebClient As New WebClient()
            ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)"
            myWebClient.Headers.Add(HttpRequestHeader.UserAgent, ua)
            myWebClient.Headers("Accept") = "*/*"
            Dim myStream As Stream = myWebClient.OpenRead(url)
            Dim sr As New StreamReader(myStream)
            txt = sr.ReadToEnd()
            sr.Close()
            myStream.Close()
            Return txt
        Catch ex As Exception
            ' Swallowing the exception hides failures; consider logging ex.Message
        End Try
        Return Nothing
    End Function

    A WebBrowser's DocumentText looks like the sample below. The OuterText of each HTML element does not necessarily contain all of the text displayed in a WebBrowser document, nor does the InnerText or anything else in the document's code, although the code does tell the browser where to display content from. Therefore, to collect every character actually displayed, you would have to screen-scrape the image rendered in the WebBrowser control.
    <!DOCTYPE html>
    <html>
    <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>What&#39;s My User Agent?</title>
    <meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta name="author" content="Bruce Horst" />
    <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" />
    <link href='http://fonts.googleapis.com/css?family=Raleway:400,700' rel='stylesheet' type='text/css'>
    <link href="//maxcdn.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.min.css" rel="stylesheet">
    <script src="/js/mainjs?v=Ux43t_hse2TjAozL2sHVsnZmOvskdsIEYFzyzeubMjE1"></script>
    <link href="/css/stylecss?v=XWSktfeyFOlaSsdgigf1JDf3zbthc_eTQFU5lbKu2Rs1" rel="stylesheet"/>
    </head>
    <body>
    Code for OuterText using a WebBrowser:
    Option Strict On
    Public Class Form1
        WithEvents WebBrowser1 As New WebBrowser
        Dim WBDocText As New List(Of String)

        Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
            Me.Location = New Point(CInt((Screen.PrimaryScreen.WorkingArea.Width / 2) - (Me.Width / 2)), CInt((Screen.PrimaryScreen.WorkingArea.Height / 2) - (Me.Height / 2)))
            Button1.Anchor = AnchorStyles.Top
            RichTextBox1.Anchor = CType(AnchorStyles.Bottom + AnchorStyles.Left + AnchorStyles.Right + AnchorStyles.Top, AnchorStyles)
        End Sub

        Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
            WebBrowser1.Navigate("http://www.whatsmyuseragent.com/", Nothing, Nothing, "User-Agent: Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko)")
        End Sub

        Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
            RichTextBox1.Clear()
            WBDocText.Clear()
            Dim First As Boolean = True
            If WebBrowser1.IsBusy = False Then
                For Each Item As HtmlElement In WebBrowser1.Document.All
                    Try
                        If First = True Then
                            First = False
                            WBDocText.Add(Item.OuterText)
                        Else
                            Dim Contains As Boolean = False
                            For i = WBDocText.Count - 1 To 0 Step -1
                                If WBDocText(i) = Item.OuterText Then
                                    Contains = True
                                End If
                            Next
                            If Contains = False Then
                                WBDocText.Add(Item.OuterText)
                            End If
                        End If
                    Catch ex As Exception
                    End Try
                Next
                If WBDocText.Count > 0 Then
                    For Each Item In WBDocText
                        Try
                            RichTextBox1.AppendText(Item)
                        Catch ex As Exception
                        End Try
                    Next
                End If
            End If
        End Sub
    End Class
    Some text returned
    What's My User Agent?
    Your User Agent String is: Analyze
    Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko)
    (adsbygoogle = window.adsbygoogle || []).push({}); Your IP Address is:
    207.109.140.1
    Client Information:
    JavaScript Enabled:Yes
    Cookies Enabled:Yes
    Device Pixel Ratio:1DevicePixelRation.com
    Screen Resolution:1366 px x 768 pxWhatsMyScreenResolution.com
    Browser Window:250 px x 250 px
    Local Time:9:48 am
    Time Zone:-7 hours
    Recent User Agents Visiting this Page:
    You!! Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko)
    Mozilla/5.0 (compatible; MSIE 10.0; Windows Phone 8.0; Trident/6.0; IEMobile/10.0; ARM; Touch; NOKIA; Lumia 520)
    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36
    Mozilla/5.0 (compatible; MSIE 10.0; Windows Phone 8.0; Trident/6.0; IEMobile/10.0; ARM; Touch; NOKIA; Lumia 630)
    Mozilla/5.0 (compatible; MSIE 10.0; Windows Phone 8.0; Trident/6.0; IEMobile/10.0; ARM; Touch; NOKIA; Lumia 630)
    curl/7.35.0
    curl/7.35.0
    curl/7.35.0
    curl/7.35.0
    <!-- google_ad_client = "ca-pub-0399362207612216"; /* WMUA LargeRect */ google_ad_slot = "5054480278"; google_ad_width = 336; google_ad_height = 280; //-->
    La vida loca
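If script-generated text is not required, a middle ground between raw HTML and full screen-scraping is to parse the HTML and extract only its text nodes. A sketch in Java using the standard library's HTMLEditorKit (the .NET equivalent would be the WebBrowser's Document.Body.InnerText); note this does not execute scripts, so it only recovers text present in the markup:

```java
import java.io.StringReader;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class HtmlText {
    // Parse HTML markup and return only the visible text content.
    public static String textOnly(String html) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Avoid an exception when the markup declares its own charset
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(new StringReader(html), doc, 0);
        return doc.getText(0, doc.getLength()).trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textOnly("<html><body><p>Hello <b>world</b></p></body></html>"));
    }
}
```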

  • Can't fully download long web pages using HttpURLConnection

    I'm using these two methods:
    /** Keeps the document's encoding, or uses Cp1251 as the default.
      * @param uri the URI of the document
      * @return an {@link InputStreamReader} with the detected encoding
      */
    public static InputStreamReader getInputStreamReader(String uri) throws IOException {
        URL url = new URL(uri);
        HttpURLConnection urlConn = (HttpURLConnection) url.openConnection();
        urlConn.setConnectTimeout(timeout);
        urlConn.setDefaultUseCaches(false);
        urlConn.setUseCaches(false);
        urlConn.setRequestProperty("Cache-Control", "max-age=0,no-cache");
        urlConn.setRequestProperty("Pragma", "no-cache");
        String encoding = urlConn.getContentEncoding();
        if (encoding == null || encoding.equals("")) {
            String contentType = urlConn.getHeaderField("Content-Type");
            if (contentType != null && contentType.contains("charset=")) {
                String[] t = contentType.split("charset=");
                if (t.length > 0) {
                    encoding = t[t.length - 1];
                }
            }
        }
        if (encoding == null || encoding.equals("")) {
            encoding = "windows-1251";
        }
        InputStreamReader isr = new InputStreamReader(urlConn.getInputStream(), encoding);
        return isr;
    }
    /** Returns content in the default charset.
      * @param uri the URI of the document you want to get
      * @return a String with encoding equal to {@link Charset#defaultCharset()}
      */
    public static String fetchEncodedURI(String uri) throws IOException {
        InputStreamReader isr = getInputStreamReader(uri);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        OutputStreamWriter osw = new OutputStreamWriter(baos, Charset.defaultCharset());
        char[] buffer = new char[256];
        int n = -1;
        while ((n = isr.read(buffer)) != -1) {
            osw.write(buffer, 0, n);
        }
        String outputString = new String(baos.toByteArray());
        osw.flush();
        osw.close();
        baos.flush();
        baos.close();
        return outputString;
    }
    The problem is that some pages are cut! I print the result string (outputString) and see, for example, just 2/3 of the real page content.
    Is it possible to overcome this difficulty?
    Edited by: Holod on 01.10.2008 11:31

    ejp wrote:
    Personally, I'd suggest reading the whole page as bytes into baos and converting to a String with one call to baos.toString(encoding). Or use a StringWriter. The way you're doing it is the worst of both worlds.

    Can you help me please with a sample link?
    Or please tell me, is this the right solution?
    /** Returns content in the default charset.
      * @param uri the URI of the document you want to get
      * @return a String with encoding equal to {@link Charset#defaultCharset()}
      */
    public static String fetchEncodedURI(String uri) throws IOException {
        InputStreamReader isr = getInputStreamReader(uri);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        //OutputStreamWriter osw = new OutputStreamWriter(baos, Charset.defaultCharset());
        StringWriter sw = new StringWriter();
        char[] buffer = new char[256];
        int n = -1;
        int previousPosition = 0;
        while ((n = isr.read(buffer)) != -1) {
            sw.write(buffer, previousPosition, n);
            previousPosition += n;
            //osw.write(buffer, 0, n);
        }
        //osw.flush();
        //String outputString = new String(baos.toByteArray());
        String outputString = sw.toString();
        //osw.close();
        baos.flush();
        baos.close();
        return outputString;
    }

    Edited by: Holod on 03.10.2008 5:41
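For reference, ejp's suggestion (accumulate the raw bytes first, then decode once with a single toString(encoding) call) can be sketched as follows. Decoding in one call avoids losing data to unflushed writer buffers, which is the likely cause of the truncated pages. The main method exercises the logic on an in-memory stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BytesFirst {
    // Read an entire stream as raw bytes, then decode once with the known charset.
    // A single decode also avoids splitting multi-byte characters across buffers.
    public static String readAll(InputStream in, String encoding) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            baos.write(buffer, 0, n);
        }
        return baos.toString(encoding);   // one decode over the complete byte stream
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for urlConn.getInputStream(); a real call would pass the connection stream
        byte[] data = "example page content".getBytes("windows-1251");
        System.out.println(readAll(new ByteArrayInputStream(data), "windows-1251"));
    }
}
```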

  • How can I download my web pages back into IWeb

    I have created a site with about 10 pages photos everything it is still on web. problem is the local files are not on my MacPro, lost them somehow. Is there a way to get them back from the web?

    No, unfortunately not, as iWeb has no import facility and so is unable to open any html or css files or an already published site.
    If you really don't have access to your domain.sites file, which is what you need and can be found under User/Library/Application Support/iWeb/domain.sites, then you are looking at re-building your site from scratch.
    You can use copy and paste to re-create it, but you are still looking at re-building if you don't have that all-important domain.sites file.
    Also, if you have to re-build, then consider using different software instead of iWeb, as Apple no longer actively supports or develops it.
    There are various alternatives such as RapidWeaver, Sandvox, Freeway Express/Pro, Flux 4, WebAcappella 4 and Quick n Easy Website Builder as well as EasyWeb that is in development by Rage Software and Adobe Muse.
    You can download free trials of most of these so that you can try before you buy.

  • How can I capture long web pages using RoboScreen Capture?

    Hi,
    I want to capture long web pages or scrolling windows using RoboScreen Capture. There is no such mode available in the menu.
    Help would be appreciated.
    Thanks,
    Deepti

    Hi,
    Thank you for the welcome.
    Please have a look at this post
    http://help.adobe.com/en_US/FrameMaker/8.0/help.html?content=Chap-23-TechCommSuite_2.html
                                                          OR
    http://help.adobe.com/en_US/framemaker/using/WS4279000F-ACA5-403b-B46F-BD80A744B03B.html
    Under "You can use RoboScreen Capture to perform the following tasks:", check out the fourth bulleted point.
    I am using RoboHelp 8 and RoboScreen Capture is part of it.
    Regards,
    Deepti

  • Change web page using iweb

    How can I change my web page using iweb?

    Then you would need to publish to one of the 3 options that iWeb provides. Now, MobileMe ceases to exist come June 30th. If you do not already have a 3rd-party host you will need to get one.
    I use ix webhosting. Their plans start as low as $3.95 USD a month and they have excellent customer service.

  • Since the last 3 upgrades (thru Firefox 7.0) I can't print a web page. This is true using 2 different printers. I have an iMac 10.6.8.

    Since the last 3 upgrades (thru Firefox 7.0) I can't print a web page. This is true using 2 different printers. I have an iMac 10.6.8.

    Just read that the default memory storage in Firefox is set to 5 MB. You can raise that by clicking Tools > Advanced, overriding the default, and setting it to 50 MB (if you use lots of tabs) or 10 MB (if you use just a few).
    I open a blank tab and clear the cache periodically while I am working. I can have several tabs open and watch videos on YouTube by doing this. Have you cleared your cache?
    Since I have an older computer, I've also used CCleaner for years. I use it often, but always before signing off. You can get it free: search for CCleaner and download it from Piriform. I didn't change any of the settings, and it clears crap left behind from uninstalling programs and clears all browsers at once of cookies, history, passwords, etc. I love it!
    I had freezing before using the "open blank tab, clear the cache" trick. Now I have no problems. Plus, as I said, you can change the default memory usage for Firefox and that should help too.
    Hope this helps! Good luck! :)

  • How can I add a podcast episode to an existing web page using iWeb?

    How can I add an episode to an existing web page using iWeb?
    I could probably figure this out but I am afraid if I make changes to the site and re-upload it to the podcast area I will have just doubled it. I see them repeated from time to time.
    What is the proper protocol? Thanks
    Mac G4   Mac OS X (10.4.3)  

    Hi apple-owner,
    Method 1.
    To create this scatter-plot, I selected the whole of Columns A and B. (Shift click on the Column reference tabs).
    The plot ignores blank rows, but they are "ready" for new data:
    Method 2
    If you did not select whole columns, you can extend the "active" rows. Rows 1-5 selected:
    Add more data rows and drag the fill handle (small white circle, bottom right) down
    Regards,
    Ian.

  • TS3274 I can't access some web pages with Safari, like Forever 21 and Express, even though I used to access them previously. Why?

    I can't access some web pages with Safari, like Forever 21 and Express.com, even though I used to access them previously.

    Ashlee...
    Since the apps you mentioned require an internet connection, try running the connectivity tests > iTunes for Windows: Network Connectivity Tests

  • Why is the speed of the MacBook Air downloading a web page lower than the iPad 2? Even worse, sometimes the iPad 2 can load web pages but the MBA can't.

    Why is the speed of the MacBook Air downloading a web page lower than the iPad 2?
    Even worse, why can the iPad 2 sometimes load web pages when the MBA can't?


  • I recently downloaded a web page, but then I deleted my history, which deleted the web page. There is still a shortcut to it, but it says the file does not exist. Can I retrieve this, as the web page no longer exists?

    As the question states, I downloaded a web page, but before I put it on a memory stick I changed the options in my Firefox and the file is no longer there.
    There is a shortcut in the 'Recently Changed' folder in Windows Explorer, but when I click on it, it says the file has moved or no longer exists.
    Is there any way to retrieve this, as the web page no longer exists?

    Try:
    *Extended Copy Menu (fix version): https://addons.mozilla.org/firefox/addon/extended-copy-menu-fix-vers/

  • Can't export a web page

    I've just upgraded to iLife '08. When I try to save a web page the same way as I had with the previous version of iPhoto, it doesn't work. The following steps are taken:
    Highlight eight photos in a folder
    Select File>Export
    In the Export Photos window, select Web Page tab
    Make other selections on the page and click Export
    Then I've tried various file destination folders and the Desktop
    When I click OK, an Exporting sub-window appears for a moment, then it changes to say
    "Unable to generate web page" and only gives me the option to Cancel.
    It gives me no clues as to why it is unable to generate a web page. It has created the folders for the Images, Pages and Thumbnails, but there isn't anything in those folders. I believe these are the same steps I used many times with iPhoto '06 to create web pages.
    Does anyone have an idea on how to save web pages with the new iPhoto? How can I revert to the old version of iPhoto?
    Thank you for your help.
    Bill H

    Bill:
    Welcome to the Apple Discussions. Try the following: download and run BatChmod on the iPhoto Library folder with the settings shown here, putting your administrator login name, long or short, in the owner and group sections. You can either type in the path to the folder or just drag the folder into that field. See if this helps.
    Next, log into another account or boot into Safe Mode and see if you can export to a web page. If so then there's something amiss with your account. If not, then a reinstall of iPhoto seems appropriate.
    Do you Twango?
    TIP: For insurance against the iPhoto database corruption that many users have experienced I recommend making a backup copy of the Library6.iPhoto database file and keep it current. If problems crop up where iPhoto suddenly can't see any photos or thinks there are no photos in the library, replacing the working Library6.iPhoto file with the backup will often get the library back. By keeping it current I mean backup after each import and/or any serious editing or work on books, slideshows, calendars, cards, etc. That insures that if a problem pops up and you do need to replace the database file, you'll retain all those efforts. It doesn't take long to make the backup and it's good insurance.
    I've written an Automator workflow application (requires Tiger), iPhoto dB File Backup, that will copy the selected Library6.iPhoto file from your iPhoto Library folder to the Pictures folder, replacing any previous version of it. It's compatible with iPhoto 08 libraries. iPhoto does not have to be closed to run the application, just idle. You can download it at Toad's Cellar. Be sure to read the Read Me pdf file.

  • Download a web page, how to ?

    Can anyone help me with code for downloading a web page given the URL address? I can download the page, but the problem is it doesn't download the associated images, JavaScript, etc., nor does it create an associated folder as one might expect when saving a page using a browser.
    Below is the code snippet -
    URL url = new URL(address);
    out = new BufferedOutputStream(new FileOutputStream(localFileName));
    conn = url.openConnection();
    in = conn.getInputStream();
    byte[] buffer = new byte[1024];
    int numRead;
    long numWritten = 0;
    while ((numRead = in.read(buffer)) != -1) {
        out.write(buffer, 0, numRead);
        numWritten += numRead;
    }
    System.out.println(localFileName + "\t" + numWritten);

    javaflex wrote:
    I don't think a web crawler would work. A web crawler simply takes every link or URL on the given address and digs into it. Would it work for JavaScript? Given a URL like xyz.com/a.html:
    1. the code above would download the plain HTML.
    2. parse the HTML to find JavaScript and images (anything else I need to look at?)
    3. download those
    4. put everything in one folder (but the question is, do I then need to rename the pointers in the downloaded HTML to point at the other contents on the disk?)
    This is a naive approach - anything better?
    Thanks.

    More advanced web crawlers parse the JavaScript source files (or embedded JS inside HTML files) and (try to) execute the script in order to find new links. So the answer is: yes, some crawlers do. I know for a fact that Heritrix can do this quite well, but it is a rather "large" crawler and can take a while to get working. But it really is one of the best (if not the best) open source Java web crawlers around.
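Steps 2 and 4 of the list above (find the asset references, then rewrite them to local names so the saved page loads from its own folder) can be sketched like this. The regex is a deliberate simplification and the xyz.com URLs are illustrative; for real pages an HTML parser is the robust choice:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AssetRewriter {
    // Crude pattern for src="..."/href="..." attributes pointing at common asset types.
    private static final Pattern ASSET =
            Pattern.compile("(src|href)\\s*=\\s*\"([^\"]+\\.(?:js|png|jpg|gif|css))\"",
                    Pattern.CASE_INSENSITIVE);

    // Step 2: collect the asset URLs referenced by the page.
    public static List<String> findAssets(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = ASSET.matcher(html);
        while (m.find()) {
            urls.add(m.group(2));
        }
        return urls;
    }

    // Step 4: point each reference at a local file name (the last path segment),
    // so the saved HTML loads its assets from the same folder on disk.
    public static String rewriteLocal(String html) {
        Matcher m = ASSET.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String url = m.group(2);
            String local = url.substring(url.lastIndexOf('/') + 1);
            m.appendReplacement(sb, m.group(1) + "=\"" + local + "\"");
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<img src=\"http://xyz.com/img/logo.png\">"
                    + "<script src=\"http://xyz.com/js/a.js\"></script>";
        System.out.println(findAssets(html));
        System.out.println(rewriteLocal(html));
    }
}
```

Step 3 is then just the byte-copy loop already shown in the original post, run once per collected URL.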

  • Where to Download WPC [ Web Page Composer ] and How to install it ?

    Hi Experts,
    I need to download Web Page Composer and install it for use in my company. Can anyone help me on where to get it and how to install it?
    thanks
    Suresh

    Hi,
    Check the SAP Note Number: [1080110 |https://www.sdn.sap.com/irj/servlet/prt/portal/prtroot/com.sap.km.cm.docs/oss_notes/sdn_oss_ep_km/~form/handler]
    Also some links that may help you:
    https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/d07b5354-c058-2a10-a98d-a23f775808a6
    There are also lots of documents available on SDN, so just use SDN search.
    Regards,
    Praveen Gudapati

  • Can not open any web page in Safari

    I cannot open any web page in Safari on the iPad. All other wireless features work fine.

    If this does not work, then try a reset:
    Reset your iPad by pressing the 'Sleep' and 'Home' buttons at the same time for about 15 seconds. Your iPad will then go through a reset/reboot procedure and will be ready for use within about a minute.
    Don't worry about doing this as you will not lose data or settings.
    Good luck and do report back.
