Extracting HTML Data
I am developing a Java Web Crawler that extracts data from a given website. Is it possible to extract text without the use of regular expressions? If so how?
Thanx
I think you need to be more specific.
From a single String, it's very easy to extract whatever you want using substring. Sure, you need to look for and expect certain characters in certain places, but it would do the trick.
Additionally, there may be libraries already out there that do exactly what you're asking. I'm too lazy to look myself, but if you're looking for an easy way out, that would probably be the way to go.
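One library of that kind already ships with the JDK. The following is only a minimal sketch, assuming the Swing HTML parser (javax.swing.text.html.parser.ParserDelegator) is good enough for the pages being crawled. The class name HtmlTextExtractor and the method visibleText are made up for illustration, and the sketch makes no attempt to filter out things like style content.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper for illustration; not part of any crawler library.
class HtmlTextExtractor {

    // Collects the text the parser reports between tags, without using regular expressions.
    static String visibleText(String html) {
        StringBuilder out = new StringBuilder();

        // The parser calls handleText once for each run of text it finds.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                if (out.length() > 0) {
                    out.append(' '); // separate text runs with a single space
                }
                out.append(data);
            }
        };

        try {
            // 'true' tells the parser to ignore any charset declared in the page.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Calling HtmlTextExtractor.visibleText(pageSource) on a downloaded page should give back the text without the markup. Exact whitespace handling depends on the parser, so treat the result as something to trim and normalize.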
Now, if you're trying to write this yourself, I can't imagine it being all that hard: find the characters that mark the boundaries of the text you're looking for (chances are these would be HTML tags, but it could be anything) and grab all the text in between. Or, if you only need to check whether the HTML contains a specific piece of text, just use stringname.contains("whatever you want").
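The boundary idea can be sketched with nothing more than indexOf and substring. This is a rough illustration, not a robust HTML parser: the class name TextBetween and the method between are hypothetical, and it only finds the first occurrence of the boundaries.

```java
// Hypothetical helper for illustration only.
class TextBetween {

    // Returns the text between the first 'open' marker and the next 'close' marker,
    // or null if either marker is missing.
    static String between(String html, String open, String close) {
        int start = html.indexOf(open);
        if (start < 0) {
            return null; // opening boundary not found
        }
        start += open.length(); // move past the opening boundary itself
        int end = html.indexOf(close, start);
        if (end < 0) {
            return null; // closing boundary not found
        }
        return html.substring(start, end);
    }
}
```

For example, TextBetween.between("<title>My Page</title>", "<title>", "</title>") returns "My Page". Real pages vary in case, attributes and nesting, so this only works when the boundaries are known exactly.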
Like I said though, I think you need to be more specific in your question.
Similar Messages
-
Extracting HTML data in BSP from Browser
Hello,
I am displaying an Adobe Interactive Form as HTML using an IFRAME in BSP. I know there is another way of using an Adobe Interactive Form in BSP, but it would occupy the entire BSP page and overwrite the other BSP elements, so they would not be displayed. The requirement is to enhance the existing BSP page: display buttons such as SAVE and SUBMIT at the top of the page and have the Interactive Adobe Form occupy the rest of the page as an HTML display in an IFRAME. I understand that such functionality can easily be achieved using Web Dynpro ABAP or Java, but I have very limited options. I used the code below to render the Interactive Adobe Form:
DATA: cached_response TYPE REF TO if_http_response.
CREATE OBJECT cached_response
  TYPE cl_http_response
  EXPORTING
    add_c_msg = 1.
cached_response->set_data( file_content ).
cached_response->set_header_field( name  = if_http_header_fields=>content_type
                                   value = file_mime_type ).
cached_response->set_status( code = 200 reason = 'OK' ).
cached_response->server_cache_expire_rel( expires_rel = 180 ).
DATA: guid TYPE guid_32.
CALL FUNCTION 'GUID_CREATE'
  IMPORTING
    ev_guid_32 = guid.
CONCATENATE runtime->application_url '/' guid INTO display_url.
cl_http_server=>server_cache_upload( url      = display_url
                                     response = cached_response ).
Once the form is displayed as HTML in the IFRAME, is there a way to capture the data entered in the Interactive Adobe Form from BSP? I can work with the data even if it arrives in XML format or as an XSTRING. Please let me know if there is a way to extract the data.
Regards,
Shishir.P
Hi,
Have you gone through this link
[check this|http://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/uuid/d0e58022-2a39-2a10-69a8-c1a892e2b3f4?quicklink=index&overridelayout=true]
Cheers,
bhavana -
Extraction of data from ECC to 3rd Party systems
Hi All,
I want to know all the options available for extracting data from ECC to a 3rd party system (a custom data warehouse like Teradata, Hyperion, etc.). Also, I want to know if there is best-practice documentation available for extracting data from ECC to any 3rd party system.
Thanks,
SB.
Hi SB,
Check the following link
http://expertisesapbi.blogspot.com/2010/06/how-to-transfer-data-from-sap-system-to.html
Ranganath. -
STATIC "HTML Data Set Photo Gallery" with CAPTIONS
Because of search engines and validation I would really like to use the STATIC "HTML Data Set Photo Gallery" http://labs.adobe.com/technologies/spry/demos/gallery_pe/static/china.html.
But I definitely need captions for the Photos. I found possibilities to add them via XML, but not for the HTML-Version.
As far as I can tell, I have to edit this function from "gallery_hds.js" and add some RegEx to extract the caption text.
function PhotosFilter(ds, row, rowIndex)
{
    var tnStr = row.thumbimg;
    if (tnStr)
    {
        row.path = tnStr.replace(/.*<a[^>]*href="?([^"]*)"?.*/i, "$1");
        row.thumbpath = tnStr.replace(/.*<img[^>]*src="?([^"]*)"?.*/i, "$1");
    }
    return row;
}
In the following rows within the html-document
<span class="thumbnail"><a href="../../gallery/galleries/paris/images/paris_02.jpg"><img src="../../gallery/galleries/paris/thumbnails/paris_02.jpg" alt="paris_02.jpg" /></a></span>
I would insert text before the closing span-tag (or before the closing a-tag?) - like so
<span class="thumbnail"><a href="../../gallery/galleries/paris/images/paris_02.jpg"><img src="../../gallery/galleries/paris/thumbnails/paris_02.jpg" alt="paris_02.jpg" /></a>Here is some HTML Text</span>
This text must be extracted via RegEx, similar to the lines inside the if statement of the above-mentioned function "PhotosFilter". But here my RegEx knowledge is definitely too short!
Thank you very much in advance for any help or suggestion.
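One way to try out a caption pattern before touching "gallery_hds.js" is to test it in isolation. The sketch below does that in Java. The class name CaptionPattern and the method caption are hypothetical, and the pattern assumes the caption is plain text (no nested tags) sitting between the closing a tag and the closing span tag. Whether the same expression behaves identically inside the Spry filter function would still need to be checked there.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for trying out the caption pattern; not part of Spry.
class CaptionPattern {

    // Captures plain text between </a> and </span>, ignoring surrounding whitespace.
    static final Pattern CAPTION =
            Pattern.compile("</a>\\s*([^<]*?)\\s*</span>", Pattern.CASE_INSENSITIVE);

    // Returns the caption text, or an empty string if the row has no caption.
    static String caption(String row) {
        Matcher m = CAPTION.matcher(row);
        return m.find() ? m.group(1) : "";
    }
}
```

Applied to the second thumbnail row above, caption(...) returns "Here is some HTML Text". Applied to the first row, which has nothing between the closing tags, it returns an empty string.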
Angela Fengler -
Extract Excel Data to RT (perl)
Hi, I am trying to extract some data from Excel and export it into a Perl program (Request Tracker). I am not sure what technologies I should use.
E.g. convert to XML? CSV? Or to Java? Use JavaEE?
Please advise.
Thanks
armalcolm wrote:
Also, modern versions of Excel will save files in an XML format, so you could then go straight to Perl for a completely Java-free solution :)
Doesn't even need to do that! ;-)
[Active State Perl and Excel|http://aspn.activestate.com/ASPN/docs/ActivePerl-5.6/faq/Windows/ActivePerl-Winfaq12.html] -
Auto extraction of data...
Hi,
Is there software or a tool I can purchase that allows me to accomplish the following tasks? I've looked at screen scrape, which only takes care of my "data extraction" requirement. I've also looked at iMacros, which only takes care of the auto submit of the form.
Tasks I need to accomplish:
I need a software tool that I can configure to automatically extract certain data from an email message that comes into my Microsoft Office Outlook inbox, paste it into a textarea (memo) box in a web page, and then submit the form (containing the textarea field with the pasted data) into a table in a database.
Thanks in advance!
Hi cf_menace,
I have a very simple template (below) that runs to retrieve
only the first 5 emails (MAXROWS="5") from my mail box, but it
takes a long time to return the result set. Can you tell me why? Is
it something in the CF administrator that I have to configure to
make this faster? Please see code below:
<!--- This view-only example shows the use of CFPOP
--->
<HTML>
<HEAD>
<TITLE>CFPOP Example</TITLE>
</HEAD>
<BODY>
<H3>CFPOP Example</H3>
<P>CFPOP allows you to retrieve and manipulate mail
in a POP3 mailbox. This view-only example shows how to
create one feature of a mail client, allowing you to display
the mail headers in a POP3 mailbox.
<!--- <P>Simply uncomment this code and run with a
mail-enabled CF Server to
see this feature in action. --->
<CFIF IsDefined("form.server")>
<!--- make sure server, username are not empty --->
<CFIF Trim(form.server) is not "" and Trim(form.username) is not "">
<CFPOP SERVER="#server#" USERNAME="#username#" PASSWORD="#pwd#" ACTION="GETHEADERONLY" NAME="GetHeaders" MAXROWS="5">
<H3>Message Headers in Your Inbox</H3>
<P>Number of Records: <CFOUTPUT>#GetHeaders.RecordCount#</CFOUTPUT></P>
<UL>
<CFOUTPUT QUERY="GetHeaders">
<LI>Row: #CurrentRow#: From: #From# -- Subject: #Subject#
</CFOUTPUT>
</UL>
</CFIF>
</CFIF>
<FORM ACTION="CFPOP.cfm" METHOD="POST">
<P>Enter your mail server: <INPUT TYPE="Text"
NAME="server">
<BR>Enter your username: <INPUT TYPE="Text"
NAME="username">
<BR>Enter your password: <INPUT TYPE="password"
NAME="pwd">
<P><INPUT TYPE="Submit" VALUE="Get Message
Headers">
</FORM>
</BODY>
</HTML> -
Save HTML data in a Oracle Column
What would be the best way to save HTML data in an Oracle column?
While VARCHAR2 can be used for up to 4000 bytes, it would still mean escaping a lot of special characters. Is there a better way to do this? Any help would be greatly appreciated.
Besides the XML types available to you and the associated Oracle-provided packages for inserting and extracting XML, I have heard arguments that both forms should be stored in the database. That is, you should store the extracted data in normal Oracle columns so it can be used like any other attribute, and you should also store the XML as XML so it can be used as XML.
For data that is only inserted and deleted I can see this method working, but if updates to information within the XML are required then you have just added another set of work requirements and complexity.
Who is going to access the data? What tools are the users going to use? Where else does the data need to be provided, and in what format? The answers to who will use the data and how should tell you what form the data should be stored in.
My personal view is that a relational database should be used for what it was designed for, storing relational data.
HTH -- Mark D Powell -- -
Extracting the date value from digital signature/certificate
Hello,
I'd like to extract the date from the signature properties and copy the value over to the date field, as shown in the snapshot.
I am aware that we can change the appearance of the digital signature to make the date visible, but in most cases it is too small to read on hard copies.
Currently we resort to typing in the date manually, after zooming into the PDF to read the visible date (if any) associated with the signature image, clicking on the signature image to open the Signature Properties dialog, or opening the Signatures tab docked on the left.
Typing in the date manually exposes us to a discrepancy between the date typed when the PDF was created and the actual date the PDF was signed (the date value associated with the digital signature/certificate). For example, person A creates a PDF with the date typed in and then sends the file to person B (who approves the document), who may digitally sign it a few days later.
Hope I am making sense.
Regards,
Devin
Note: I originally posted my question in another thread at http://forums.adobe.com/message/3296355
You can get the date and other signature properties using the signatureInfo field method: http://livedocs.adobe.com/acrobat_sdk/9.1/Acrobat9_1_HTMLHelp/JS_API_AcroJS.88.756.html
But for your application you really should be setting the date field before the signature is applied, since changing it afterwards would invalidate the signature. You can execute a script that sets the value of the date field to the current date using the "Signature Signed" event, which you'll see as one of the tabs of the signature field properties dialog. -
Error while extracting the data in R/3 production system,
Hi Team,
We got the following error while extracting the data in R/3 production system,
Error 7 When Sending an IDoc R3 3
No Storage space available for extending the inter 44 R3 299
No storage space available for extending the inter R3 299
Error in Source System RSM 340
Please guide us to fix the issue.
It's very difficult to help you without knowing
- what is going to be transferred
- where you get this error
- system configuration
- actual memory usage
- operating system
- database and configuration, etc.
I suggest you open an OSS call and let support have a look at your system. It's much easier to find the cause of the problem with system access.
Markus -
Unable to extract the data from ECC 6.0 to PSA
Hello,
I'm trying to extract data from ECC 6.0, DataSource 2LIS_11_VAHDR, into BI 7.0.
When I try to run a full load into the PSA, I get the following error message:
Error Message: "DataSource 2LIS_11_VAHDR must be activated"
The DataSource is already active; I checked it using T-code LBWE.
In BI, when I right-click the DataSource (2LIS_11_VAHDR) and select "Manage", the system throws the error message below:
"Invalid DataStore object name /BIC/B0000043: Reason: No valid entry in table RSTS"
If anybody has faced this error message, please advise what I'm doing wrong.
Thanks in advance
ECC 6.0 side
Delete the setup tables
Fill the data into setup tables
Schedule the job
I can see the data using RSA3 (2LIS_11_VAHDR): 1000 records
BI 7.0 (Service Pack 15)
Replicate the DataSource in Production in the background
Migrate DataSource 3.5 to 7.0 in Development
I didn't migrate 3.5 to 7.0 in Production; it's not allowed there
When I try to schedule the InfoPackage, it gives the error message "Data Source is not active"
I'm sure this problem is related to the DataSource 3.5 to 7.0 conversion in Production. In Development there is no problem because I converted the DataSource from 3.5 to 7.0 manually.
Thanks -
How to extract Slide data in 3rd part application from clipboard
I need to be able to copy/paste or drag/drop from PowerPoint into another application (C# WPF). In my OnDrop method the DragEventArgs Data has these formats:
[0] "Preferred DropEffect" string
[1] "InShellDragLoop" string
[2] "PowerPoint 12.0 Internal Slides" string
[3] "ActiveClipBoard" string
[4] "PowerPoint 14.0 Slides Package" string
[5] "Embedded Object" string
[6] "Link Source" string
[7] "Object Descriptor" string
[8] "Link Source Descriptor" string
[9] "PNG" string
[10] "JFIF" string
[11] "GIF" string
[12] "Bitmap" string
[13] "System.Drawing.Bitmap" string
[14] "System.Windows.Media.Imaging.BitmapSource" string
[15] "EnhancedMetafile" string
[16] "System.Drawing.Imaging.Metafile" string
[17] "MetaFilePict" string
[18] "PowerPoint 12.0 Internal Theme" string
[19] "PowerPoint 12.0 Internal Color Scheme" string
The "PowerPoint 14.0 Slides Package" is a byte array... can this be converted into Slides?
If not how would I go about getting high-resolution images + slide text from a drag/drop?
[Originally posted here: http://answers.microsoft.com/en-us/office/forum/office_2013_release-powerpoint/how-to-extract-slide-data-in-3rd-part-application/a0b5ed64-eb77-49bb-bf44-e0732e23a5eb]
What I'd like to do:
Open PowerPoint
In PPT open a presentation
In PPT select a slide
Drag it to my 3rd party WPF application
In the 3rd party WPF application drop handler get the slide data (text, background image, etc...).
When I do this I get the DragEventArgs Data (the clipboard data) and it has the 20 supported formats I listed in the 1st post. From these formats #4 seemed like it could have some useful info.
WPF
<Window x:Class="PowerPointDropSlide.MainWindow"
xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
Title="MainWindow" Height="350" Width="525" AllowDrop="True" Drop="UIElement_OnDrop" DragOver="UIElement_OnDragOver">
<Grid HorizontalAlignment="Stretch" VerticalAlignment="Stretch" Background="LightBlue">
<TextBlock Text="Drop something here!"/>
</Grid>
</Window>
Handlers:
public void UIElement_OnDragOver(object sender, DragEventArgs e)
{
}
public void UIElement_OnDrop(object sender, DragEventArgs e)
{
    string[] supportedFormats = e.Data.GetFormats();
    object pptSlidesPackage = e.Data.GetData("PowerPoint 14.0 Slides Package");
} -
Not able to extract performance data from .ETL file using xperf commands.
Xperf Commands:
xperf -i C:\TempFolder\Test.etl -o C:\TempFolder\BootData.csv -a process
Getting the following error after executing the above command:
"33288636 Events were lost in this trace.
Data may be unreliable
This is usually caused by insufficient disk bandwidth for ETW logging.
Please try increasing the minimum and maximum number of buffers and/or the buffer size.
Doubling these values would be a good first attempt.
Please note, though, that this action increases the amount of memory reserved for ETW buffers, increasing memory pressure on your scenario.
See "xperf -help start" for the associated command line options."
I changed the page file size, but that did not work for me.
Does anyone have an idea how to solve this problem and extract the ETL file data?
I want to mention one point here: I have 4 machines in total, and on 3 of them the above commands work properly. Only one machine has this problem.
Hi,
I suggest you try using xperf to collect a new trace ETL file on this computer and see whether it can be extracted there.
Refer to following articles:
start
http://msdn.microsoft.com/en-us/library/windows/hardware/hh162977.aspx
Using Xperf to take a Trace (updated)
http://blogs.msdn.com/b/pigscanfly/archive/2008/02/16/using-xperf-to-take-a-trace.aspx
Kate Li
TechNet Community Support -
How to extract authorization data to standart BW DSO's from SAP R/3 system
Hi All,
Does anyone have any experience with this topic? I want to use SAP R/3 as a source system and, after extracting the data to the Business Content DSOs in BW, generate authorization objects from the DSOs.
I am using the standard BC DSOs:
• 0TCA_DS01 Authorization data - Values
• 0TCA_DS02 Authorization data - Hierarchies
• 0TCA_DS03 Descriptive Text Authorizations
• 0TCA_DS04 Assignment User Authorizations
• 0TCA_DS05 Generate users for Authorizations
I have done deep research but can't find anything.
Best Regards
Ozan
Hi Ozan,
You can go through the thread provided by Suman. These DSOs will help to maintain Analysis Authorizations in BW automatically. In short, you don't need to maintain them yourself; they will come from R/3 and be configured the same way in BW.
Regards,
Ganesh -
How to extract Inventory data from SAP R/3 system
Hi friends, how do I extract Inventory data from an SAP R/3 system? What reports can we expect from Inventory?
Hi,
Inventory management
https://www.sdn.sap.com/irj/servlet/prt/portal/prtroot/com.sap.km.cm.docs/documents/a1-8-4/how%20to%20handle%20inventory%20management%20scenarios.pdf
How to Handle Inventory Management Scenarios in BW (NW2004)
https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/f83be790-0201-0010-4fb0-98bd7c01e328
Loading of Cube
ref.to page 18 in "Upgrade and Migration Aspects for BI in SAP NetWeaver 2004s" paper
http://www.sapfinug.fi/downloads/2007/bi02/BI_upgrade_migration.pdf
Non-Cumulative Values / Stock Handling
https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/93ed1695-0501-0010-b7a9-d4cc4ef26d31
Non-Cumulatives
http://help.sap.com/saphelp_nw2004s/helpdata/en/8f/da1640dc88e769e10000000a155106/frameset.htm
http://help.sap.com/saphelp_nw2004s/helpdata/en/80/1a62ebe07211d2acb80000e829fbfe/frameset.htm
http://help.sap.com/saphelp_nw2004s/helpdata/en/80/1a62f8e07211d2acb80000e829fbfe/frameset.htm
Here you will find all the Inventory Management BI Contents:
http://help.sap.com/saphelp_nw70/helpdata/en/fb/64073c52619459e10000000a114084/frameset.htm
2LIS_03_BX- Initial Stock/Material stock
2LIS_03_BF - Material movements
2LIS_03_UM - Revaluations/Find the price of the stock
The first DataSource (2LIS_03_BX) is used to extract an opening stock balance on a
detailed level (material, plant, storage location and so on). At this moment, the opening
stock is the operative stock in the source system. "At this moment" is the point in time at
which the statistical setup ran for DataSource 2LIS_03_BX. (This is because no
documents are to be posted during this run and so the stock does not change during this
run, as we will see below). It is not possible to choose a key date freely.
The second DataSource (2LIS_03_BF) is used to extract the material movements into
the BW system. This DataSource provides the data as material documents (MCMSEG
structure).
The third of the above DataSources (2LIS_03_UM) contains data from valuated
revaluations in Financial Accounting (document BSEG). This data is required to update
valuated stock changes for the calculated stock balance in the BW. This information is
not required in many situations as it is often only the quantities that are of importance.
This DataSource only describes financial accounting processes, not logistical ones. In
other words, only the stock value is changed here, no changes are made to the
quantities. Everything that is subsequently mentioned here about the upload sequence
and compression regarding DataSource 2LIS_03_BF also applies to this DataSource.
This means a detailed description is not required for the revaluation DataSource.
http://help.sap.com/saphelp_bw32/helpdata/en/05/c69480c357354a8846cc61f7b6e085/content.htm
http://help.sap.com/saphelp_bw33/helpdata/en/ed/16c29a27db6e4d81a015be8673eb80/content.htm
These are the standard data sources used for Inventory extraction.
Hope this helps.
Thanks,
JituK -
How to extract PS data from sap r/3 to bw
Hi,
How do I extract PS data from SAP R/3 to BW?
PS data like plans, budget, accruals & commitments.
Can anyone help me with this?
Thanks in Advance,
Shankar.
Hi Shankar,
You can refer to the link below for details on the relevant extractors and InfoProviders:
http://help.sap.com/erp2005_ehp_04/helpdata/EN/17/416d030524064cb2b8d58ffb306f3a/frameset.htm
Regards,
Sathya