Stripping out data from unstructured documents

I have hundreds of Word and HTML documents from which I need to extract certain information. The HTML docs are completely unstructured. The Word documents may or may not share the same structure. How can I leverage PL/SQL to extract the data that I need? I have seen scripts where, using PL/SQL, you can give a byte position number. This may work for extracting some of the data if it is always positioned in the same place, but I am looking to simplify the process and get the right information out in one pass. Any help would be greatly appreciated. I am not necessarily looking for an exact answer but rather information that can lead me in the right direction. Of course the exact answer wouldn't hurt.

I read all of the thread for that particular post and I still don't believe I have what I need.
Maybe it is the INSTR function that you wanted me to look at, and maybe not. I just don't know how I would use that to extract data.
What I am looking to do is to use some kind of PL/SQL statement to extract a unique identifier that will be in the document, the dates on which the document was created or appended, and then some region numbers. So can I use INSTR to find a value in each document, hold it in a flat file and then import this information back into the database? The value will of course be different in every document.
These files are currently not in a database. I am trying to get this information out of each document to store as metadata for that document. Is PL/SQL the right way to do this, or is there some other method of data extraction that I should consider?
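
As one possible answer to the "some other method" part of the question, here is a minimal sketch in Python of the one-pass idea: strip the HTML tags, search the remaining text with regular expressions, and write one row of metadata per document to a flat file that can be loaded into the database later. The label "Document ID:", the MM/DD/YYYY date format and the "Region N" wording are all assumptions made up for illustration, since the thread does not say what the values look like. Word documents would first have to be converted to plain text, which is not shown here. If the text does end up in the database, the same patterns could probably be applied there with Oracle's REGEXP_SUBSTR rather than INSTR.

```python
import csv
import io
import re
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect the text content of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML string with runs of whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


# Hypothetical patterns: adjust them to what the documents really contain.
DOC_ID = re.compile(r"Document ID:\s*([A-Z0-9-]+)")
DATE = re.compile(r"\b(\d{1,2}/\d{1,2}/\d{4})\b")
REGION = re.compile(r"Region\s+(\d+)")


def extract_metadata(text):
    """Pull the identifier, all dates and all region numbers out of one text."""
    doc_id = DOC_ID.search(text)
    return {
        "doc_id": doc_id.group(1) if doc_id else "",
        "dates": ";".join(DATE.findall(text)),
        "regions": ";".join(REGION.findall(text)),
    }


def to_csv(rows):
    """Render the extracted rows as CSV text (the flat file to import later)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["doc_id", "dates", "regions"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample document, used only to exercise the functions above.
sample = (
    "<html><body><p>Document ID: AB-1001</p>"
    "<p>Created 01/15/2007, appended 2/3/2007. Region 4 and Region 12.</p>"
    "</body></html>"
)
row = extract_metadata(html_to_text(sample))
print(to_csv([row]))
```

Because the patterns search for the values wherever they occur, nothing depends on a fixed byte position, which is what makes this workable for documents that do not share a layout. A document that matches none of the patterns simply produces empty fields, so those rows can be reviewed by hand.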

Similar Messages

  • Display data from multiple document Libraries in List View Webpart

    Hi All,
    I want to display data from multiple document libraries in one list view web part (a custom one I have created).
    I went through the following link: http://blogs.msdn.com/b/ramg/archive/2009/04/22/implementing-a-simple-cross-site-collection-list-view-webpart.aspx
    but it only shows how to display data from one document library.
    My motive for displaying the data in a list view web part is to get Check In, Check Out and the other OOB features.
    With Regards,
    Jaskaran Singh

    Hi,
    As there is no such OOTB feature, a workaround is to create a visual web part that gathers items from the libraries and implements functionality such as checking files in and out in different libraries.
    The links below will provide more details:
    Create Visual Web Parts in SharePoint 2010
    A demo about displaying list items in visual web part:
    http://www.dotnetcodesg.com/Article/UploadFile/2/217/Web%20Part%20in%20SharePoint%20To%20Show%20All%20List%20and%20List%20Items.aspx
    About the Check In and Check Out:
    How to Check In a document programmatically
    SPFile.CheckIn method
    and SPFile.CheckOut method
    Best regards
    Patrick Liang
    TechNet Community Support

  • How to get the raw data from particular document's schedule ?

    Hello,
    I am now able to get the data from a document using the RESTful Web Services SDK. What I need is to
    get the data not from the current version of the document but from a schedule that was executed some time ago,
    with data older than the current data.
    Any hints?

    Hey Jacek,
    Please look at /schedules in the Raylight API.
    Regards,
    Anthony

  • How to extract audit log data from every document library in site collection using powershell?

    Hi All,
    I have n document libraries in one site collection.
    My question is: how do I extract audit log data from every document library in a site collection using PowerShell?
    Please suggest a solution as soon as possible.

    Hi inguru,
    SharePoint audit log data is kept together at the site collection level, so there is no easy way to extract the audit log data for a single document library.
    As a workaround, you can export the site collection audit log data to a CSV file using a PowerShell command, then filter the document library audit log data in Excel.
    More information:
    SharePoint 2007 \ 2010 – PowerShell script to get SharePoint audit information:
    http://sharepointhivehints.wordpress.com/2014/04/30/sharepoint-2007-2010-powershell-script-to-get-sharepoint-audit-information/
    Best Regards
    Zhengyu Guo
    TechNet Community Support

  • Stripping out dashes from variable

    I have a JSP page that dynamically builds a list of select boxes based on information from a record set. As I build each select box I include an onChange handler that submits the form once the user selects something from the drop-down list. Basically it looks like this: out.println("<select class='select' name='"+selectName+"' onChange='document."+formName+".submit();'>");
    As you can see, the select name and form name are also added dynamically based on the record set returned. My problem is that some of the data in the database contains dashes, so I get something like this in the resulting HTML: <select class='select' name='A-10' onChange='document.A-10.submit();'>
    The dash causes the onChange to fail, so I need a way to strip out the dash so that the result is A10 as opposed to A-10.
    Any thoughts?
    p.s. This is how I am filling the variables:
    String selectName = request.getParameter("name");
    String formName = request.getParameter("DataSet");

    First of all, I don't understand why you use multiple forms on one page, but that's just me. Second, you can use
    document.forms[0].submit() to avoid the names. So just loop through your forms and use
    out.println("document.forms[" + i + "].submit()");

  • Export data from SAP Document Management System to File System(FileStore)

    Hi,
    We need to extract/export data (documents and metadata) from SAP Document Management System to the Windows file system (File Store). Can anyone suggest a tool or methodology for doing this?
    Thanks,
    Nilesh

    I'm also looking for a solution to this problem. We are capturing comments in BW-BPS layouts. They get stored in BW's document management system and we would like to export them out of the system into an Access database for external reporting.

  • Need to pull certain data from certain documents

    First off, if you know of a more appropriate place to post this, please let me know.
    Here is my scenario: I receive documents as Excel files that contain each user's first and last name, badge number, email, etc. I want to extract just the badge number from such a document. I want to do this with a Java program, so my first question is: what should I research in order to write a Java program that searches one of these Excel documents and, assuming it is space delimited, pulls out just the badge number?
    Then I have another document called NCAS, which lists every employee; the only unique values in it are the badge number and the username. I want to take the badge number from the first document and use it to search the NCAS document. This will bring up a line with the user's badge number, username, phone, etc. Again, this document is space delimited. I then want to pull one field out of that line, which I believe is the first one, and copy it.
    Third, I want to transfer the copied username to an ESB (an Oracle product) that has a database adapter on it, and use it to query our Oracle database for that username. After that I am going to use the ESB database connector to assign responsibilities to users, but that is another project. For now I am just trying to extract certain information from certain documents; I will worry about inserting it into a database later.
    I guess what I am asking is: can someone direct me to something that explains how to pull entries out of a document, or something of that nature, and perhaps also how I can combine a Java program that pulls data out of files with an ESB project that involves database adapters?
    Message was edited by:
    vande

    Hi Vande,
    Just a comment. You will often get that type of response to a question that Google answers well, especially if it's a recurring question that has been asked and answered lots of times. Have a look at http://www.catb.org/~esr/faqs/smart-questions.html for some pointers.
    If you had said,
    "I searched Google/this forum, found a few ideas, and am looking for feedback from anyone who has tried this and might be able to help me pick the best way"
    you would have likely got a response more in line with what you were looking for. Also note that searching this forum for "java excel" found lots of interesting hits.
    Best,
    John

  • Editing with the iOS 8 Photos app strips EXIF data from photos.

    The new iOS 8 Photos app has some nice editing features, and extensions from other apps are a good idea. However, any editing done strips all of the EXIF data from the photo.

    I have the same problem. It's awful; if you edit and upload, georeferencing and other data is lost. Fail.
    It didn't strip it when I edited in Snapseed and then saved back to the photo library.

  • Way to Copy Doc,Due and Posting Date from Sale Document to Other Sale Doc

    Hey SAP B1 Community,
    Most users face a problem when they create a Sales Order and then want to create the Delivery and Invoice documents: they have to change the document and delivery dates. There is a way to avoid entering the dates again.
    First create a UDF named, for example, "DDate". On the DDate UDF, write an FMS that gets the date from $[ORDR.DocDate].
    In the Delivery and Invoice forms, create an FMS for the delivery and document dates that gets its value from the UDF "DDate".
    This will reduce the time needed to post the Delivery and Invoice.
    Rahil Hassan
    0300-4655753

    This method is to avoid current date when they use "Copy to" Delivery and then Copy to Invoice the system fills

  • Using a formatted search which incorporates copying data from base document

    Hi
    I have a user selling tiles. They sell by the square metre but will only sell whole boxes. I have a formatted search on the quantity field to calculate the number of square metres in a box. They also sell individual units and will key this value directly into the quantity field. All this works fine.
    However, if I enter this as a sales order and copy it to a delivery, the formatted search fires and the quantity field gets refreshed. This gives an incorrect value where the user had keyed data directly into the quantity field in the base document.
    Therefore I need to incorporate my base document values into my query, so that if there is a base document, the query pulls the quantity from the base document. My query so far is as follows:
    SELECT (CAST($[$38.U_ActMtr.0] AS DECIMAL(10, 2))*CAST(T0.U_SqmBox AS DECIMAL(10, 2))) FROM OITM T0 WHERE T0.ItemCode = $[$38.1.0]
    Any suggestions?
    David

    If I understand your requirements well, you want to save the base quantity, when the delivery is based upon a SO, and to compute it when the DLN is not copied.
    Try to use this modified FS:
    declare @q dec(19,6)
    set @q=$[$38.11]
    If $[$38.43]<>-1
    Select @q
    Else
    SELECT (CAST($[$38.U_ActMtr.0] AS DECIMAL(10, 2))*CAST(T0.U_SqmBox AS DECIMAL(10, 2)))
    FROM OITM T0 WHERE T0.ItemCode = $[$38.1.0]

  • Spool out data from tables by querying user_tables

    I want to write a script to export data from the tables returned by a query against user_tables.
    For example, for all the tables returned by this query:
    select table_name from user_tables where table_name like '%LK';
    I want to run a select * from each of them and print the data out.
    Is it possible to do this?
    Thanks,

    I have a nice filedump routine that takes any query provided and produces a flat file. You have options regarding headers, separators, trailing separators, append vs. write mode, etc. Simple to use and works great. I built it into a mail routine that I worked on. We use it regularly in our production environment without issue.
    For details ...see my posting called :
    PL/SQL Mail Utility :: Binary/Ascii/Cc/Bcc/FileDump
    BarryC
    http://www.myoracleportal.com

  • Need to bring out data from Infocube to Flatfile

    Hi All,
    I have a cube with historical data in one system.
    My requirement is to create an exactly similar cube in the other system and load the historical data into it.
    We are considering the options below:
    1) Fetching the data from the cube (in the first system) into a flat file and loading it into the new cube (in the other system),
    or
    2) Using an Open Hub to bring the data out into a flat file and loading it into the new cube in the other system.
    Note: Open Hub is a tool that we think we would need to buy, but we are not sure.
    There is no connection between the two systems that would allow copying the cube from one system to the other.
    Please suggest the best option.
    Regards,
    Sri

    Thank you all for your prompt responses.
    Actually, the company got split into two. One of them has set aside the cube with the other company's information.
    So they are not even ready to provide display access to their system. So creating an RFC connection and using APD to fetch the data to a flat file might not work.
    What we are thinking is to fetch the data from the cube output into a spreadsheet and load that into the new cube.
    We have a similar concern with master data objects: we need to take out the attribute, text and hierarchy data and load it into the new system.
    How do we bring out the data for master data attributes, texts and hierarchies? Does this work the same way as for the cube?
    Especially for hierarchies?
    Please suggest.
    Regards,
    Srikanth

  • IPhone Calendar stripping Calendar dates from Outlook Calendar?

    My iPhone is stripping the dates OFF my Outlook Calendar when it syncs. How do I stop this? I need the calendar in BOTH places. This is wreaking havoc on my calendars. I am using MobileMe, if that has any bearing on it. Can anyone help?
    miprofgenie

    Your OS detail provided is Windows 2000. Is this accurate?
    Outlook 2003 or 2007 is supported. Are you running either version?
    If you are syncing calendar events over the air with your MM account on your iPhone, are you also syncing calendar events direct with Outlook 2003 or 2007 via the iTunes sync process?

  • Want to Strip GPS Data from Photos

    Does anyone know an easy way to get iWeb to strip the GPS data out of photos being published as part of the Web pages? The box in the iPhoto preferences to include location data in published images is unchecked, but this only seems to affect MobileMe galleries, not iWeb.
    Steve

    Sorry, I should clarify. I am dealing with my daughter's pages, and don't use iWeb that much myself. I realize the PNG images on the regular pages don't have this issue. The issue is if there is a photo page and sharing is allowed so visitors can download an image, the downloaded JPG file has the GPS coordinates in the Exif data. We would like to allow sharing, but get rid of the GPS Exif data without undue hassle (exporting and re-importing images, etc. etc.)
    Steve

  • Writing out data from a database table

    Hi guys,
    I am fairly new to ABAP programming. I need to know how I can write out the data stored in my database table, which has the following structure:
    data: begin of tablename occurs 100,
    end of tablename.
    I don't know how to loop over it and get the data out of it into a flat file on the application server and the presentation server. Right now no data is being written to the file because I am not looping through the table where the data is present.
    I need help ASAP.
    Thanks,
    Minal

    Hi,
    Please go through this link; it will help you a lot:
    http://www.sapdevelopment.co.uk/file/file_updown.htm
    Also see this link:
    http://www.sapdevelopment.co.uk/bdc/bdc_dbupdate.htm
    Thanks & Regards,
    Judith.
    Message was edited by: Judith Jessie Selvi
