Extract the content of word document

Hi Experts,
I need to extract the content of a Word document into an internal table using OLE automation.
Please suggest some sample code or a function module (FM).
Regards,
Paul

One way, and probably the only way I can think of, is to use HWPF for reading and writing MS Word documents. It is a Jakarta POI project that is still under development, but I was able to read and write Word documents with it quite nicely. You can still try it; it is quite simple. I have used it before, so let me know if you need help with coding against HWPF. Check out the following link:
http://jakarta.apache.org/poi/hwpf/index.html
Hope that helps. Thanks.

Similar Messages

  • Printing the contents of word document in the smartform

    Hi all,
    Can anyone help me solve this problem? I need to read an MS Word document from a location specified on the selection screen and print it in a Smart Form with exactly the same formatting as in MS Word. If possible, please provide some sample code.
    Thanks in advance,
    Swamy Mantha

    Hi all,
    It seems everyone is busy with their work, so this is a gentle reminder. Can anyone solve the problem I posted above?
    Thanks in advance for your efforts,
    Swamy Mantha

  • Extracting content of word document

    Hi everybody,
    I need to create an application in which the user can enter formatted text (as in transaction SO10).
    If I understand it correctly, the document bound to the data source cannot be in RTF format.
    Does anybody know how I can extract the content (the text and the formatting) from the native .doc format?
    Thanks in advance,
    Bernhard

    Mike,
    To perform versioning, you would upload each version of the Word document into the database using an upload-document screen. You can automatically put a version number on each upload. The documents can then be downloaded and will automatically open in Word. If I remember correctly, you can open the browser with the correct URL directly from Word (after saving the document, of course).
    Keep Smiling,
    Bob R

  • Cannot search file content on Word document with embedded Excel table

    I cannot search the file content of Word documents that contain an embedded Excel table. I have Windows 8.1 64-bit and Office 2010 Professional. Only phrases inside the Excel tables are not searchable. I have many Word documents with an embedded Excel table; I use them for my invoices. Those invoices are converted to PDF to be sent by mail, and searching for the same phrases in the related PDF files works fine. And yes, the folders are indexed and the search service is active. For example, I can find all invoices that have a specific address or name, which is located in the Word document itself, but I cannot find invoices with a specific item name or price, because that information is in the embedded Excel table (embedded, not linked). I thought this was a question for the Windows forum, but people there directed me here to the Office forum. To clarify, I am not using Ctrl+F inside a document, but Windows Search in my folders. Probably the same happens in Office 2013.
    Thank you.

    Hi, I have a lot of Word documents (invoices, offers). The main part of each document is an embedded Excel file, because it is easier to do the mathematics in Excel than in Word. There are columns with description, unit price, quantity, taxes, and so on. Now I need to find out who bought an HP switch 2530-24G last year. I open the folder with last year's invoices and search for "2530", but nothing is found. If the document was converted to PDF for mail, however, I can find that phrase. Windows Search does not work for content that is in an embedded file.

  • How to extract the content of a user uploaded txt file in web dynpro?

    Hi,
    I'm working on a Java Web Dynpro component. It contains a document upload field where users should be able to upload .txt documents. The uploaded text documents should then be read and their content displayed. I am already able to upload documents using the upload field and store them in the context, but I am still not able to extract the content of these text documents for display.
    Does anyone have any suggestions on how I could do this?
    Any help will be greatly appreciated!
    Thanks!

    Hi Alain,
    You can go through this document on how to upload/download files in Web Dynpro:
    [https://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/uuid/202a850a-58e0-2910-eeb3-bfc3e081257f]
    Once you have the uploaded file in your context, and if you are storing it as a byte array, convert it to a string using the String constructor String(byte[] bytes). You can then store this string in a context attribute of type String, which can be bound to a UI element (TextArea) to display the contents.
    If you are using an IWDResource, you will get an InputStream from which you can read the data and convert it to a string for display as described above.
    Hope this helps.
    Sanyev
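
    The conversion described in this reply can be sketched in plain Java as follows. This is only a minimal sketch, not Web Dynpro code: the class name UploadedText and its two methods are hypothetical, and it assumes the uploaded file is UTF-8 text. It passes the charset explicitly because the one-argument String(byte[]) constructor uses the platform default charset, which may not match the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: turns uploaded file data into a String for display.
class UploadedText {

    // Case 1: the upload is already held in the context as a byte array.
    static String fromBytes(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Case 2: the upload is only available as a stream.
    // Reads the stream to the end, then decodes all bytes at once so that
    // multi-byte characters are never split across chunk boundaries.
    static String fromStream(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the uploaded .txt content (assumed to be UTF-8).
        byte[] upload = "first line\nsecond line".getBytes(StandardCharsets.UTF_8);

        System.out.println(fromBytes(upload, StandardCharsets.UTF_8));
        System.out.println(fromStream(new ByteArrayInputStream(upload), StandardCharsets.UTF_8));
    }
}
```

    The resulting String is what would be stored in the String attribute bound to the TextArea.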

  • Can we Read/Display the content of Word/PDF file  in Flex 3/4 ?

    Hello all,
    Can we read/display the content of a Word/PDF file in Flex 3 or Flex 4? I have a Word file containing Arabic and English content with some formatting such as bold, color, and alignment. I want to display the content of this Word file as it is in the Flex web application.
    Awaiting a prompt reply.
    Thanks and regards

    Thank you for your immediate reply, but, sorry, this does not work.
    With this code:
    <cfpdf action = "read" source = "dok_1.pdf" name = "mypdf">
    <cfdump var="#mypdf#"/>
    I get this result: everything, but no text of the document.
    PDFDocument
    Application name of application
    Author bimbam Verlag GmbH
    CenterWindowOnScreen [empty string]
    ChangingDocument Allowed
    Commenting Allowed
    ContentExtraction Allowed
    CopyContent Allowed
    Created D:20080710
    DocumentAssembly Allowed
    Encryption No Security
    FilePath [empty string]
    FillingForm Allowed
    FitToWindow [empty string]
    HideMenubar [empty string]
    HideToolbar [empty string]
    HideWindowUI [empty string]
    Keywords [empty string]
    Language [empty string]
    Modified [empty string]
    PageLayout SinglePage
    Printing Allowed
    Producer [empty string]
    Properties [empty string]
    Secure Allowed
    ShowDocumentsOption [empty string]
    ShowWindowsOption [empty string]
    Signing Allowed
    Subject [empty string]
    Title Rheinische Angler-Zeitschrift
    TotalPages 1
    Trapped [empty string]
    Version 1.3
    Maybe I do not understand the cfpdf tag correctly.
    What I want is a kind of PDF-to-text conversion.
    Do I have to use the processddx action? I do not think so.
    But there is a property DocumentText .. ?

  • Is it possible to extract the contents of any PDF file using Adobe PDF SDK?

    Is it possible to extract the contents of any PDF file using the Adobe PDF SDK?
    For example, say there is a PDF file, xxx.pdf, with 32 pages, and I am interested only in a topic on the 10th page. Can I extract this information and save it into another (new) PDF file?

    Thanks Irosenth,
    I am actually interested in extracting the page and creating a new PDF from it. The catch is deciding on what basis the page should be extracted: by page number or by bookmark.
    In this scenario, assume I have the PDF file and want to save only page 5. How can I extract page 5 automatically/programmatically? Or, in simple words, how can I get a reference to page 5?
    I am also not clear on whether I need both SDKs (Adobe and Acrobat) to meet this requirement. Moreover, you mentioned that the SDK itself is free, but the Adobe site says it is available by license only. That raises another question: for my desktop application to work with the Adobe PDF Library, the library has to be distributed with the application. In that case, will each deployment be chargeable?
    Could you please provide the link from which I can download the SDK, so that I can experiment with it and work out how it fits into my application?

  • How to display the contents of a document set on a page?

    I want to display the contents of a document set (which contains both folders and files) on a page, with the same folder and file structure as in the document set. How can I achieve this?
    I tried the Content Search web part, but it is of no use because it displays a flat list, whereas I need the folders and files as they are arranged in the document set.
    I tried the Document Set Contents web part, but because it doesn't accept any connection, it is not of much use.
    I will be glad if you have any pointers for me in this regard.
    Regards
    Kesava

    Hi Kesava,
    According to your description, you want to display the content of a document set with its hierarchy.
    How about using the Page Viewer Web Part to display the page of the corresponding document set? This is the no-code solution I would recommend.
    More information about the Page Viewer Web Part:
    https://support.office.com/en-nz/article/Page-Viewer-Web-Part-e364436c-0ec4-4819-acac-1982b3525531
    Thanks
    Patrick Liang
    Forum Support

  • How do you recover the contents of a document once its been erased

    Hi Pages community,
    Any idea how to recover the contents of a document once the contents have been erased but the document still exists?
    I have already reset the iPad using an iCloud backup from June 11th.
    Thanks
    Beth

    This is the modified code. I think it works.
    declare
      a blob;
      b blob;
      f number;
    begin
      -- Both queries read the same row (seqno = 100), as in the original post.
      select module_floorplan into a from tblob where seqno = 100;
      select module_floorplan into b from tblob where seqno = 100;
      -- Compare both LOBs from offset 1 (the last two arguments are start
      -- offsets, not lengths); dbms_lob.compare returns 0 when they match.
      f := dbms_lob.compare(a, b, dbms_lob.lobmaxsize, 1, 1);
      if f = 0 then
        dbms_output.put_line('Same --> ' || to_char(f));
      else
        dbms_output.put_line('Diff --> ' || to_char(f));
      end if;
    exception
      when others then
        dbms_output.put_line(sqlerrm);
    end;

  • How to extract the content of a mail message?

    Friends,
    How do I extract the content of a mail message?
    The message does not contain any attachments or images; it is just plain text.
    If I use message.getContent(), it returns header information in addition to the content, but I need only the content of the message.
    If I write code like this:
    String content = (String) message.getContent();
    it throws a ClassCastException.
    If the message contains only plain text and no multipart, which method should I use to extract only the content?
    Please tell me, friends.
    Thanks in advance,
    Regards,
    Venkata Naveen

    Message.getContent() does not return headers for a simple text/plain message. If you're getting headers, something else is wrong.
    Also, casting the result to String should work.
    Most likely the message really isn't a simple text/plain message.
    Provide more details and we'll help you figure out what you're doing wrong.
    Also, please read the msgshow.java demo program included with JavaMail.

  • How to extract the contents of the Jar file?

    Hello,
    Can anyone tell me how to extract the contents of a JAR file?
    An example would be highly appreciated.
    Thanks.

    From the command line, or from within a Java application?
    Kaj
    By the way, why do you need to do this? You do know that you can add JARs to the classpath and read resources from them without extracting the files?
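
    For the "from within a Java application" case, a minimal sketch using the standard java.util.jar API might look like the following. The class name JarExtractor and the method extract are hypothetical names chosen for illustration; the check that each entry stays inside the target directory is an added precaution, not something the thread asked for.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: unpacks a JAR file into a directory.
class JarExtractor {

    // Extracts every entry of the JAR into destDir and returns
    // the names of the file entries that were written.
    static List<String> extract(Path jar, Path destDir) throws IOException {
        List<String> written = new ArrayList<>();
        Path root = destDir.toAbsolutePath().normalize();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                Path target = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would land outside destDir.
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                    continue;
                }
                Files.createDirectories(target.getParent());
                try (InputStream in = jarFile.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                written.add(entry.getName());
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarExtractor <file.jar> <target-dir>");
            return;
        }
        for (String name : extract(Paths.get(args[0]), Paths.get(args[1]))) {
            System.out.println("extracted " + name);
        }
    }
}
```

    From the command line, the JDK's own tool does the same job: jar xf file.jar unpacks the archive into the current directory.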

  • Cannot view the content of XML document.

    Hi experts,
    I have saved an edit form (SAP Demo News) in the documents folder (KM Content > documents). To view the content, I click the XML document, and the content is displayed as designed in the ShowForm.
    My problem is this: when I save the edit form (SAP Demo News) in my own folder (KM Content > MyFolder) and then click the XML document, the content is NOT displayed as designed in the ShowForm. Instead, it is displayed as raw XML.
    How can I make the content of the XML document display as a form rather than as raw XML?
    Thanks and regards
    faeza

    Hi faeza,
    By default, XML Forms can only be used in the repositories "documents" and "userhome".
    If you use XML Forms in your own repository, such as "MyFolder", then the behaviour is as you have described.
    So you have to adjust the KM settings so that XML Forms can also be used in the repository "MyFolder".
    Go to
    System Administration -> System Configuration -> Knowledge Management -> Content Management -> Configuration -> Content Management -> Repository Filters -> XML Forms Repository Filter
    Choose the XML Forms Repository Filter entry "xmlforms_filter" and choose Edit.
    Under "Repositories:", add your repository "MyFolder".
    Save it and restart the whole portal.
    Best regards
    Frank

  • What is the best free word document app?

    What is the best free Word document app for the iPhone 5?

    Take a look at this link, https://itunes.apple.com/us/app/documents-free-mobile-office/id306273816?mt=8

  • Programmatically generate the built-in Document ID in the footer of a Word document?

    Hi all,
    I am trying to add the Document ID to the footer of a Word file programmatically using the following code. The code works only when I upload the document for the first time: it adds the ID to the footer. The problem is that when I upload the same file into the same document library again, the Document ID disappears. I have tried to use the ItemUpdated method, but it doesn't work. What do I need to change in the code so that the same Document ID remains in the footer no matter how many times the document is uploaded? Any help will be greatly appreciated.
    using System;
    using System.Security.Permissions;
    using System.Runtime.InteropServices;
    using Microsoft.SharePoint;
    using System.IO;
    using System.IO.Packaging;
    using DocumentFormat.OpenXml.Packaging;
    using System.Xml;
    using System.Collections.Generic;
    using Microsoft.SharePoint.Security;
    using Microsoft.SharePoint.Utilities;
    using Microsoft.SharePoint.Workflow;

    namespace AddHeaderFooterReceiver.ItemAddedEventReceiver
    {
        /// <summary>
        /// List Item Events
        /// </summary>
        public class ItemAddedEventReceiver : SPItemEventReceiver
        {
            // Returns the footer XML; the caller replaces the placeholder text "Hi".
            public string GetFooter()
            {
                string footerVal = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\" ?><w:ftr xmlns:ve=\"http://schemas.openxmlformats.org/markup-compatibility/2006\" " +
                    "xmlns:o=\"urn:schemas-microsoft-com:office:office\" xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" xmlns:m=\"http://schemas.openxmlformats.org/officeDocument/2006/math\" xmlns:v=\"urn:schemas-microsoft-com:vml\" " +
                    "xmlns:wp=\"http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing\" xmlns:w10=\"urn:schemas-microsoft-com:office:word\" xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\" xmlns:wne=\"http://schemas.microsoft.com/office/word/2006/wordml\"><w:p " +
                    "w:rsidR=\"00C24C70\" w:rsidRDefault=\"00C24C70\"><w:pPr><w:pStyle w:val=\"Footer\" /></w:pPr><w:r><w:t>Hi</w:t></w:r></w:p><w:p w:rsidR=\"00C24C70\" w:rsidRDefault=\"00C24C70\"><w:pPr><w:pStyle " +
                    "w:val=\"Footer\" /></w:pPr></w:p></w:ftr>";
                return footerVal;
            }

            public void WDAddFooter(Stream footerContent, Stream fileContent)
            {
                //  Given a document name, and a stream containing valid footer content,
                //  add the stream content as a footer in the document.
                const string documentRelationshipType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
                const string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
                const string footerContentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml";
                const string footerRelationshipType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/footer";
                const string relationshipNamespace = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
                PackagePart documentPart = null;
                using (Package wdPackage = Package.Open(fileContent, FileMode.Open, FileAccess.ReadWrite))
                {
                    //  Get the main document part (document.xml).
                    foreach (System.IO.Packaging.PackageRelationship relationship in wdPackage.GetRelationshipsByType(documentRelationshipType))
                    {
                        Uri documentUri = PackUriHelper.ResolvePartUri(new Uri("/", UriKind.Relative), relationship.TargetUri);
                        documentPart = wdPackage.GetPart(documentUri);
                        //  There is only one officeDocument.
                        break;
                    }
                    Uri uriFooter = new Uri("/word/footer1.xml", UriKind.Relative);
                    if (wdPackage.PartExists(uriFooter))
                    {
                        //  Although you can delete the relationship
                        //  to the existing node, the next time you save
                        //  the document after making changes, Word
                        //  will delete the relationship.
                        wdPackage.DeletePart(uriFooter);
                    }
                    //  Create the footer part.
                    PackagePart footerPart = wdPackage.CreatePart(uriFooter, footerContentType);
                    //  Load the content from the input stream.
                    //  This may seem redundant, but you must read it at some point.
                    //  If you ever need to analyze the contents of the footer,
                    //  at least it is already in an XmlDocument.
                    //  This code uses the XmlDocument object only as
                    //  a "pass-through" -- giving it a place to hold as
                    //  it moves from the input stream to the output stream.
                    //  The code could read each byte from the input stream, and
                    //  write each byte to the output stream, but this seems
                    //  simpler...
                    XmlDocument footerDoc = new XmlDocument();
                    footerContent.Position = 0;
                    footerDoc.Load(footerContent);
                    //  Write the footer out to its part.
                    footerDoc.Save(footerPart.GetStream());
                    //  Create the document's relationship to the new part.
                    PackageRelationship rel = documentPart.CreateRelationship(uriFooter, TargetMode.Internal, footerRelationshipType);
                    string relID = rel.Id;
                    //  Manage namespaces to perform Xml XPath queries.
                    NameTable nt = new NameTable();
                    XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
                    nsManager.AddNamespace("w", wordmlNamespace);
                    //  Get the document part from the package.
                    //  Load the XML in the part into an XmlDocument instance.
                    XmlDocument xdoc = new XmlDocument(nt);
                    xdoc.Load(documentPart.GetStream());
                    //  Find the node containing the document layout.
                    XmlNode targetNode = xdoc.SelectSingleNode("//w:sectPr", nsManager);
                    if (targetNode != null)
                    {
                        //  Delete any existing references to footers.
                        //XmlNodeList footerNodes = targetNode.SelectNodes("./w:footerReference", nsManager);
                        //foreach (System.Xml.XmlNode footerNode in footerNodes)
                        //    targetNode.RemoveChild(footerNode);
                        //  Create the new footer reference node.
                        XmlElement node = xdoc.CreateElement("w:footerReference", wordmlNamespace);
                        XmlAttribute attr = node.Attributes.Append(xdoc.CreateAttribute("r:id", relationshipNamespace));
                        attr.Value = relID;
                        node.Attributes.Append(attr);
                        targetNode.InsertBefore(node, targetNode.FirstChild);
                    }
                    //  Save the document XML back to its part.
                    xdoc.Save(documentPart.GetStream(FileMode.Create, FileAccess.Write));
                }
            }

            /// <summary>
            /// An item was added.
            /// </summary>
            public override void ItemAdded(SPItemEventProperties properties)
            {
                string extension = properties.ListItem.Url.Substring(properties.ListItem.Url.LastIndexOf(".") + 1);
                if (extension == "docx")
                {
                    //string headerContent = GetHeader().Replace("hello", properties.ListItem["Name"].ToString());
                    //string footerContent = GetFooter().Replace("Hi", properties.ListItem["Modified"].ToString());
                    //string footerContent = GetFooter().Replace("Hi", properties.ListItem["_dlc_DocId"].ToString() + "  V : " +properties.ListItem["_UIVersionString"].ToString());
                    string footerContent = GetFooter().Replace("Hi", properties.ListItem["_dlc_DocId"].ToString());
                    //string footerContent1 = GetFooter().Replace("Hi", properties.ListItem["_UIVersionString"].ToString());
                    //Stream headerStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(headerContent));
                    //Stream footerStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(footerContent));
                    Stream footerStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(footerContent));
                    //Stream footerStream1 = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(footerContent1));
                    MemoryStream fileStream = new MemoryStream();
                    fileStream.Write(properties.ListItem.File.OpenBinary(), 0, (int)properties.ListItem.File.TotalLength);
                    //WDAddHeader(headerStream, fileStream);
                    //WDAddFooter(footerStream, fileStream);
                    WDAddFooter(footerStream, fileStream);
                    //WDAddFooter(footerStream1, fileStream);
                    properties.ListItem.File.SaveBinary(fileStream);
                }
            }
        }
    }

    Instead of using the event receiver approach, have you tried adding labels?
    Here's a helpful article on how to add labels to your document:
    http://blog.isaacblum.com/2011/02/28/add-document-id-to-word-document-automatically/
    Hope this helps
    Artificial intelligence can never beat natural stupidity.

  • The file .docx cannot be opened because there are problems with the contents in sharepoint document library

    I created a site, created a document library, and assigned a Word document as a content type. I have written a workflow to create a new document. The workflow works fine: the document is created and the values are stored in it.
    As per my requirement, I am saving this site as a site template.
    After that, I create a new site based on the site template. It is created, and when I start the workflow it works fine and the document is also created.
    The problem is that I get an error when opening the Word document:
    Error - The file filename.docx cannot be opened because there are problems with the contents.
    Details - No error details available.
    Indresh

    What are you doing within the txt document?  Is it general text and string based items, or have you something more elaborate going on?
    An older discussion here elaborates a bit more on the dotx vs docx side of things.
    http://social.msdn.microsoft.com/Forums/en-US/de1b5ff9-ea6d-460c-a707-8c28acd4906f/error-opening-office-open-xml-file-when-using-sd-workflow-to-create-item-in-document-library?forum=sharepointcustomizationlegacy
    Steven Andrews
    SharePoint Business Analyst: LiveNation Entertainment
