Extract the content of word document

Hi Experts,
I need to extract the content of a Word document into an internal table using OLE automation.
Please suggest some sample code or a function module (FM).
Regards
Paul

One way, and probably the only way I can think of, is to use HWPF for reading and writing MS Word documents. It is a Jakarta project (part of POI) that is still under development, but I was able to read and write Word documents with it quite nicely, and it is simple to use. Let me know if you need help with coding against HWPF; I have used it before. Check out the following link.
http://jakarta.apache.org/poi/hwpf/index.html
Hope that helps. Thanks.

Similar Messages

Printing the contents of word document in the smartform

Hi all,
Can anyone help me solve my problem? I need to read an MS Word document from a location specified on the selection screen and print it in a Smart Form with exactly the same formatting as in MS Word. If possible, please provide some sample code for this.
Thanks in advance,
Swamy Mantha

Hi all,
It seems everyone is busy with their work. A gentle request: can anyone solve the problem I posted earlier? For convenience I am posting the query once again.
Can anyone help me solve my problem? I need to read an MS Word document from a location specified on the selection screen and print it in a Smart Form with exactly the same formatting as in MS Word. If possible, please provide some sample code for this.
Thanks in advance for your efforts,
Swamy Mantha

Extracting content of word document

Hi everybody,
I need to create an application where the user can enter formatted text (as in transaction SO10).
If I understand it correctly, the document bound to the data source cannot be in RTF format.
Does anybody know how I can extract the content (the text and the formatting) from the native .doc format?
Thanks in advance
Bernhard

Mike,
To perform versioning, you would upload each version of the Word document into the database using an upload document screen. You can automatically put a version number on each upload. The documents can be downloaded and will automatically open in Word. If I remember correctly, you can open the browser with the correct URL directly from Word (after saving the document, of course).
Keep Smiling,
Bob R

Cannot search file content on Word document with embedded Excel table

I cannot search the file content of Word documents that contain an embedded Excel table. I have Windows 8.1 64-bit and Office 2010 Professional. Only phrases inside the Excel tables are not searchable. I have many Word documents with embedded Excel tables; I use them for my invoices.
Those invoices are converted to PDF to be sent by mail, and searching for the same phrases in the related PDF files works fine. And yes, the folders are indexed and the search service is active. For example, I can find all invoices that have a specific address or name, which is located in the Word document itself, but I cannot find invoices with a specific item name or price, because that information is in the embedded Excel table (embedded, not linked).
I thought this was a question for the Windows forum, but the people there directed me here to the Office forum. To clarify, I am not using Ctrl+F inside a document, but Windows Search in my folders. The same probably happens in Office 2013.
Thank you.

Hi, I have a lot of Word documents (invoices, offers). The main part of each document is an embedded Excel file, because it is easier to do the calculations in Excel than in Word. There are columns with description, unit price, quantity, taxes and so on. Now I need to find out who bought an HP switch 2530-24G last year. I open the folder with last year's invoices and search for "2530", but nothing is found. If the document was converted to PDF for mailing, however, then I can find that phrase. Windows Search does not find content that sits in an embedded file.

How to extract the content of a user uploaded txt file in web dynpro?

Hi,
I'm working on a Java Web Dynpro component. The component has a document upload field where users should be able to upload .txt documents. The uploaded text documents should then be read and their content displayed. I am already able to upload documents using the upload field and store them in the context, but I am still not able to extract the content of these text documents for display.
Does anyone have any suggestions of how I could do this?
Any help will be greatly appreciated!
Thanks!

Hi Alain,
You can go through this document on how to upload/download files in Web Dynpro.
[https://www.sdn.sap.com/irj/scn/go/portal/prtroot/docs/library/uuid/202a850a-58e0-2910-eeb3-bfc3e081257f]
Once you have the uploaded file in your context, and if you are storing it as a byte array, convert it to a string using the String constructor String(byte[] bytes). You can then store this string in a context attribute of type String that is bound to a UI element (TextArea) to display the contents. Note that this constructor decodes with the platform default charset, so if the file encoding is known it is safer to pass it explicitly.
If you are using an IWDResource, you will get an InputStream from which you can read the data and convert it to a string for display as mentioned above.
Hope this helps.
Sanyev
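The conversion described in the reply above could look roughly like the sketch below. It is not Web Dynpro specific: the class name UploadedTextReader and its methods are made up for illustration, UTF-8 is only an assumed file encoding, and the code needs Java 7 or later. In a real component the byte array or InputStream would come from the context attribute or the resource instead of the sample data used here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Web Dynpro API.
final class UploadedTextReader {

    // Decodes an uploaded file that is held as a byte array.
    static String fromBytes(byte[] data, Charset charset) {
        return new String(data, charset);
    }

    // Reads the stream to its end, closes it, and decodes the collected bytes.
    static String fromStream(InputStream in, Charset charset) {
        try (InputStream source = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = source.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample data standing in for the uploaded .txt file.
        byte[] uploaded = "first line\nsecond line".getBytes(StandardCharsets.UTF_8);
        System.out.println(fromBytes(uploaded, StandardCharsets.UTF_8));
        System.out.println(fromStream(new ByteArrayInputStream(uploaded), StandardCharsets.UTF_8));
    }
}
```

The resulting String is what would be written to the String attribute bound to the TextArea. Passing the charset explicitly avoids garbled umlauts or accents when the server default encoding differs from the encoding of the uploaded file.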

Can we Read/Display the content of Word/PDF file in Flex 3/4 ?

Hello All,
Can we read/display the content of a Word/PDF file in Flex 3 or Flex 4? I have a Word file containing Arabic and English content with some formatting such as bold, color, alignment, etc. I want to display the content of this Word file as it is in the Flex web application.
Awaiting your prompt reply.
Thanks and Regards

Thank you for your immediate reply, but
sorry, this does not work.
With this code:
<cfpdf action = "read" source = "dok_1.pdf" name = "mypdf">
<cfdump var="#mypdf#"/>
I get this result: everything, but no text of the document.
PDFDocument
Application name of application
Author bimbam Verlag GmbH
CenterWindowOnScreen [empty string]
ChangingDocument Allowed
Commenting Allowed
ContentExtraction Allowed
CopyContent Allowed
Created D:20080710
DocumentAssembly Allowed
Encryption No Security
FilePath [empty string]
FillingForm Allowed
FitToWindow [empty string]
HideMenubar [empty string]
HideToolbar [empty string]
HideWindowUI [empty string]
Keywords [empty string]
Language [empty string]
Modified [empty string]
PageLayout SinglePage
Printing Allowed
Producer [empty string]
Properties [empty string]
Secure Allowed
ShowDocumentsOption [empty string]
ShowWindowsOption [empty string]
Signing Allowed
Subject [empty string]
Title Rheinische Angler-Zeitschrift
TotalPages 1
Trapped [empty string]
Version 1.3
Maybe I do not understand the cfpdf tag correctly.
What I want is a kind of PDF-to-text conversion.
Do I have to use the processddx action? I do not think so.
But there is a property DocumentText?

Is it possible to extract the contents of any PDF file using Adobe PDF SDK?

Is it possible to extract the contents of any PDF file using Adobe PDF SDK?
For example: there is a PDF file, say xxx.pdf, with 32 pages. I am interested only in a topic on the 10th page. Can I extract this information and save it into another (new) PDF file?

Thanks Irosenth,
I am actually interested in extracting a page and creating a new PDF with that page. But there is still a catch: on what basis should the page be extracted, by page number or by bookmark?
In this scenario, assume I have the PDF file and want to save only page 5. How can I extract page 5 automatically/programmatically? Or, in simple words, how can I get a reference to page 5?
I do not have a clear picture of whether I need both SDKs (Adobe and Acrobat) to achieve this requirement. Moreover, you mentioned that the SDK itself is free, but the Adobe site says it is available by license only. That raises another doubt: for my desktop application to work with the Adobe PDF Library, the library needs to be distributed with the application. In that case, will it be chargeable for each and every deployment?
Could you please provide the link from which I can download the SDK, so that I can do some exercises with it and figure out the exact flow of functionality for my application?

How to display the contents of a document set on a page?

I want to display the contents of a document set (that contains both folders and files) on a page, with the same folder and file structure as in the document set. How can I achieve this?
I tried the Content Search web part, but it is of no use because it displays a flat list, whereas I need the folders and files as they are present in the document set.
I tried the Document Set Contents web part, but as it does not accept any connection, it is not of much use.
I will be glad if you have any pointers for me in this regard.
Regards
Kesava

Hi Kesava,
According to your description, you might want to display the content of a document set with its hierarchy.
How about using the Page Viewer Web Part to display the page of the corresponding document set? This would be a no-code solution I would recommend.
More information about Page Viewer Web Part:
https://support.office.com/en-nz/article/Page-Viewer-Web-Part-e364436c-0ec4-4819-acac-1982b3525531
Thanks
Patrick Liang

How do you recover the contents of a document once its been erased

Hi Pages community
Any idea how to recover the contents of a document once the contents have been erased but the document still exists?
I already reset the iPad using an iCloud backup from June 11th.
Thanks
Beth

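This is modified code. I think it works.
declare
a blob;
b blob;
f number;
begin
-- Note: both selects read seqno = 100, so the two BLOBs are identical here;
-- change one of the keys to compare two different rows.
select module_floorplan into a from tblob where seqno = 100;
select module_floorplan into b from tblob where seqno = 100;
-- dbms_lob.compare returns 0 when the compared ranges are equal.
-- The last two arguments are starting offsets (1-based), not lengths,
-- so both must be 1 to compare the LOBs from the beginning.
f := dbms_lob.compare(a, b, dbms_lob.lobmaxsize, 1, 1);
if f = 0 then
   dbms_output.put_line('Same --> '||to_char(f));
else
   dbms_output.put_line('Diff --> '||to_char(f));
end if;
exception
     when others then
          dbms_output.put_line(SQLERRM);
end;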

How to extract the content of a mail message?

Friends,
How do I extract the content of a mail message?
The message does not contain any attachments or images;
it is just plain text.
If I use message.getContent(), it returns header information in addition to the content,
but I need only the content of the message.
If I write code like this:
String content = (String) message.getContent();
it throws a ClassCastException.
If the message contains only plain text, no multipart, which method should I use to extract only the content?
Please tell me, friends.
Thanks in advance,
regards,
Venkata Naveen

Message.getContent() does not return headers for a simple text/plain
message. If you're getting headers, something else is wrong.
Also, casting the result to String should work.
Most likely the message really isn't a simple text/plain message.
Provide more details and we'll help you figure out what you're
doing wrong.
Also, please read the msgshow.java demo program included with JavaMail.
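Because Message.getContent() is declared to return Object, checking the runtime type before casting shows what the message really contains. The sketch below only illustrates that check with the standard library; the class MailBodyText and its methods are made-up names, and a plain String stands in for the value that message.getContent() would return. In real JavaMail code a multipart message yields a javax.mail.Multipart instead, whose body parts have to be examined one by one.

```java
import java.util.Optional;

// Hypothetical helper; it does not call JavaMail itself.
final class MailBodyText {

    // Returns the body when the content object is already a String,
    // which is what a simple text/plain message yields.
    static Optional<String> asPlainText(Object content) {
        if (content instanceof String) {
            return Optional.of((String) content);
        }
        return Optional.empty();
    }

    // Names the runtime type, which helps when the cast would have failed.
    static String kindOf(Object content) {
        return content == null ? "null" : content.getClass().getName();
    }

    public static void main(String[] args) {
        // Stand-in for the result of message.getContent().
        Object content = "Hello, this is the body.";
        Optional<String> body = asPlainText(content);
        if (body.isPresent()) {
            System.out.println(body.get());
        } else {
            System.out.println("Not plain text, actual type: " + kindOf(content));
        }
    }
}
```

Printing the actual type in the failing case is usually enough to tell whether the message is multipart after all.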

How to extract the contents of the Jar file?

Hello,
Can anyone tell me how to extract the contents of the Jar file?
An example will be highly appreciated.
Thanks.

From the command line, or from within a Java application?
Kaj
By the way, why do you need to do it? You do know that you can add JARs to the classpath and read resources from them without extracting the file?
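For the command-line case, the JDK's jar tool can unpack an archive into the current directory with "jar xf file.jar". From within a Java application, a minimal sketch using java.util.jar could look like the following. The class JarExtractor, its methods and the sample file names are made up for illustration; the roundTrip method only builds a small test JAR in a temporary directory so the example can run on its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical helper class for illustration.
final class JarExtractor {

    // Extracts every entry of the JAR below targetDir and returns the number of files written.
    static int extract(Path jarPath, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        int written = 0;
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would land outside the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                    continue;
                }
                Files.createDirectories(out.getParent());
                try (InputStream in = jar.getInputStream(entry)) {
                    Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                }
                written++;
            }
        }
        return written;
    }

    // Demo: builds a one-entry JAR in a temp directory, extracts it, returns the extracted text.
    static String roundTrip() {
        try {
            Path work = Files.createTempDirectory("jar-demo");
            Path jarPath = work.resolve("sample.jar");
            try (JarOutputStream jos = new JarOutputStream(Files.newOutputStream(jarPath))) {
                jos.putNextEntry(new JarEntry("docs/hello.txt"));
                jos.write("hello from the jar".getBytes(StandardCharsets.UTF_8));
                jos.closeEntry();
            }
            Path target = work.resolve("out");
            extract(jarPath, target);
            return new String(Files.readAllBytes(target.resolve("docs/hello.txt")), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```

If the goal is only to read a resource that is already on the classpath, extraction is not needed at all, as the reply above points out.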

Cannot view the content of XML document.

Hi experts,
I have saved an edit form (SAP Demo News) in the documents folder (KM Content > documents). To view the content, I click the XML document, and the content is displayed as designed in the ShowForm.
My problem is that when I save the edit form (SAP Demo News) in my own folder (KM Content > MyFolder) and click the XML document, the content is NOT displayed as designed in the ShowForm. Instead it is displayed as XML code.
How can I make the content of the XML document display as a form rather than as XML code?
Thanks and regards
faeza

Hi faeza ,
By default, XML Forms can only be used in the repositories documents and userhome.
If you want to use XML Forms in your own repository, such as "MyFolder", then the behaviour is as you have described.
So you have to adjust the KM settings so that XML Forms can also be used in the repository "MyFolder".
Goto
System Administration ->
System Configuration ->
Knowledge Management ->
Content Management ->
Configuration
-> Content Management
-> Repository Filters
-> XML Forms Repository Filter
Choose the XML Forms Repository Filter entry "xmlforms_filter" and choose Edit.
Add as "Repositories:" your repository "MyFolder".
Save it and restart the whole portal.
Best regards
Frank

What is the best free word document app?

What is the best free Word document app for the iPhone 5?

Take a look at this link, https://itunes.apple.com/us/app/documents-free-mobile-office/id306273816?mt=8

Programmatically generate built-in document ID at the footer of Word document?

Hi All,
I am trying to add the Document ID to the footer of a Word file programmatically using the following code. The code works only when I upload the document for the first time: it adds the ID to the footer. The problem is that when I upload the same file into the same document library again, the document ID disappears. I have tried using the ItemUpdated method, but it doesn't work. What do I need to modify in the code so that the same document ID remains in the footer of the document no matter how many times the document is uploaded? Any help will be greatly appreciated.
using System;
using System.Security.Permissions;
using System.Runtime.InteropServices;
using Microsoft.SharePoint;
using System.IO;
using System.IO.Packaging;
using DocumentFormat.OpenXml.Packaging;
using System.Xml;
using System.Collections.Generic;
using Microsoft.SharePoint.Security;
using Microsoft.SharePoint.Utilities;
using Microsoft.SharePoint.Workflow;
namespace AddHeaderFooterReceiver.ItemAddedEventReceiver
{
    /// <summary>
    /// List Item Events
    /// </summary>
    public class ItemAddedEventReceiver : SPItemEventReceiver
    {
        public string GetFooter()
        {
            string footerVal = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\" ?><w:ftr xmlns:ve=\"http://schemas.openxmlformats.org/markup-compatibility/2006\"" +
                " xmlns:o=\"urn:schemas-microsoft-com:office:office\" xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" xmlns:m=\"http://schemas.openxmlformats.org/officeDocument/2006/math\" xmlns:v=\"urn:schemas-microsoft-com:vml\"" +
                " xmlns:wp=\"http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing\" xmlns:w10=\"urn:schemas-microsoft-com:office:word\" xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\" xmlns:wne=\"http://schemas.microsoft.com/office/word/2006/wordml\"><w:p" +
                " w:rsidR=\"00C24C70\" w:rsidRDefault=\"00C24C70\"><w:pPr><w:pStyle w:val=\"Footer\" /></w:pPr><w:r><w:t>Hi</w:t></w:r></w:p><w:p w:rsidR=\"00C24C70\" w:rsidRDefault=\"00C24C70\"><w:pPr><w:pStyle" +
                " w:val=\"Footer\" /></w:pPr></w:p></w:ftr>";
            return footerVal;
        }

        public void WDAddFooter(Stream footerContent, Stream fileContent)
        {
            // Given a document name, and a stream containing valid footer content,
            // add the stream content as a footer in the document.
            const string documentRelationshipType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
            const string wordmlNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
            const string footerContentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml";
            const string footerRelationshipType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/footer";
            const string relationshipNamespace = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
            PackagePart documentPart = null;
            using (Package wdPackage = Package.Open(fileContent, FileMode.Open, FileAccess.ReadWrite))
            {
                // Get the main document part (document.xml).
                foreach (System.IO.Packaging.PackageRelationship relationship in wdPackage.GetRelationshipsByType(documentRelationshipType))
                {
                    Uri documentUri = PackUriHelper.ResolvePartUri(new Uri("/", UriKind.Relative), relationship.TargetUri);
                    documentPart = wdPackage.GetPart(documentUri);
                    // There is only one officeDocument.
                    break;
                }
                Uri uriFooter = new Uri("/word/footer1.xml", UriKind.Relative);
                if (wdPackage.PartExists(uriFooter))
                {
                    // Although you can delete the relationship
                    // to the existing node, the next time you save
                    // the document after making changes, Word
                    // will delete the relationship.
                    wdPackage.DeletePart(uriFooter);
                }
                // Create the footer part.
                PackagePart footerPart = wdPackage.CreatePart(uriFooter, footerContentType);
                // Load the content from the input stream.
                // This may seem redundant, but you must read it at some point.
                // If you ever need to analyze the contents of the footer,
                // at least it is already in an XmlDocument.
                // This code uses the XmlDocument object only as
                // a "pass-through" -- giving it a place to hold as
                // it moves from the input stream to the output stream.
                // The code could read each byte from the input stream, and
                // write each byte to the output stream, but this seems
                // simpler...
                XmlDocument footerDoc = new XmlDocument();
                footerContent.Position = 0;
                footerDoc.Load(footerContent);
                // Write the footer out to its part.
                footerDoc.Save(footerPart.GetStream());
                // Create the document's relationship to the new part.
                PackageRelationship rel = documentPart.CreateRelationship(uriFooter, TargetMode.Internal, footerRelationshipType);
                string relID = rel.Id;
                // Manage namespaces to perform Xml XPath queries.
                NameTable nt = new NameTable();
                XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
                nsManager.AddNamespace("w", wordmlNamespace);
                // Get the document part from the package.
                // Load the XML in the part into an XmlDocument instance.
                XmlDocument xdoc = new XmlDocument(nt);
                xdoc.Load(documentPart.GetStream());
                // Find the node containing the document layout.
                XmlNode targetNode = xdoc.SelectSingleNode("//w:sectPr", nsManager);
                if (targetNode != null)
                {
                    // Delete any existing references to footers.
                    //XmlNodeList footerNodes = targetNode.SelectNodes("./w:footerReference", nsManager);
                    //foreach (System.Xml.XmlNode footerNode in footerNodes)
                    // targetNode.RemoveChild(footerNode);
                    // Create the new footer reference node.
                    XmlElement node = xdoc.CreateElement("w:footerReference", wordmlNamespace);
                    XmlAttribute attr = node.Attributes.Append(xdoc.CreateAttribute("r:id", relationshipNamespace));
                    attr.Value = relID;
                    node.Attributes.Append(attr);
                    targetNode.InsertBefore(node, targetNode.FirstChild);
                }
                // Save the document XML back to its part.
                xdoc.Save(documentPart.GetStream(FileMode.Create, FileAccess.Write));
            }
        }

        /// <summary>
        /// An item was added.
        /// </summary>
        public override void ItemAdded(SPItemEventProperties properties)
        {
            string extension = properties.ListItem.Url.Substring(properties.ListItem.Url.LastIndexOf(".") + 1);
            if (extension == "docx")
            {
                //string headerContent = GetHeader().Replace("hello", properties.ListItem["Name"].ToString());
                //string footerContent = GetFooter().Replace("Hi", properties.ListItem["Modified"].ToString());
                //string footerContent = GetFooter().Replace("Hi", properties.ListItem["_dlc_DocId"].ToString() + " V : " +properties.ListItem["_UIVersionString"].ToString());
                string footerContent = GetFooter().Replace("Hi", properties.ListItem["_dlc_DocId"].ToString());
                //string footerContent1 = GetFooter().Replace("Hi", properties.ListItem["_UIVersionString"].ToString());
                //Stream headerStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(headerContent));
                //Stream footerStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(footerContent));
                Stream footerStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(footerContent));
                //Stream footerStream1 = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(footerContent1));
                MemoryStream fileStream = new MemoryStream();
                fileStream.Write(properties.ListItem.File.OpenBinary(), 0, (int)properties.ListItem.File.TotalLength);
                //WDAddHeader(headerStream, fileStream);
                //WDAddFooter(footerStream, fileStream);
                WDAddFooter(footerStream, fileStream);
                //WDAddFooter(footerStream1, fileStream);
                properties.ListItem.File.SaveBinary(fileStream);
            }
        }
    }
}

Instead of using the event receiver approach, have you looked at adding labels instead?
Here's a helpful article on how to add labels to your document:
http://blog.isaacblum.com/2011/02/28/add-document-id-to-word-document-automatically/
Hope this helps

The file .docx cannot be opened because there are problems with the contents in sharepoint document library

I created a site, created a document library and assigned a Word document as a content type. I have written a workflow to create a new document. The workflow works fine: the documents are created and the values are stored in them.
As per my requirement, I am saving the above site as a site template.
After that I create a new site based on that site template. It gets created, and when I start the workflow it works fine and the document is also created.
The problem is that I get an error while opening the Word document.
Error - The file filename.docx cannot be opened because there are problems with the contents.
Details - No error details available.
Indresh

What are you doing within the txt document? Is it general text and string based items, or have you something more elaborate going on?
An older discussion here elaborates a bit more on the dotx vs docx side of things.
http://social.msdn.microsoft.com/Forums/en-US/de1b5ff9-ea6d-460c-a707-8c28acd4906f/error-opening-office-open-xml-file-when-using-sd-workflow-to-create-item-in-document-library?forum=sharepointcustomizationlegacy
Steven Andrews
