Read contents of Word docs?

Hello all:
I have a directory of Word documents that I need to loop
through, reading the contents and saving various parts of the
textual content into a database.
I've used cfdirectory to loop through the directory and then
cffile action="read" to read the contents of the file into a
variable. However, I have what appears to be binary information
stored before and after the text that is saved in the variable
specified in the cffile tag.
How can I get rid of this so that I'm left with just the text
contained in the Word file?
TIA
Lisa

When you read a binary file you get binary data. Word .doc files
are not text files. If you cannot convert the files to .txt, or at
least to .rtf, you will have to use the Word COM object to parse
them. This is a very problematic solution, as it involves
installing MS Word on the server. The trouble is that MS Word is
not designed to run on a server, and both Adobe (née Macromedia)
and Microsoft warn against doing so.

If you do so, make sure you have good access to the server. As you
program, any time you do something that causes MS Word to ask a
question with a dialog box, it will send that dialog to the
server's screen, lock up, and wait for somebody sitting at the
server to answer it. Since Word is not a server application, it
has no way to send these dialogs to clients.

Now, since you can read some of the text from the binary, you may
be able to get it out with regex or other string processing, but
that does not sound like fun to me.

kitty1967 wrote:
> Hello all:
>
> I have a directory of word documents that I need to loop through and read the
> contents and save various parts of the textual content into a database.
>
> I've used cfdirectory to loop through the directory and then cffile
> action="read" to read the contents of the file into a variable. However, I have
> what appears to be binary information stored before and after the text that is
> saved in the variable specified in the cffile tag.
>
> How can I get rid of this so that I'm left with just the text contained in the
> Word file?
>
> TIA
> Lisa
>
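
For what it's worth, the "regex or other string processing" route from the reply could look roughly like the sketch below. It is written in Java (the platform ColdFusion runs on), not CFML, and it is only a heuristic: it assumes the readable text sits in the file either as plain 8-bit characters or as UTF-16LE (each character followed by a zero byte), and it only recognizes printable ASCII. The names DocTextSketch and extractRuns are made up for this illustration. Expect field codes, font names and other leftovers to come out along with the real text.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls readable runs of text out of raw .doc bytes.
final class DocTextSketch {

    // Printable ASCII plus tab, carriage return and line feed.
    private static final String PRINTABLE = "[\\x20-\\x7E\\t\\r\\n]";

    /**
     * Returns every run of at least minLen printable characters found in data.
     * Assumption: text is stored as 8-bit characters or as UTF-16LE.
     */
    static List<String> extractRuns(byte[] data, int minLen) {
        // ISO-8859-1 maps each byte to exactly one char, so the regex sees raw bytes.
        String raw = new String(data, StandardCharsets.ISO_8859_1);

        Pattern runs = Pattern.compile(
                "(?:" + PRINTABLE + "\\x00){" + minLen + ",}"   // UTF-16LE style run
                + "|" + PRINTABLE + "{" + minLen + ",}");        // 8-bit run

        List<String> result = new ArrayList<>();
        Matcher m = runs.matcher(raw);
        while (m.find()) {
            // Drop the zero bytes of UTF-16LE runs and surrounding whitespace.
            String run = m.group().replace("\0", "").trim();
            if (run.length() >= minLen) {
                result.add(run);
            }
        }
        return result;
    }
}
```

Something like DocTextSketch.extractRuns(java.nio.file.Files.readAllBytes(path), 8) would then give a list of candidate text fragments per file. A larger minimum length filters out more binary noise but also drops short lines of real text.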

Similar Messages

  • How to convert an adobe reader x1 to word.doc

    How to convert an adobe reader file to word.doc

    Hi masharif,
    To convert a PDF file to Word format, you need ExportPDF, or Acrobat. ExportPDF is an online service that converts PDF files to Word, Excel, and RTF format. If you're only interested in converting PDF files to the formats I've mentioned, then ExportPDF is the most economical option.
    Best,
    Sara

  • Getting a Word-Doc stored in DB as Blob

    Hi !
    I'm an interMedia beginner and would like to know how I can retrieve the content of a Word doc (stored as a BLOB) through a search, when the search criterion is found.
    Example:
    I have a table: documents(id number pk,note varchar2(20), doc blob )
    indextype is ctxsys.context (pref -inso_filter)
    by searching :
    select id from documents where contains (doc,'search_kriteria') > 0;
    I need to get the document BLOB that contains the search criterion.
    Thanks

    Hello, Stefan;
    Is this a Windows or Web application?
    Visual Basic 6.0 shipped with Crystal Reports 4.6; Visual Studio .NET 2005 ships with Crystal Reports 10.2, and VS .NET 2008 with version 10.5.
    The Crystal Reports application was rewritten in versions 5.0 and  9.0 and a great many changes were made - among them different options for including OLE Objects. See the [Note 1218374|https://www.sdn.sap.com/irj/servlet/prt/portal/prtroot/com.sap.km.cm.docs/oss_notes_boj/sdn_oss_boj_erq/sap(bD1lbiZjPTAwMQ==)/bc/bsp/spn/scn_bosap/notes.do] for useful new functionality. I tested a report with a linked Word document and it viewed well through Visual Studio .NET 2008.
    Is it possible that support for the Word format as a BLOB field in MS SQL Server has changed over time? What method is used to create the document as a BLOB field?
    Are you using the bundled version included with Visual Studio .NET? In the article recommending the property 'Retain Original Image Color Depth', there is a note:
    Note: This option only functions with the full version of Crystal Reports 9 or 10. This option does not function with the .NET Report Designer.
    Originally the issue was fixed, but only in the full versions of Crystal Reports Developer, not in the versions bundled with Visual Studio .NET.
    I tested with a linked Word document and see some degradation when there is an image in the Word Document but not in the text itself.
    Elaine

  • Reading word doc contents using java

    hi
    Can you tell me how to read the contents of a Word doc using Java and print them?
    sameer

    HDF (Horrible Document Format)
    HDF is our port of the Microsoft Word 97 file format to pure Java. It supports read and write capability. Please see the HDF project page for more information. This component is in the early stages of design. Jump in!
    LOL :)

  • How to read content of Microsoft Word to Webdynpro

    Reading the content of a doc into an OfficeUIComponent is not difficult, but I wonder whether I can get the plain text from the doc.
    I guess the bytes read from a doc file can only be displayed in an OfficeUIComponent; they cannot be printed to the console.
    So please give me a hint on whether Web Dynpro can get plain text from a doc file.
    Thank you.

    @Anil: The problem is that you won't be able to identify the text in Word from the byte stream that we are getting.
    @William: There is no standard way to do this, but I suggest you use the Apache POI HWPF APIs (http://jakarta.apache.org/poi/hwpf/index.html) for reading MS Word files.
    Thanks and Regards,
    Mausam

  • Content in Jsp to be converted to Word Doc

    I have a .jsp page with some generated content. On that page there is an option, Convert to Word DOC PAGE. When the link is clicked, the content of the JSP page has to be converted to a Word doc. How can I do this?

    <%@ page language="java" %>
    <%@ page import="java.util.*" %>
    <%@ page import = "java.io.*" %>
    <HTML>
    <HEAD>
    <script language="JavaScript">
    // ActiveX objects are only available in Internet Explorer on Windows.
    var fso = new ActiveXObject('Scripting.FileSystemObject');
    var wdApp = new ActiveXObject("Word.Application");
    // Read the whole text file, but only for the one expected path.
    function readFromFile(fileName)
    {
         if (fileName == "C:\\Award_Ltr.TXT")
         {
              var fs = fso.OpenTextFile(fileName);
              var result = fs.ReadAll();
              return result;
         }
    }
    // Show Word's File Open dialog so the user can pick the template.
    function readFromWord()
    {
         alert("PLEASE SAVE THE FILE AS C:\\PPY Letter for Annuities to Retirees and Alternate Payees wi_temp.doc");
         var pause = 0;
         var wdDialogFileOpen = 80;
         var wdApp = new ActiveXObject("Word.Application");
         var dialog = wdApp.Dialogs(wdDialogFileOpen);
         var button = dialog.Show(pause);
    }
    </SCRIPT>
    </HEAD>
    <BODY>
    <FORM NAME="formName">
    <INPUT TYPE="file" NAME="fileName">
    <INPUT TYPE="button" VALUE="show"
    ONCLICK="this.form.fileContent.value = readFromFile(this.form.fileName.value)">
    <BR>
    <TEXTAREA NAME="fileContent" ROWS="20" COLS="90" WRAP="off"></TEXTAREA>
    <BR>
    <INPUT TYPE="button" VALUE="SaveExtract" >
    <BR>
    <INPUT TYPE="button" VALUE="Modify Template" onClick = "readFromWord()">
    </FORM>
    </BODY>
    </HTML>

  • Firefox 4 thinks Word doc should be opened with Adobe Reader

    After upgrading to Firefox 4, I clicked a link to a Word doc, and the dialog box popped up asking what Firefox should do with "file.doc", "which is a: Adobe Acrobat Document", and it suggested Adobe Acrobat 9.4 (default).

    Yeah, I forgot to mention that I tried that also...MS Word doesn't show up as a Content Type.

  • Export section content to report or ms word doc?

    All,
    Hopefully a quick and simple yes/no question. Is there a way to export just the contents of a set of sections to a word doc or some type of report?
    We created a large set of sections that are unique customer contract languages. The CAT/UAT team wants to be able to validate the content w/o having to create all the test scenarios to generate them in the system.
    thanks!

    There are a couple of command line utilities that you might want to consider: FAP2PDF and FAP2RTF
    These utilities support using wildcards to convert a set of FAP files into PDF or RTF files.
    You can find information about these tools in the Utilities Reference or just run the programs with no parameters in a DOS box.

  • How to open and read pdf and Microsoft word (.doc) files or documents

    My problem is how to use my BB 9800 (software version 6.0.0.546) to read/view PDF files and Microsoft Office documents. I have also bought Documents To Go online and installed it on my phone, but whenever I try to open it I receive a message that it is incompatible. Any help will be greatly appreciated.

    Hi, Sammy.
    Why not install a third-party PDF reader and Word doc reader to open and read PDF and Microsoft Word (.doc) files? You can search for one and pick one that is simple and fast to use. It is better if it is entirely manual and can be customized to your own preferences. Remember to check its free trial package first if possible. I hope you succeed. Good luck.
    Best regards,
    Arron

  • When I converted a file from PDF to WORD.DOC it opened as Read Only and I'm unable to alter any of the text.  How can I solve this problem?

    When I tried to convert a file from PDF to WORD.DOC it opened as Read Only and I'm unable to alter any of the text.  How do I solve this problem?  I don't know how to change it from Read Only.

    Dear Sara
    Thank you very much for sending the converted file.  Yes, I am able to edit it - that will be very useful to me and I appreciate your help.
    Referring to your previous message when you said you'd converted via Acrobat and not Acrobat.com, I'm converting via Acrobat.com because that's the only option that is shown (how do I convert via Acrobat? - perhaps that would be more successful).  I'm using the Export PDF website, and the OCR is enabled to "Recognize text in English UK".  However, I've just tried converting a different LPA pdf form, this time for Health & Welfare https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/245571/LPA114_Health_welfare_LPA.pdf and again I have the same problem as before.  I've altered it from "Read only" and thought I'd then be able to enter text, but "Drawing Tools" is shown in the pane and I'm unable to enter anything onto the form - everything appears to be in images.  Also, the lettering on the heading of page 1 has become jumbled, with the word "Public" overlaying the word "Guardian."  Any suggestions as to what it is I'm doing wrong would be welcome - I don't like to be beaten!
    Regards,
    Judith.

  • Cannot embed pdfs in MS Word doc which are PDF Version 1.3 (Acrobat 4.x) [This is associated with Adobe Reader 9]. Getting 'program not installed' error.

    Cannot embed pdfs in Word doc which are PDF Version 1.3 (Acrobat 4.x) [This is associated with Adobe Reader 9]. Getting 'program not installed' error.
    Pdfs with other versions are okay:
    Tested successfully with Adobe Pro 9 (Version 1.5 (Acrobat 6.x)), Reader X (Version 1.4 (Acrobat 5.x)) and XI (Version 1.6 (Acrobat 7.x))
    Also cannot open embedded pdfs which have been embedded with this version (getting a similar error).
    Importantly I have removed/disabled all security options in Adobe Reader, following numerous internet suggestions from Adobe and elsewhere. This does not fix the problem.
    Currently the work around is to re-save the pdf with something other than Adobe Reader 9 (so the version updates), but we would prefer not to do this for all the old pdfs we have.
    Thank you,
    Louise.

    Could you please email me the document on which you are seeing this issue at [email protected] ?
    Please mention the forum thread link as well in the mail for the reference.
    Thanks,
    Atul

  • How do I convert a read only word doc to a read only pdf file?

    How do I convert a read only word doc to a read only pdf file?
    Thanks, Linda

    Hi Linda,
    I just tried it, and was able to convert a Word .doc to PDF, in spite of the fact that it was marked Read Only. Here are the instructions for uploading and converting to PDF with Acrobat.com:
    http://help.adobe.com/en_US/Acrobat.com/Acrobat/WS396AAA88-4AA4-4a40-87B8-004A5DC1E131.html
    Kind Regards,
    Michelle

  • Reading a multilevel list from MS Word Doc and converting it into an HTML nested list using C#

    I can achieve the above for a single level list as follows:
    foreach (Paragraph item in app.Selection.Range.ListParagraphs)
    {
        // Wrap each list paragraph in <li> tags.
        item.Range.InsertBefore("<li>");
        item.Range.InsertAfter("</li>");
    }
    Using C#, how can I programmatically convert a multilevel list (like the following) in a Word doc to a nested HTML list? Note: The bullet icons are not important. Thanks..Nam
    List from Word Doc:
    A
    B
    C
    D
    E
    F
    G
    H
    I

    Hi Nam,
    >>how can we programmatically determine the start and end elements of the sub-list with elements C, D, E, F, G in the example of my original post? <<
    We can find the begin and end elements of the sub-list by checking the ListLevelNumber. For example, the sub-list's ListLevelNumber starts at 2 by default. Here is the code to find the begin element, for your reference:
    Sub FindBeginSubElement()
        Dim i As Long
        ' Report the first list paragraph in the selection that is at level 2.
        For i = 1 To Selection.Range.ListParagraphs.Count
            If Selection.Range.ListParagraphs(i).Range.ListFormat.ListLevelNumber = 2 Then
                Debug.Print "begin sub element:" & Selection.Range.ListParagraphs(i).Range.Text
                Exit Sub
            End If
        Next i
    End Sub
    Also we can loop the selection in reverse order to find the end element for the sub-list.
    Hope it is helpful.
    Regards & Fei
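
    Once the level of each list paragraph is known, building the nested HTML is a small bookkeeping exercise. Below is a rough sketch of that step only, written in Java rather than C# and without any Word interop: it assumes each paragraph's ListLevelNumber and text have already been read into a list. NestedListSketch, Item and toHtml are made-up names for this illustration, the item text is not HTML-escaped, and levels are assumed to start at 1.

```java
import java.util.List;

// Hypothetical helper: turns (level, text) pairs into nested <ul>/<li> markup.
final class NestedListSketch {

    /** One list paragraph: its nesting level (1 = top level) and its text. */
    static final class Item {
        final int level;
        final String text;

        Item(int level, String text) {
            this.level = level;
            this.text = text;
        }
    }

    static String toHtml(List<Item> items) {
        StringBuilder html = new StringBuilder();
        int depth = 0; // number of <ul> elements currently open

        for (Item item : items) {
            if (item.level > depth) {
                // Going deeper: open sub-lists inside the <li> that is still open.
                while (depth < item.level) {
                    html.append("<ul>");
                    depth++;
                    if (depth < item.level) {
                        html.append("<li>"); // placeholder item when a level is skipped
                    }
                }
            } else {
                // Same level or shallower: close the previous item,
                // then every sub-list that has ended together with its parent item.
                html.append("</li>");
                while (depth > item.level) {
                    html.append("</ul></li>");
                    depth--;
                }
            }
            html.append("<li>").append(item.text);
        }

        // Close whatever is still open at the end of the list.
        while (depth > 0) {
            html.append("</li></ul>");
            depth--;
        }
        return html.toString();
    }
}
```

    The same loop should carry over to C# almost line for line, with the level taken from item.Range.ListFormat.ListLevelNumber for each paragraph.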

  • Can we Read/Display the content of Word/PDF file  in Flex 3/4 ?

    Hello All,
    Can we read/display the content of a Word/PDF file in Flex 3 or Flex 4? I have one Word file containing Arabic and English content with some formatting such as bold, color, and alignment. I want to display the content of this Word file as it is in the Flex web application.
    Awaiting a prompt reply.
    Thanks and Regards

    thank you for your immediate reply, but,
    sorry, this does not work.
    With this code:
    <cfpdf action = "read" source = "dok_1.pdf" name = "mypdf">
    <cfdump var="#mypdf#"/>
    I get this result:
    Everything, but no text of the document.
    PDFDocument
    Application name of application
    Author bimbam Verlag GmbH
    CenterWindowOnScreen [empty string]
    ChangingDocument Allowed
    Commenting Allowed
    ContentExtraction Allowed
    CopyContent Allowed
    Created D:20080710
    DocumentAssembly Allowed
    Encryption No Security
    FilePath [empty string]
    FillingForm Allowed
    FitToWindow [empty string]
    HideMenubar [empty string]
    HideToolbar [empty string]
    HideWindowUI [empty string]
    Keywords [empty string]
    Language [empty string]
    Modified [empty string]
    PageLayout SinglePage
    Printing Allowed
    Producer [empty string]
    Properties [empty string]
    Secure Allowed
    ShowDocumentsOption [empty string]
    ShowWindowsOption [empty string]
    Signing Allowed
    Subject [empty string]
    Title Rheinische Angler-Zeitschrift
    TotalPages 1
    Trapped [empty string]
    Version 1.3
    Maybe I do not understand the cfpdf tag the right way.
    What I want is a kind of PDF-to-text conversion.
    Do I have to use the processddx action? I do not think so.
    But is there a property like DocumentText?

  • How can I return my PDF doc back into word doc when Acrobat will not read the files.

    I changed one doc into a PDF and now all my docs have changed to PDF. I need to have all my Word docs back, but Adobe says it cannot read the format.

    Hi Jewelhbowie1,
    What was your workflow for changing the Word doc to PDF format?
    The Acrobat application/service does not change the original document; it makes a copy of that document and converts the copy to the format you choose.
    Please share the steps you followed to convert the Word docs to PDFs.
    Regards,
    Ajlan Huda.
