Searching Words in .RTF documents

I have a problem using the CONTAINS operator with .rtf documents.
I have a table (note_tab) in which I store .rtf text:
note_tab (id integer,note varchar2(250));
with a primary key on the column id.
Then I have a CONTEXT index on the column "note":
CREATE INDEX note_idx ON note_tab(note) indextype IS ctxsys.context parameters ('FILTER CTXSYS.AUTO_FILTER');
I inserted some text in .rtf format into the table.
If I search for a word with CONTAINS, it works.
But there is a problem: if I search for a word that is in the table but has different font types, font sizes, colors, etc. within the word itself, then CONTAINS doesn't find it.
For example, if I look for the word "country" and this word is in the .rtf text (that I inserted in the table) but is written like this: "co[i]untry", then CONTAINS doesn't find it.
If I do the same thing with an HTML document (instead of an RTF document) using a context index with the NULL_FILTER option, it works!
So is there some option when creating the index that ensures that, in .rtf documents, I can also find words with different fonts, sizes, etc. within them?
Thank you
Massimo

For example if I look for the word "country" and this word is in the .rtf text (that I inserted in the table) but it is written like this "co[i]untry", then the "contains" command doesn't find it.
The AUTO_FILTER converts the RTF to HTML. This HTML contains tags that preserve the look and feel of the document. You are then implicitly invoking the HTML_SECTION_GROUP, which strips out the HTML tags and replaces them with a space. So by the time the document reaches the lexer, it looks like "c o un tr y". This is why your query does not match this document.
The reason HTML_SECTION_GROUP replaces HTML tags with spaces is that people and programs often don't put spaces around HTML tags, e.g. "...Oracle Text</TITLE><BODY>My sentence starts...". If HTML_SECTION_GROUP did not replace HTML tags with spaces, this document would look as follows by the time it arrives at the lexer: "...Oracle TextMy sentence starts...". This would cause a search for 'Text' to fail.
The use of tags within a word, as in your example, is very rare. Are there certain HTML tags that HTML_SECTION_GROUP should replace with an empty string instead of a space? I would like to hear what you and other members of the forum think.
If I do the same thing with an HTML document (instead of a rtf document) using a context index with the NULL_FILTER option, it works!
Unless I misunderstood your question, I don't think that can work with your "country" example. Can you post your code?
Faisal
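The space-versus-empty-string trade-off described above can be reproduced outside the database. The sketch below is a simplified simulation in plain Java. It assumes a naive regex that treats anything between < and > as a tag, and it uses a hypothetical inline <i> tag for the example. It is not Oracle Text's HTML_SECTION_GROUP implementation. It only illustrates why "co<i>un</i>try" stops matching "country" while "Oracle Text</TITLE><BODY>My..." keeps its word boundary.

```java
import java.util.regex.Pattern;

/**
 * Simplified simulation of tag stripping before tokenizing.
 * NOT Oracle Text's implementation; an illustration only.
 */
public class TagStripDemo {
    // Naively matches anything that looks like a tag, e.g. <i>, </TITLE>, <BODY>
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Replaces every tag with the given replacement string. */
    static String stripTags(String html, String replacement) {
        return TAG.matcher(html).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String inWord = "co<i>un</i>try";
        String betweenWords = "Oracle Text</TITLE><BODY>My sentence starts";

        // Tags -> space: the word inside the markup is split into "co un try" ...
        System.out.println(stripTags(inWord, " "));
        // ... but words separated only by tags stay separate.
        System.out.println(stripTags(betweenWords, " "));
        // Tags -> empty string: "country" is rebuilt ...
        System.out.println(stripTags(inWord, ""));
        // ... but "Text" and "My" are glued into "TextMy".
        System.out.println(stripTags(betweenWords, ""));
    }
}
```

Neither replacement is right for every document, which is why the question of which tags should map to an empty string comes up at all.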

Similar Messages

Need suggestion for searching words in a document

Is there any option to search a PDF document for a saved list of user-defined words or phrases? For example, I need to search an annual report (PDF version) for words like Salary, Compensation, Remuneration, etc., and I have to type these words hundreds of times every day. If I could store these words in Adobe Reader itself, it would save me most of that time. Please help me with your opinions and suggestions.

Nothing in Reader itself. A macro program might help.
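As a rough idea of what such a macro or small program would do, here is a minimal Java sketch. It assumes the report text has already been extracted from the PDF, which is not shown and would need a separate tool. It then counts whole-word, case-insensitive matches for every saved keyword in one pass, so the words never have to be typed again. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: counts saved keywords in already-extracted text. */
public class KeywordSearch {

    /** Case-insensitive, whole-word occurrence count for each keyword. */
    static Map<String, Integer> countKeywords(String text, List<String> keywords) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps keyword order
        for (String keyword : keywords) {
            // \b restricts matches to whole words; Pattern.quote escapes regex metacharacters
            Pattern p = Pattern.compile("\\b" + Pattern.quote(keyword) + "\\b",
                    Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher(text);
            int n = 0;
            while (m.find()) {
                n++;
            }
            counts.put(keyword, n);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> saved = List.of("Salary", "Compensation", "Remuneration");
        String report = "Total remuneration rose. Salary costs and salary increases followed.";
        System.out.println(countKeywords(report, saved));
    }
}
```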

Opening a Word or RTF document with a table doesn't display correctly

I found a problem opening RTF or Word documents in Pages if they contain a table:
RTF doesn't import tables.
Word has problems with merged cells.

I don't understand the point of your post.
If you're trying to report a problem or "bug" with Pages, that is not the purpose of this user-to-user forum. You should leave feedback for the Pages team on this page.
I've not had a problem opening Word files with tables in Pages. If you're saying Word can't handle tables with merged cells, then don't use merged cells in files you are going to export as Word. Word & Pages must handle merged table cells differently, as I know both can do that. As for RTF, Pages can open & export as RTF. Again, if you are going to export as RTF, don't use tables. Neither of these is a fault of Pages, just limitations of the formats/programs.

Generate MS Word and RTF document

Hello,
Is there any library I could use to generate an MS Word document?
I'm also looking for something that could help me generate RTF files.
with best regards,
Rafal

Hello,
google "java ms word library"
I already did that. I couldn't find any free library. There is POI, but this project wasn't finished, so it's not ready for production.
with best regards,
Rafal
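For the RTF half of the question, a library is not strictly required, because RTF is a plain-text format made of control words and brace-delimited groups. The following is a minimal sketch that assumes one font and plain paragraphs only. Tables, images and styles are not handled, and the class name and output file name are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Minimal RTF writer sketch: one font, plain paragraphs, no tables or images. */
public class SimpleRtfWriter {

    /** Escapes backslash and braces; non-ASCII characters become RTF Unicode escapes. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\').append(c);
            } else if (c > 127) {
                // RTF expects a signed 16-bit decimal value; '?' is the fallback for old readers
                sb.append("\\u").append((int) (short) c).append('?');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a complete RTF document with one paragraph per list entry. */
    static String toRtf(List<String> paragraphs) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n");
        sb.append("\\f0\\fs24\n"); // font 0, 12 pt (font size is given in half-points)
        for (String p : paragraphs) {
            sb.append(escape(p)).append("\\par\n");
        }
        sb.append('}');
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String rtf = toRtf(List.of("Hello, world!", "Braces {like these} are escaped."));
        // Everything toRtf produces is 7-bit ASCII, so US_ASCII is safe here
        Files.write(Paths.get("hello.rtf"), rtf.getBytes(StandardCharsets.US_ASCII));
    }
}
```

For anything beyond simple text (tables, styles, .doc output), a real library such as POI is still the more realistic route.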

How to insert a word document or an RTF document into RichTextEditor?

How can I insert a Word document or an RTF document into af:richTextEditor? I am using Apache POI to read the Word document and get its contents. I can display the whole content of the document except the tables and images in it. The data in a table is displayed as a string, not as a table, inside the editor.
Can we insert a word/RTF document into a rich text editor?
Can we insert images into the rich text editor?
The following is the code that I used. On clicking a button the word document has to be inserted into the <af:richTextEditor>.
<af:richTextEditor id="rte1" autoSubmit="true"
immediate="true"
columns="110" rows="20">
<af:dropTarget dropListener="#{SendEmail.richTextEditorDrop}">
<af:dataFlavor flavorClass="java.lang.String"/>
</af:dropTarget>
</af:richTextEditor>
<af:commandButton text="Insert at position" id="cb2">
<af:richTextEditorInsertBehavior for="rte1" value="#{RichTextEditorUtil.docFile}"/>
</af:commandButton>
Java code: I am using Apache POI to read the Word document.
import java.io.File;
import java.io.FileInputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

// Managed bean referenced as #{RichTextEditorUtil.docFile} (class name assumed)
public class RichTextEditorUtil {
    public String getDocFile() {
        File docFile = new File("C:/temp/test.doc");
        StringBuilder fileContent = new StringBuilder();
        // A FileInputStream obtains input bytes from a file;
        // try-with-resources closes it even if parsing fails.
        try (FileInputStream fis = new FileInputStream(docFile)) {
            // HWPFDocument reads a binary (.doc) Word file from the stream
            HWPFDocument doc = new HWPFDocument(fis);
            WordExtractor docExtractor = new WordExtractor(doc);
            // This array stores each paragraph from the document file.
            String[] docArray = docExtractor.getParagraphText();
            for (int i = 0; i < docArray.length; i++) {
                if (docArray[i] != null) {
                    System.out.println("Line " + i + " : " + docArray[i]);
                    fileContent.append(docArray[i]).append("\n");
                }
            }
        } catch (Exception exep) {
            // Report the problem instead of continuing with a null extractor
            System.out.println(exep.getMessage());
        }
        System.out.println(fileContent);
        return fileContent.toString();
    }
}

Hi,
images are not yet supported. It's an open enhancement request for the rich text editor.
For tables, it seems they are supported, but in a basic way (just HTML 4 style), if I interpret the tag documentation correctly:
http://download.oracle.com/docs/cd/E15523_01/apirefs.1111/e12419/tagdoc/af_richTextEditor.html
Frank
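Since the editor works with basic HTML, one option is to convert the extracted Word content to simple HTML before handing it to the editor, instead of passing plain text joined with "\n". The sketch below shows that idea with hypothetical helper names. Paragraphs become <p> elements, and rows of cell text become a plain HTML 4 <table>. How the cell text is pulled out of the .doc file with POI is not shown. Whether af:richTextEditor renders a particular tag should be checked against the tag documentation linked above.

```java
import java.util.List;

/** Hypothetical helpers turning extracted Word content into basic HTML. */
public class SimpleHtml {

    /** Escapes the characters that are significant in HTML text (& first). */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps each non-null paragraph (trailing CR/LF trimmed) in a <p> element. */
    static String paragraphs(String[] paras) {
        StringBuilder sb = new StringBuilder();
        for (String p : paras) {
            if (p != null) {
                sb.append("<p>").append(escape(p.trim())).append("</p>");
            }
        }
        return sb.toString();
    }

    /** Renders rows of cell text as a basic HTML 4 table. */
    static String table(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<table border=\"1\">");
        for (List<String> row : rows) {
            sb.append("<tr>");
            for (String cell : row) {
                sb.append("<td>").append(escape(cell)).append("</td>");
            }
            sb.append("</tr>");
        }
        return sb.append("</table>").toString();
    }

    public static void main(String[] args) {
        System.out.println(paragraphs(new String[] {"First paragraph\r\n", "A < B\r\n"}));
        System.out.println(table(List.of(List.of("Name", "Value"), List.of("x", "1"))));
    }
}
```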

How do you change the default setting for RTF documents from TextEdit to MS Word

How do you change the default setting for RTF documents from TextEdit to MS Word? I download forms from the Canadian government and they use RTF as a standard. They are supposed to open as MS Word documents (yes, I have Office for Mac 2011), but when I download a document it opens in TextEdit, which messes up the form format. I can download it, right-click and choose "Open With" MS Word, but how can I change the default to save this step?

richr604 wrote:
Thanks I knew this, I like Safari as I want to keep everything MAC, but if I use Firefox the documents open in MS Word straight away, I was just curious if there was some setting I could change?
In Firefox > Preferences, click the Applications tab. If RTF is there, use the pulldown menu in the Action column to pick MS Word. If RTF is not there, there are instructions here for how to add file types.

Placing Microsoft Word 7 RTF files in PageMaker 6.0 document

I am a weekly newspaper editor. For years I have written my articles on a PC at home using Microsoft Word 2000, saved them to disc as RTF files, transferred the RTF files to our Mac (running OS 9.2.2) at work, and placed them into PageMaker 6.0 documents with no problems.
I now have a new PC at home running Microsoft Office Word 7. When I attempt to place the RTF files produced by Microsoft Word 7 into a PageMaker 6.0 document at work (on the Mac running OS 9.2.2) I get the following error messages: RTF import syntax warning: Unrecognized character set, mac character set used. When I click OK the same message repeats. When I click OK a second time I get a different message:
RTF import syntax warning, unrecognized token.
Is there anything I can do to get these Word 7 RTF files to work with our old Macs?
I forgot to mention that my new home PC runs Windows Vista Home Premium.
--Roger W. Bonham

You might be able to save your Word file as a Macintosh Word 6 file or a Word 97/98 file. That should help. You might also, as Buko suggests, upgrade to PageMaker 6.5 and then get the free upgrade to 6.5.2.
It might also work to save your word file as a text only file.
Upgrading your PageMaker should get rid of your Word woes, but beware of recent Word apps, which have no support from PageMaker.

Problem displaying Word 2002 RTF files in JEditorPane

Hi all,
I am having a problem displaying RTF files created in Word 2002/Office XP in a JEditorPane.
Our code, which does the usual stuff:
JEditorPane uiViewNarrativeEda = new JEditorPane();
uiViewNarrativeEda.setContentType(new RTFEditorKit().getContentType());
FileInputStream inDocument = new FileInputStream("c:/temp/testing.rtf");
uiViewNarrativeEda.read(inDocument, "");
inDocument.close();
works just FINE with RTF files created in Word 97, WordPad etc, but it seems that Word 2002 adds some tags to the RTF file that the RTFReader cannot handle.
For example, I believe the following exception is due to the new \stylesheet section that Word 2002 adds to the RTF file:
java.lang.NullPointerException:
 at javax.swing.text.rtf.RTFReader$StylesheetDestination$StyleDefiningDestination.close(RTFReader.java:924)
 at javax.swing.text.rtf.RTFReader.setRTFDestination(RTFReader.java:254)
 at javax.swing.text.rtf.RTFReader.handleKeyword(RTFReader.java:484)
 at javax.swing.text.rtf.RTFParser.write(RTFParser.java, Compiled Code)
 at javax.swing.text.rtf.AbstractFilter.readFromReader(AbstractFilter.java:111)
 at javax.swing.text.rtf.RTFEditorKit.read(RTFEditorKit.java:129)
 at javax.swing.text.JTextComponent.read(JTextComponent.java:1326)
 at javax.swing.JEditorPane.read(JEditorPane.java:387)
Does anyone have similar problems or knows how I could get around this?
I thought about writing a parser that replaces the \stylesheet section with one that works, but that seems like a lot of work and it does not always work (I tried it by copying and pasting...).
Maybe I could replace the RTF converter that Word 2002 is using with another one - but how?

Hello again,
I found out that the 2002 version of MS Word writes a lot more data into the file than, e.g., WordPad does.
There is one section that causes the problem; it's called "\stylesheet".
My workaround (working in my case) is to wrap the input stream of the RTF document and remove this section (only in memory).
See example implementation:
<<<<<<<<<<<<<<<<<<<<<<< SOURCE CODE<<<<<<<<<<<<<
/*
 * Copyright 2004 DaimlerChrysler TSS.
 * All Rights Reserved.
 * Last Change $Author: wiedenmann $
 * At $Date: 2004/03/31 11:08:54CEST $.
 */
package com.dcx.tss.swing;

import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

/**
 * This class provides a workaround for parse errors in the
 * {@link javax.swing.text.rtf.RTFEditorKit}. These errors are caused
 * by the new format specification for Rich Text Format (RTF V1.7).
 * <p>
 * The workaround is to filter out a section of the RTF document
 * which causes an exception during parsing. This section has no
 * impact on the display of the document; it just contains some
 * meta information used by MS Word 2002.
 * The whole document is loaded into memory and the section is
 * deleted there; the document on the file system is not affected.
 * <p>
 * This workaround is provided without any warranty of completely solving
 * the problem.
 * @version $Revision: 1.1 $
 * @author Wiedenmann
 */
public class RtfInputStream extends FilterReader {
    /** Search string for start of the section. */
    private static final String SEC_START = "{\\stylesheet";
    /** Search string for end of the section. */
    private static final String SEC_END = "}}";
    /** Local store for the document data. */
    private final StringBuffer strBuf = new StringBuffer();

    /**
     * Wrapper for the reader used by the RTF parser.
     * The complete document is loaded into a string buffer
     * and the section that causes the problem is deleted.
     *
     * @param in Reader for the document (e.g. {@link java.io.FileReader}).
     * @throws IOException in case of I/O errors while loading the document.
     */
    public RtfInputStream(final Reader in) throws IOException {
        super(in);
        int numchars;
        final char[] tmpbuf = new char[2048];
        // read the whole document into the StringBuffer
        do {
            numchars = in.read(tmpbuf, 0, tmpbuf.length);
            if (numchars != -1) {
                strBuf.append(tmpbuf, 0, numchars);
            }
        } while (numchars != -1);
        // finally delete the problematic section
        deleteStylesheet();
    }

    /** Deletes the problematic section. */
    private void deleteStylesheet() {
        // find start of the section
        final int start = strBuf.indexOf(SEC_START);
        if (start == -1) {
            // section not contained, so just return
            return;
        }
        // find end of the section; leave the document untouched if it is missing
        final int end = strBuf.indexOf(SEC_END, start);
        if (end == -1) {
            return;
        }
        // delete the section including the closing "}}"
        strBuf.delete(start, end + SEC_END.length());
    }

    /**
     * Reads characters into a portion of an array.
     * The data is served from the local StringBuffer,
     * which contains the whole (filtered) document.
     * @param buf Destination buffer.
     * @param off Offset at which to start storing characters.
     * @param len Maximum number of characters to read.
     * @return The number of characters read, or -1 if the end of the
     *         stream has been reached.
     * @exception IOException If an I/O error occurs
     */
    @Override
    public int read(final char[] buf, final int off, final int len) throws IOException {
        if (strBuf.length() == 0) {
            // buffer is empty, so the end of the document is reached
            return -1;
        }
        // copy at most len characters into buf, starting at off
        final int count = Math.min(len, strBuf.length());
        strBuf.getChars(0, count, buf, off);
        // remove exactly the copied data from the local store
        strBuf.delete(0, count);
        return count;
    }

    /** Reads a single character from the filtered document. */
    @Override
    public int read() throws IOException {
        if (strBuf.length() == 0) {
            return -1;
        }
        final char c = strBuf.charAt(0);
        strBuf.deleteCharAt(0);
        return c;
    }
}
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Integration of the wrapper looks like this:
RTFEditorKit rtf = new RTFEditorKit();
RtfInputStream inDocument = new RtfInputStream(new FileReader("test.rtf"));
Document doc = rtf.createDefaultDocument();
rtf.read(inDocument, doc, 0);
Hope this helps - for me it did :-)
Timo Wiedenmann
DaimlerChrysler TSS, Germany
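Searching for "}}" assumes that the first double closing brace after {\stylesheet ends the whole group. That does not always hold. A style entry with its own nested group, for example {\s1{\*\keycode ...}}, ends with "}}" while the stylesheet is still open, and the rest of the style entries would then be left in the body. The sketch below is a hypothetical alternative that finds the end of the group by counting braces. It skips the character after every backslash, so escaped braces such as \{ and \} are not counted. It does not handle \bin binary data. One option would be to call it on the buffered text in place of the "}}" search in deleteStylesheet().

```java
/** Hypothetical brace-counting variant of the stylesheet removal. */
public class StylesheetStripper {

    /** Removes the first {\stylesheet ...} group; returns the input unchanged if none or unbalanced. */
    static String removeStylesheet(String rtf) {
        int start = rtf.indexOf("{\\stylesheet");
        if (start == -1) {
            return rtf; // nothing to remove
        }
        int depth = 0;
        for (int i = start; i < rtf.length(); i++) {
            char c = rtf.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped or control character, e.g. \{ \} \\ or the first letter of \s1
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    // i is the brace that closes the stylesheet group
                    return rtf.substring(0, start) + rtf.substring(i + 1);
                }
            }
        }
        return rtf; // unbalanced braces: leave the document unchanged
    }

    public static void main(String[] args) {
        // The "}}" after the nested keycode group would fool the simple search
        String rtf = "{\\rtf1{\\stylesheet{\\s1{\\*\\k a}}{\\s2 B;}}Body}";
        System.out.println(removeStylesheet(rtf));
    }
}
```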

How to search for text in document

How do you search for text within a document on the Samsung Galaxy 10.1? I tap the search icon, enter the text to search for, and nothing happens. The only options I see are a circled "x" and "Cancel". The program does not automatically go to the entered search string, and I cannot find any way to make the program find the words. Thanks,

Once you have typed your search term in, you need to tap the "enter" key on the keyboard. Once you do, the keyboard should close, and you should see the first instance selected.
There will be two Arrow buttons in the bottom center of the screen to go back and forth through all of the instances of this word in the document.
If the word could not be found, then a message displaying "search term" not found will appear at the bottom of the screen above those buttons.

OLE object using MS Word -- Unprotect/Protect document

Hi.
I'm on ECC6.0. I need functionality to open a Word file (.rtf / .doc) in the user's temp folder, unprotect it (like using Word's Tools -> Unprotect Document, because by default the doc is protected without a password), protect it with a password, and then save it. I tried the following code, but it does not work as the requirement needs:
TYPE-POOLS: ole2.
DATA:
lo_word TYPE ole2_object,
lo_worddocs TYPE ole2_object,
lo_worddoc TYPE ole2_object.
        CREATE OBJECT lo_word 'WORD.Application'.
        GET PROPERTY OF lo_word 'Documents' = lo_worddocs.
        CALL METHOD OF lo_worddocs 'OPEN'
          EXPORTING
          #1 = lv_file_source.
        GET PROPERTY OF lo_word 'ActiveDocument' = lo_worddoc.
        CALL METHOD OF lo_worddoc 'UNPROTECT'.
        CALL METHOD OF lo_worddoc 'PROTECT'
          EXPORTING #1 = 'password'.
        CALL METHOD OF lo_worddocs 'SAVE'.
        CALL METHOD OF lo_worddocs 'CLOSE'.
In tcode SOLE, I found two relevant entries:
OLE application      WORD.BASIC
Version number       6
CLSID                {000209FE-0000-0000-C000-000000000046}
CLSID LibType        {000209FE-0000-0000-C000-000000000046}
OLE object name      WORDBASIC
Type Info key        NO_TYPELIB
Include program
Language             EN
Check authorization
Text Microsoft Word 6.0 Wordbasic
OLE application      WORD97
Version number
CLSID                {000209FF-0000-0000-C000-000000000046}
CLSID LibType
OLE object name
Type Info key        NO_TYPELIB
Include program
Language
Check authorization
Text
Please point me in the right direction.
Thanks.

Post Author: fwinter
CA Forum: .NET
We have the same problem. A page footer generated in Word and embedded in a CR report shows overlapping characters in the report viewer. We were told this is "a known limitation". That alone is not really a problem, but when the report is exported to PDF, the PDF also shows the overlapping characters and prints them as well. This is our problem!
We've tried different fonts, different font and page sizes in Word, the "can grow" checkbox, etc. We believe the result is affected by the printer driver, but we cannot find a way to avoid the problem and get a clean print for all of our customers.
Has anyone figured out how to solve this? CA Support unfortunately couldn't help. We are using the CR Merge Modules XI R2 in our report viewer and Word 2003; the problem appears on local machines as well as on Citrix.

Does anyone know how to use pages so you can export pdfs from the internet and automatically drag words from the document into the file name of the pdf (i.e., author, title of a scientific paper)

Does anyone know how to use Pages so you can export PDFs from the internet and automatically pull words from the document into the file name of the PDF (i.e., author and title of a scientific paper)? For example, if I download a paper by Smith called "Surgery" that was published in 2002, it would automatically set the file name of the download to smith- surgery 2002. I have heard Pages is smart enough to do this.
thank you

Pages can export only its own documents. They may be exported as PDF, MS Word, RTF or Text files.
Pages can import (ie. Open a file as, or Insert a file into, a Pages document) documents in several formats, but won't rename the document as you describe. Documents that can be Opened (eg. Text, AppleWorks 6 WP, MS Word files) are converted to Pages documents, and retain their original names, with .pages replacing the original file extension. Files that can be Inserted (generally .jpg, .pdf and other image files) become part of the existing Pages file and lose their names.
It may be possible, using AppleScript, to extract the text you want and to Save a Pages file using that text as the filename, but that would depend in part on being able to identify which text is wanted and which is not.
How will the script determine where the author's name begins and where it ends?
How will the script recognize the beginning and end of the title, and decide how much of the title to use in the filename?
How will the script recognize the year of publication?
For papers published in a specific journal, with a strict format for placing each of these pieces of information, or containing the needed information as searchable metadata in the file, this might be possible. But it would require knowledge of the structure of these files, and would probably handle only papers published in a specific journal or set of journals.
Outside my field of knowledge, but there are some talented scripters around here who might want to take a closer look.
Best of luck.
Regards,
Barry
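The hard part Barry describes is identifying the author, title and year. Once those are known, for example from the PDF's metadata, building the file name is mechanical. The sketch below is in Java rather than AppleScript and is purely illustrative. It assumes the three fields have already been extracted, and the class and method names are hypothetical. It produces names like smith-surgery-2002.pdf, keeping only the first few title words.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical sketch: file name from already-identified author, title and year. */
public class PaperFileName {

    /** Lower-cases, strips accents and turns runs of non-alphanumerics into single hyphens. */
    static String slug(String s) {
        // NFD splits "ü" into "u" plus a combining mark, which \p{M} then removes
        String ascii = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        return ascii.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-")
                .replaceAll("^-|-$", "");
    }

    /** Joins author, the first maxTitleWords title words, and the year. */
    static String fileName(String author, String title, int year, int maxTitleWords) {
        String[] words = slug(title).split("-");
        int n = Math.min(maxTitleWords, words.length);
        String shortTitle = String.join("-", Arrays.copyOf(words, n));
        return slug(author) + "-" + shortTitle + "-" + year + ".pdf";
    }

    public static void main(String[] args) {
        System.out.println(fileName("Smith", "Surgery", 2002, 5));
    }
}
```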

Searching word docs with verity - strange problem

I am searching Word documents using cfsearch. These documents are basically forms.
Some of the fields, such as name, have back fill enabled for user input. If I search for names in these fields with back fill, I get 0 records.
If it's plain text, it can be searched.
Can anyone please help?
Thanks

pr_coldfusion-
You can grab the updater here:
http://www.adobe.com/support/coldfusion/downloads_updates.html#mx7
-Courtney

Help merging the output RTF/document reports

Hi,
I am working with the XML Publisher tool on a user requirement: 2 output reports (RTF documents) were generated by the XML processor from 2 RTF templates, XML data files, etc. Now I need to merge the 2 output RTF/doc reports. Is there any function in XML Publisher that lets me merge these output reports into one report (the same as merging 2 documents in Microsoft Word)?
Thanks
Raj

You can merge/append the 2 documents with third-party tools without using XML Publisher.
HTH
Shaun S

Unable to combine Microsoft Word or RTF files when creating a .pdf

We cannot add Microsoft Word or .rtf files when trying to combine them into a single .pdf file.
When you click on Add Files, the Word or .rtf files are greyed out and you cannot select them.
We are using Mountain Lion (latest version) and Adobe Acrobat version 10.1.5.

Before Acrobat XI, it was only possible in Windows, not on a Macintosh.
http://indesignsecrets.com/new-acrobat-xi-and-reader-xi-can-change-workflows.php
"In previous versions of Acrobat, Mac users were stymied when you wanted to open a Microsoft Word, PowerPoint, or Excel file into Acrobat to convert it to PDF, or to include these files when combining documents together. Now, if you have Microsoft Office installed on your Mac, you can include these files. (There are still no PDF Maker plug-ins installed on the Mac, and hyperlink and bookmark export from Word is not yet supported.)"

Content Search and not the document in SharePoint 2010

Hi,
The requirement is : Search the content of documents by storing the data in a database-driven structure and get the search results in a grid view with data in different columns. My questions are :-
1) Can we store complete content of the word document in a separate database (other than content DB of site collection) or in list/library and show in search results/SP search result page ?
2) Can we convert the existing documents in XML format and save it SharePoint ? Will that content be visible in Search results ?
3) How can we modify search page to have check box before every search result ?
4) How can we export the selected search results in an excel file ?
Any help will be highly appreciated !!!
Vipul Jain

Hi Inderjeet,
Thanks for the reply. But the client's existing documents are in .doc (Word 2003) format, which is not an Open XML format. So the first step would be:
Q1) How can we extract the content from word documents ?
The existing documents are based on a standard corporate template (.dot). These documents have very heavy content, for example, an employee's resume with 20 experiences described in very long paragraphs of text.
Q2) Can we store such large content (experience/qualifications/professional licenses etc) in SharePoint list/library columns ? If yes , which column type will be used...
It's true that we can convert the existing documents to XML, and then the content will be searchable in SharePoint 2010.
Q3) Can we customize the search page to the point where it gives us results in a grid view with different columns, and the user can select multiple search results and create a document dynamically based on the selected results?
I know anything can be done with customizations (.NET C#). I just want clarification for the SharePoint search page. If yes, then there will be no need to store the document contents in a separate SQL database.
Kindly reply.
Vipul Jain
