Document Indexing

If we look at Google Search, we can see that it displays URLs that have not only been registered with the search engine under some keywords but have also had their complete documents indexed. (I hope I am correct; at least that is what I have seen during some searches.)
Can I achieve the same type of indexing on documents in iFS? If yes, does it work on all types of documents?

Sastry--
Yes, definitely, Oracle iFS can do this. Text indexing of HTML and 150+ other formats is what we get by using interMedia Text. The 150+ formats cover everything that you might want to index, including HTML, XML, the Microsoft Office document formats, PDF, and, as we product marketing types like to say, "and so much more!"

Similar Messages

Can't open my IBA project: error parsing document index: invalid character in attribute value

Hi guys,
I am new to iBooks Author and I am not HTML-savvy (except for the very basics). I have to submit an iBook for a university exam and I can't open my file anymore.
I have been working on it since Saturday. I saved many times, and I quit iBooks Author and rebooted my iMac a few moments ago. There were no crashes.
The project is an iBook with pictures, videos and some Tumult Hype animations. It is 248 MB.
Now I can't open the file (but I can see it in quick view). The error message is in the subject, but here it is again:
"Progetto iBooks" couldn't be opened. Error parsing document index: invalid character in attribute value.
Now, I have to send the project today (here in Italy it is now 7:10 am) and so I am kind of desperate.
Any suggestion?
I am already working on making a new ibook, but if someone could give me a workaround to fix it and open it on iBooks I would be extremely grateful.
Best regards,
Sebastiano

Hi Sebastiano,
Did you ever get an answer? I have the same problem and now have to work from an old version again (good thing I had backed one up manually!)
Thx
JP.

Error parsing document index: invalid character in attribute value

Hi,
I am working on an eBook with iBooks Author. Once in a while, when I finish working and want to reopen my eBook a few hours or days later, I get the error:
'title' could not be opened
Error parsing document index: invalid character in attribute value
I have to go back to an old backup I always do before quitting.
What is the reason for this error message, and is there any way to fix/repair it on a version I just saved?
Thanks,
JP.

Hello,
I have the same error, but for me the above solution did not work.
Did I understand correctly? This is what I did:
1. I changed the extension of the IBA file to ZIP
2. I unzipped the file
3. In the folder with the unzipped book I renamed the file index.xml to index.html
4. I zipped it all back (into a ZIP file)
5. I changed the extension of the ZIP archive to IBA
6. I tried to open the book and I got the error that there was no index.xml file
7. I changed the extension of the IBA back to ZIP
8. I unzipped the file again
9. I renamed index.html back to index.xml and zipped it again (compressed it)
10. I changed the extension of the ZIP archive to IBA
11. I opened the book
Is this correct?
The problem is that I am still receiving the same error message: Error parsing document index: invalid character in attribute value
Did I do something wrong?
If you can help I would be very grateful. I worked hard on this book and it is the only backup that I have saved.
Thank you!
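
One way to narrow this error down, sketched under assumptions: if the renamed archive really does contain an index.xml (as the steps above suggest), a generic XML parser can report the line and column where the file stops being well-formed. The helper names below (find_xml_error, check_archive) are hypothetical, and the sketch only locates the problem; it does not repair the book.

```python
import io
import zipfile
import xml.parsers.expat


def find_xml_error(data):
    """Return None if data is well-formed XML, else (line, column, message)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as err:
        return err.lineno, err.offset, xml.parsers.expat.ErrorString(err.code)
    return None


def check_archive(path_or_file, member="index.xml"):
    """Read one member of a zip archive and check it for well-formedness."""
    with zipfile.ZipFile(path_or_file) as archive:
        return find_xml_error(archive.read(member))


# Self-contained demo: an in-memory archive whose index.xml has a raw "<"
# inside an attribute value, which XML does not allow.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("index.xml", '<book title="a < b"/>')
demo.seek(0)
print(check_archive(demo))
```

On a real project one would pass the path of the renamed .zip file instead of the in-memory demo. Always work on a copy of the file, not the only backup.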

Can't open file "/Library/WebServer/Documents/index.html.en."

Can’t open file “/Library/WebServer/Documents/index.html.en.”

You will never be able to open this file with iWeb because, as it states, it is a User/Sites/index.html file, and iWeb cannot open such files. Anything with html on the end of it is already published, and iWeb cannot open it. It is not the same as your domain file.
Go to User/Library/Application Support/iWeb/domain.sites and try opening your domain file, rather than something that is already published. iWeb cannot import - at least not published pages.

Fireworks: How to find document index for scripting?

Can anyone share a simple method for obtaining the document index of an active Fireworks document, within the array of open documents? Normally, the current document is accessed by using fw.getDocumentDOM(), but I'd like to obtain the actual array index value (e.g., 0, 1, 2, 3, etc.) to use elsewhere in a script.
I've created a function to obtain this index value, but it's ungainly: It compares entire DOMs that have been converted to source. This requires too much memory or processing and nearly brings the script to a halt. I need something simpler.
var dom = fw.getDocumentDOM();
function documentIndex() {
    if (fw.documents.length == 1) {
        return 0;
    } else if (fw.documents.length > 1) {
        var i = 0;
        // Compare the serialized source of every open document (slow).
        for (i = 0; i < fw.documents.length; i++) {
            if (fw.documents[i].toSource() == dom.toSource()) {
                return i;
            }
        }
    }
}
I've considered using a document property like docTitleWithoutExtension, filePathForRevert, or filePathForSave as a basis for comparison between documents, but these are unreliable: They won't work if the documents in question have not yet been saved.
I figure there must be a simple method, I just don't know what it is.

In case anyone's interested, here's the workaround I'm now using to find the position of an active Fireworks document within the array of open documents (a.k.a., the document index). I'd still love to hear from anyone who can suggest a simpler method.
The basic idea is to first look at two properties of the active document—docTitleWithoutExtension and filePathForRevert—and, where possible, compare those values to that of the same properties within each open document. On their own, each property has loopholes, which is why I'm combining them. (For example, it's possible to have two documents with the same title—if they have different extensions or file paths. Likewise, it's possible to open multiple copies of an unsaved document, all with the same filePathForRevert value.)
New or untitled documents demand an entirely different criteria. I'm not crazy about this approach, but it seems like the best option at this point: When you can't find a DOM property to reliably distinguish one document from another, you temporarily write a distinctive value into a property of the document you want to find. Think of it like tagging a wild animal. In this case, when the active document lacks both a docTitle and file path, I write a crazy piece of "alt text" gibberish into the defaultAltText property and use that to identify my active document. (Even though I'm taking care to restore the original "alt text" afterwards, I don't love this method... but it seems to work.)
var dom = fw.getDocumentDOM();
function documentIndex() {
    if (fw.documents.length == 1) {
        return 0;
    } else if (fw.documents.length > 1) {
        var docTitle = dom.docTitleWithoutExtension;
        var filePath = dom.filePathForRevert;
        var i = 0;
        if ((docTitle != "") && (filePath != null)) {
            // Saved documents: match on title plus revert path.
            for (i = 0; i < fw.documents.length; i++) {
                if ((fw.documents[i].docTitleWithoutExtension == docTitle) && (fw.documents[i].filePathForRevert == filePath)) {
                    return i;
                }
            }
        } else {
            // Unsaved documents: temporarily tag the active document.
            var originalDefaultAlt = dom.defaultAltText;
            dom.defaultAltText = "toRtoiSeOfThEsLowAcoRnsRejoiCe";
            for (i = 0; i < fw.documents.length; i++) {
                if (fw.documents[i].defaultAltText == dom.defaultAltText) {
                    // Restore the original alt text before returning.
                    dom.defaultAltText = originalDefaultAlt;
                    return i;
                }
            }
        }
    }
}

[CS3] Used font in a document: Index?

Hello!
I am working with IUsedFontList and IPMFont.
But none is telling me the index of a font used in a document.
How can I get the index of a font used in a document?
Alois Blaimer

Changing fonts/font sizes in a scanned document requires a product like Acrobat to:
1. convert the scanned image into text (OCR)
2. make actual changes to the text
With Adobe Reader you can use the Zoom function to enlarge the PDF document content.

Cannot open pages documents "index.xml file is missing"

I'm working on an iMac running Mountain Lion and using Pages 3.0.3. Just today, none of my Pages documents will open. I haven't stored anything in the cloud and I haven't changed anything; I was just working on simple text documents in Pages.
The last thing I did was print the file, as I have so often before. I have a MacBook with the same Pages app, but running Yosemite, where I bought a new version of Pages.
The MacBook will open .pages files stored on its own hard drive, but files taken from the iMac will not open on the MacBook.
I'm in a tight spot. I have a vast number of documents, as I've been using Pages for years.

By default, Pages v5.5.2 on Yosemite saves out a Single File format document, which is a compressed renamed zip folder that the Finder allows us to believe is a document. When you attempt to open that document from within Pages '08, or Pages '09, you get the following dialog:
The Pages v5 generation documents do not use an internal index.xml file. Pages v5.5.2 also allows one to change that single file format document into a package file format document, which is not a compressed, renamed folder. When you open one of these package format documents in the older Pages applications, you now get a different dialog message. The index.xml file is still missing, but…
If you are using Pages '08 v3.0.3, or Pages '09 v4.3, there are no newer versions of these applications available. This is an Apple attempt to steer users to a Yosemite update, and then downgrade to the latest Pages applications that now require Yosemite. They could have just said, “we made a document format that is incompatible with older Pages applications.”

TREX: Preparation Failed in document index

Hi,
I have defined an index with one datasource and I get an error in 317 documents, while 204 documents are OK.
The errors are due to error code 6401 (HTTP Status Code 401: Unauthorized), but I don't know how to solve it. Everyone has full control of the datasource (it is a development environment).
Any suggestion?
Thanks and best regards!
Damian

Hi All,
When I go to the path SysAdmin > Monitoring > IndexingMonitor, I get this error:
Trex: Preparation failed: index operation.
Could anybody tell me what the reason could be? I have given the host name in the URL generator as well.
Under SysAdmin > SysConfig > KM > Index administrator I can see Active and a green tick mark for every category, but indexing is not working for me.

The document "index.html" could not be opened.

Hi,
I'm having trouble opening iWeb. When I double-click the iWeb icon it tries to open a particular file (index.html), but claims it can't open it and then quits. This happens even if I try to open any other file with iWeb, including files that were originally created with iWeb. I can't find the iLife installation CD (I think because it came installed on the computer when I bought it). Any help would be appreciated.
thanks!

iWeb stores your website data in a domain file located in Home Folder/Library/Application Support/iWeb.
Go look and see if you have one there. It should be the only file there - normally.
Double click it to launch iWeb

Problems in indexing MS word document. Please help!

Hi
I'm using Oracle 8.1.6 on Solaris 5.7.
I stored an MS Word document in a table as an internal BLOB.
The Word document contains one line:
"This is test word document." Then I indexed it with the INSO_FILTER preference and created a log file during indexing. The log file showed that no document was indexed. Here is what I did:
===============================================================
--Create preference
exec CTX_DDL.drop_preference('MY_LEXER');
exec CTX_DDL.create_preference('MY_LEXER','BASIC_LEXER');
exec CTX_DDL.set_attribute('MY_LEXER','MIXED_CASE', 'NO');
exec CTX_DDL.set_attribute('MY_LEXER','INDEX_THEMES','NO');
exec CTX_DDL.set_attribute('MY_LEXER','INDEX_TEXT', 'YES');
exec ctx_ddl.Drop_Preference ('MY_FILTER');
exec ctx_ddl.Create_Preference ('MY_FILTER','INSO_FILTER');
exec ctx_ddl.drop_section_group ('MY_SECTION');
exec ctx_ddl.create_section_group ('MY_SECTION','NULL_SECTION_GROUP');
--Create table
drop table test;
create table test
(id number primary key,
text blob
);
--Initialize blob column with an empty blob
insert into test (id,text) values (1,empty_blob());
--Create a directory in which a Word file (test.doc) exists
create directory filedir as '/home/mydir';
--Insert the Word file
DECLARE
lobd BLOB;
fils BFILE;
BEGIN
fils := BFILENAME('FILEDIR','test.doc');
SELECT text INTO lobd FROM test WHERE id = 1 FOR UPDATE;
dbms_lob.fileopen(fils, dbms_lob.file_readonly);
dbms_lob.loadfromfile(lobd, fils, dbms_lob.getlength(fils));
COMMIT;
dbms_lob.fileclose(fils);
END;
/
---Start logging
exec ctx_output.start_log('index.log');
---Create index with INSO_FILTER defined in preference
create index test_index on TEST(text) indextype is ctxsys.context
parameters ('lexer MY_LEXER filter MY_FILTER section group MY_SECTION memory 50M');
---Stop logging
exec ctx_output.end_log;
=============================================================
The index was created. Then I opened the index.log file. It reads:
==============================================================
Oracle interMedia Text: Release 8.1.6.0.0 - Production on Tue Feb 19 16:22:50 2002
(c) Copyright 1999 Oracle Corporation. All rights reserved.
16:22:50 02/19/02 begin logging
16:23:48 02/19/02 populate index: CALLOB.TEST_INDEX
16:23:48 02/19/02 Begin document indexing
16:23:49 02/19/02 End of document indexing. 0 documents indexed.
16:24:06 02/19/02 log
16:24:06 02/19/02 logging halted
===============================================================
I did the query:
select token_text from dr$test_index$i;
no rows returned.
Could anyone tell me why this happened? Any advice is appreciated.
Thanks,
George

Hi, Omar:
I tried use SQL*Loader to load the word document. Part of the loader logging reads as following:
Table TEST:
1 Row successfully loaded.
0 Rows not loaded due to data errors.
0 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Space allocated for bind array: 6720 bytes(64 rows)
Space allocated for memory besides bind array: 0 bytes
Total logical records skipped: 0
Total logical records read: 1
Total logical records rejected: 0
Total logical records discarded: 0
================================================================
It seems that the file was successfully loaded into the database. Then I created the index using the procedure I posted in this thread. I checked the table ctx_user_index_errors.
select * from ctx_user_index_errors;
The output is:
ERR_INDEX_NAME ERR_TIMES
TEST_INDEX 20-FEB-02
ERR_TEXTKEY
AAAGtpABLAAAAAXAAA
ERR_TEXT
----------------------------------------------------------------
DRG-11207: user filter command exited with status 137
What does this return tell?
Thanks.
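
A hedged reading of "status 137": on Unix-like systems, a shell reports a process killed by signal N as exit status 128 + N, so 137 would correspond to signal 9 (SIGKILL). Whether the interMedia filter reports its status this way is an assumption, not something the log confirms. The small helper below (a hypothetical name) just does that arithmetic.

```python
# Common POSIX signal numbers; the table is deliberately small.
SIGNAL_NAMES = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def describe_exit_status(status):
    """Interpret an exit status using the shell's 128 + signal convention."""
    if status > 128:
        signum = status - 128
        name = SIGNAL_NAMES.get(signum, "signal %d" % signum)
        return "terminated by %s" % name
    return "exited normally with code %d" % status


print(describe_exit_status(137))
```

If that reading holds, the filter process was killed from outside rather than rejecting the document, so it may be worth checking how the filter executable behaves when run by the Oracle user.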

Snow Leopard indexing - mail and documents.

Mail index is seriously broken and document index seems to be failing.
Is there a simple way to force Snow Leopard to rebuild all indexes?

bump - can anyone help with Snow Leopard indexing problem?

KM Document iView - index.html and main.css not properly displayed

Hello,
as a test we have put two files in the /documents repository in KM :
a) index.html
<head>
<link rel="stylesheet" type="text/css" href="./main.css"/>
</head>
<table width="92%" bgcolor="#FFFFFF">
<tr align="left" valign="top">
    <td> </td>
    <td colspan="5"><table width="100%" border="0" cellpadding="5" cellspacing="0">
        <tr valign="middle">
          <td width="85" bgcolor="#C7D9E9"> <p><b>Top Links</b></p></td>
          <td width="125" class="document-list"><a href="impax.html">IMPAX Client
            </a> </td>
          <td width="125" class="document-list"><a href="talkstation.html">TalkStation</a></td>
          <td width="125" class="document-list"><a href="ris.html">RIS</a></td>
          <td width="125" class="document-list"><a href="connectivity.html">Connectivity
            Manager</a></td>
          <td width="125" class="document-list"><a href="impax.html">IMPAX Server</a></td>
        </tr>
      </table></td>
</tr>
</table>
b) main.css
A:visited {
    color: #264560;
}
A:active {
    color: #12212E;
}
A:hover {
    color: #14623D;
}
A {
    color: #336699;
}
table {
    margin-top: 0px;
    margin-bottom: 0px;
}
p {
    color: #000000;
    font-family: Arial, Helvetica, sans-serif;
    margin-bottom: 0px;
    margin-top: 5px;
    font-size: 12px;
}
.document-list {
    background-color: #C7D9E9;
    font-family: Arial, Helvetica, sans-serif;
    font-size: 12px;
    font-color: #000000;
    margin-bottom: 3px;
}
When going to Content Administration -> KM Content -> Documents and clicking the index.html file, the css file is taken into account; e.g. when hovering over the IMPAX hyperlink, the path is http://<host>:<port>/irj/go/km/docs/documents/impax.html and the impax.html page is displayed when clicked.
However, when creating a KM Document iView (with or without content filter) pointing to /documents/index.html and displaying the iView, the style sheet is ignored, and the same hyperlink as above now refers to http://<host>:<port>/irj/servlet/prt/portal/prtroot/impax.html, which is incorrect.
-> How can this behaviour be explained?
-> When creating an URL iView pointing to /irj/go/km/docs/Agfa_Knowledgebase/index.html , everything works as expected.
Thanks for the help -

Hi,
You should correct the path to your css file in your index.html:
href="/irj/go/km/docs/documents/main.css"
Regards,
Praveen Gudapati
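
The behaviour in the question is consistent with ordinary relative-URL resolution: a relative href is resolved against the URL of the page the browser thinks it is on. A small sketch with Python's standard urllib.parse.urljoin shows the effect. The host, port, and the iView page name (some.iview) are placeholders, assumed only for illustration.

```python
from urllib.parse import urljoin

# Placeholder base URLs; only the directory part matters for resolution.
km_base = "http://host:8080/irj/go/km/docs/documents/index.html"
iview_base = "http://host:8080/irj/servlet/prt/portal/prtroot/some.iview"

# Relative references follow whichever base the browser uses.
print(urljoin(km_base, "./main.css"))
print(urljoin(iview_base, "./main.css"))
print(urljoin(iview_base, "impax.html"))

# A root-relative path resolves the same way from either base.
css_path = "/irj/go/km/docs/documents/main.css"
print(urljoin(km_base, css_path))
print(urljoin(iview_base, css_path))
```

This is why a root-relative href for the stylesheet keeps working inside the iView, and the same reasoning would apply to the links in the table.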

Ultrasearch doesn't index documents processed by remote crawler

Hi,
My Oracle9i 9.2 database is on Solaris. I have a remote crawler on Windows 2000. The remote crawler seems to be working fine. There is no error message in the log file, and every file has been processed. However, I can't query the documents processed by the remote crawler. $ORACLE_HOME/ctx/log/ultrasearch_log reads:
Oracle Text, 9.2.0.1.0
15:24:45 07/12/02 begin logging
15:31:04 07/12/02 sync index: LING.WK$DOC_PATH_IDX
15:31:04 07/12/02 Begin document indexing
15:31:05 07/12/02 End of document indexing. 0 documents indexed.
The last part of the log file is:
=================== Crawling results ===================
Crawling started at 7/12/02 3:25 PM
Crawling stopped at 7/12/02 3:32 PM
Total crawling time = 0:6:31

Total number of documents fetched = 179
Document fetch failures = 0
Document conversion failures = 0
Total number of unique documents indexed = 178
Total data collected = 1,975,751 bytes
Total number of non-indexable documents = 0
Average size of fetched document = 11,099 bytes

Total indexing time = 0:0:0 for 1,975,751 bytes of data
Number of documents collected/indexed per hour = 1,638

Number of times disk cache is full = 0
I have another crawler on the database host. It works fine. I can query the documents processed by this crawler.
Any idea?
Ling Niu

More Information about my question.
I used Samba to share a directory /ling on Solaris with my Windows 2000 machine, and mapped it to a drive on Windows NT, say, E:. This directory is used as both the log and temp directory. I have an account on Solaris with the same name/password as my Windows 2000 account. When the schedule is executing, I can see the crawler create a directory inso_tmp in the shared directory, and I believe it is used for filtering. To my knowledge, after filtering, Oracle or the crawler will copy the filtered file in inso_tmp to the temp directory, which is /ling or E:\ in my case. But I failed to catch the temporary file, which I know is transient. I've given write privilege on /ling to Oracle. I checked the tables WK$DOC and WK$URL. In these tables, my temporary file's name is E:\****. If the Oracle server uses this table to get the name of the temporary file, it will fail, because Oracle on Solaris doesn't know where E:\ is. I can't get any log message to prove my guess. And if this is the case, it will be difficult to set up a remote crawler in a mixed Windows/Unix environment, right?
Any leads would be welcome,
Ling

UltraSearch - Numbers of document discovered, fetched and indexed

I am using US 1.0.3.
- I have a table data source mapped to a table with the following characteristics:
> PK is a composite of three columns
> table has a total of 970 rows
> the column TITLE which is of varchar2 is specified as the content column
> Of the 971 rows, 82 rows have NULL in TITLE column.
> Of the 971 rows, only 196 rows have unique TITLE.
> There is no attribute mapping
- Here is the crawler summary:
Document discovered: 381
Document fetched: 381
Document indexed: 196
The rest are zeros.
My questions are:
(1) It seems US only indexes rows with unique value which explains why only 196 rows/documents are indexed. That is, rows with duplicate TITLE are not indexed. It seems to make sense. Is that correct?
(2) But why are only 381 documents/rows discovered and fetched? I would think it would discover all the rows with a NON-NULL value in the TITLE column: 889 (i.e. 971 - 82).
(3) In summary, how does US determine what rows to fetch and index?
Thanks!
C Cheung

Hi nyzonegirl,
Welcome to Numbers discussions.
Yvan is correct; nothing in iWork or any other Mac application will remove MS Office Excel. If you were using Numbers, there is a 30-day trial for it; after 30 days it stops working unless you purchase it. So it seems that the Excel worksheet you were using may have opened in Numbers rather than your Excel, a file association thingy.
Find on your HDD the Excel .XLS file, click on it once to highlight it. Now click File > Get Info, down the list it will read Open with:, change it to Excel (it may read Numbers). You'll have the choice to have all like files open in Excel as well.
Yes you're correct, Windows users won't be able to open Numbers files so if you decide to purchase iWork you'll need to do as Yvan suggested, Export to Excel.
As the need arises I use Excel, however, for my personal use 100% of the time I use Numbers. When I know Functions are the same I'll use Numbers then Export to Excel for Windows users.
Hope this helps you. Do let us know the outcome.
Sincerely,
RicD

Change the Index from documents to All

Hi all
I created an index in the index administration only for documents (Items to Index = Documents).
This was a long time ago...
Now we have the problem that our search engine only shows documents for this index.
OK, that's how it works.
But my question is: how can I change "Items to index" from Documents to All?
Is there a way? It's nearly impossible to delete the index and create a new one, because
we have a lot of documents indexed.
Thanks in advance
Steve

Hi Steve,
Can you try the following -
1. Create a new Index with the required properties (items to index set to "All") and select the same data source as done in the old index.
2. Provide schedule for the index.
3. Re-index it one time.
4. When everything is done then you can remove the old index and use the new one.
5. Modify your Search Options Set accordingly.
Note: There should be sufficient space in the TREX Server to accommodate both the indexes for some time.
Regards,
Sudip
