PDF Document search & Indexing NOT WORKING [message #244775] Thu, 14 June 2007 01:41
sachin sharma
Messages: 9
Registered: February 2004
Junior Member
I am using Oracle 10g Release 2 Express Edition (Oracle XE).

I have a table structure like
Table = docs
doc_id NUMBER
document BLOB

We use this table to store multilingual PDF, MS Excel, and MS Word documents.
I created the Oracle Text index shown below so that the documents can be searched by their content.
Index:
EXEC CTX_DDL.CREATE_PREFERENCE('doc_search_lexer', 'WORLD_LEXER');

CREATE INDEX doc_search_binary_idx on docs(document)
INDEXTYPE is CTXSYS.CONTEXT
PARAMETERS ('LEXER doc_search_lexer
STOPLIST CTXSYS.EMPTY_STOPLIST');

I used the following two procedures to synchronize and optimize the index:
EXEC CTX_DDL.SYNC_INDEX('doc_search_binary_idx');
EXEC CTX_DDL.OPTIMIZE_INDEX('doc_search_binary_idx', 'FULL');

The SELECT query that I use is:
SELECT * from docs
WHERE CONTAINS(document, 'search string') > 0;

To test the index, I uploaded two identical documents, one in MS Word format and one in PDF, both written in Hindi (Devanagari script). I then ran the synchronize and optimize procedures and executed the SELECT statement. The result set contains only the matching MS Word document; the PDF document, whose content is exactly the same, is not returned. The PDF's font attributes under "File --> Document Properties --> Font" meet the requirements in the Oracle Text documentation.
I checked the table "DR$DOC_SEARCH_BINARY_IDX$I", and no tokens are being generated for the PDF documents.
How can I index multilingual documents (PDF, MS Excel, MS Word)?
Re: PDF Document search & Indexing NOT WORKING [message #245040 is a reply to message #244775] Fri, 15 June 2007 01:11
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
You might try adding auto_filter to your index parameters.
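For illustration, here is a minimal sketch of what adding the filter to the index parameters could look like. It reuses the index, table, and lexer names from the original post; the preference name doc_search_filter is made up for this example, and the existing index is dropped and recreated, which may not be acceptable on a large table.

```sql
-- Hypothetical preference name: doc_search_filter (not from the original post)
BEGIN
  CTX_DDL.CREATE_PREFERENCE('doc_search_filter', 'AUTO_FILTER');
END;
/

-- Recreate the index so that the filter preference takes effect
DROP INDEX doc_search_binary_idx;

CREATE INDEX doc_search_binary_idx ON docs(document)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('FILTER doc_search_filter
               LEXER doc_search_lexer
               STOPLIST CTXSYS.EMPTY_STOPLIST');
```

After the index is recreated, the same CONTAINS query can be rerun to see whether the PDF rows are now returned.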
Re: PDF Document search & Indexing NOT WORKING [message #245514 is a reply to message #244775] Mon, 18 June 2007 00:01
sachin sharma
Messages: 9
Registered: February 2004
Junior Member
I have already tried using "AUTO_FILTER" in my filter preference, but the search still does not return the correct results.
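When the filter is in place and the PDF rows still produce no tokens, one way to narrow the problem down might be to look at the indexing errors and at the filtered text of a single document. The sketch below assumes the index name from the first post and a PDF row with doc_id = 1, which is a made-up value; it also assumes SERVEROUTPUT is enabled in the session.

```sql
-- Errors recorded during index creation or synchronization
SELECT err_timestamp, err_textkey, err_text
FROM   ctx_user_index_errors
WHERE  err_index_name = 'DOC_SEARCH_BINARY_IDX'
ORDER  BY err_timestamp DESC;

-- Show what the filter extracts from one PDF row
DECLARE
  v_text  CLOB;
  v_rowid VARCHAR2(18);
BEGIN
  -- doc_id = 1 is a hypothetical key of a PDF document
  SELECT ROWIDTOCHAR(rowid) INTO v_rowid FROM docs WHERE doc_id = 1;

  -- Identify the document by ROWID rather than by primary key
  CTX_DOC.SET_KEY_TYPE('ROWID');
  CTX_DOC.FILTER(index_name => 'doc_search_binary_idx',
                 textkey    => v_rowid,
                 restab     => v_text,
                 plaintext  => TRUE);

  -- Print the first 200 characters of the filtered text
  DBMS_OUTPUT.PUT_LINE(DBMS_LOB.SUBSTR(v_text, 200, 1));
  DBMS_LOB.FREETEMPORARY(v_text);
END;
/
```

If the filtered text comes back empty or garbled for the PDF but readable for the Word document, the problem is probably in the filtering step rather than in the lexer or the query.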
Re: PDF Document search & Indexing NOT WORKING [message #570829 is a reply to message #245514] Fri, 16 November 2012 03:20
acikus
Messages: 3
Registered: November 2012
Location: Belgrade
Junior Member
Maybe this?
GRANT RESOURCE, CONNECT, CTXAPP TO MYUSER;

GRANT EXECUTE ON CTXSYS.CTX_CLS TO myuser;
GRANT EXECUTE ON CTXSYS.CTX_DDL TO myuser;
GRANT EXECUTE ON CTXSYS.CTX_DOC TO myuser;
GRANT EXECUTE ON CTXSYS.CTX_OUTPUT TO myuser;
GRANT EXECUTE ON CTXSYS.CTX_QUERY TO myuser;
GRANT EXECUTE ON CTXSYS.CTX_REPORT TO myuser;
GRANT EXECUTE ON CTXSYS.CTX_THES TO myuser;
GRANT EXECUTE ON CTXSYS.CTX_ULEXER TO myuser;
Re: PDF Document search & Indexing NOT WORKING [message #570831 is a reply to message #570829] Fri, 16 November 2012 03:42
Michel Cadot
Messages: 68624
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Never grant predefined roles like CONNECT and RESOURCE, and above all NEVER grant RESOURCE.
Always create your own.

Regards
Michel
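As a rough sketch of the "create your own" advice, a dedicated role could carry only the privileges the application needs. The role name text_app_role, the tablespace name users, and the quota size are assumptions for this example, and the list of system privileges is illustrative rather than complete.

```sql
-- Run as a privileged (DBA) user.
-- text_app_role is a hypothetical role name.
CREATE ROLE text_app_role;

-- Grant only what the application actually needs
GRANT CREATE SESSION, CREATE TABLE, CREATE PROCEDURE TO text_app_role;
GRANT CTXAPP TO text_app_role;

GRANT text_app_role TO myuser;

-- Explicit quota (tablespace name and size are assumptions)
ALTER USER myuser QUOTA 100M ON users;

-- Package privileges used inside stored PL/SQL are granted directly,
-- because roles are not enabled in definer's rights procedures
GRANT EXECUTE ON CTXSYS.CTX_DDL TO myuser;
```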