Posted to java-user@lucene.apache.org by Inderjeet Kalra <In...@headstrong.com> on 2006/12/01 08:17:49 UTC

Full text searching on documents saved in database as BLOB

Hi,

I have a question about full-text searching of documents saved in a
database as BLOBs. In our application, we plan to store our
documents in the database as BLOBs, and we need to search
a document by its metadata and by its content, i.e. search
within the document.

 

Will Lucene serve our purpose? I have gone through the Lucene FAQ and
various issues on the site, but I could not find any references to
documents saved in a database as BLOBs.

 

1. How can I search on the metadata of a document saved in the
database as a BLOB?

2. How can I search on the content of a document, i.e. search within
a document saved in the database as a BLOB?

 

If anyone has implemented the same functionality, could you point me to
some sample code so that I can start using Lucene for this?

 

There is another search engine, Oracle Text, which also does the same.
Which one is better?

 

Thanks & Regards

Inderjeet

 



 
***The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review,retransmission,dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from any computer.***

Re: Full text searching on documents saved in database as BLOB

Posted by Erick Erickson <er...@gmail.com>.
See this thread for an interesting Oracle-based solution. I admit I just
skimmed it, so it may not really apply....
*Oracle and Lucene Integration*

In general, you can't reach into an Oracle BLOB with Lucene and search it
directly, because Lucene only searches its own index. One approach is to
index the BLOB data from the DB into a Lucene index, along with enough
relevant metadata to satisfy your needs. Include a unique DB id for each doc
in your index(es). Search the Lucene index, then pull the remainder of the
data from the DB based on the DB id you stored.
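
A minimal sketch of the index-then-fetch pattern described above, in plain
Java. A toy in-memory inverted index stands in for Lucene and a Map stands in
for the database table, so nothing here is Lucene API. The class name
BlobSearchSketch and the methods store, search and fetch are hypothetical,
chosen only for illustration. In a real setup the indexing step would go
through Lucene's own classes and fetch would be a SELECT by primary key.

```java
import java.util.*;

/** Toy stand-in for the index-then-fetch pattern; NOT the Lucene API. */
class BlobSearchSketch {
    // Hypothetical stand-in for the DB table: DB id -> raw document bytes (the "BLOB").
    private final Map<Long, byte[]> blobTable = new HashMap<>();
    // Toy inverted index: term -> DB ids of documents containing that term.
    private final Map<String, Set<Long>> index = new HashMap<>();

    /** Saves the BLOB in the "DB" and indexes its metadata and extracted text under the DB id. */
    void store(long dbId, String title, String extractedText, byte[] blob) {
        blobTable.put(dbId, blob);
        String all = (title + " " + extractedText).toLowerCase(Locale.ROOT);
        for (String term : all.split("\\W+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(dbId);
            }
        }
    }

    /** Searches the index only; the result is a set of DB ids, not documents. */
    Set<Long> search(String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    /** Pulls the remainder of the data from the "DB" using the stored id. */
    byte[] fetch(long dbId) {
        return blobTable.get(dbId);
    }

    public static void main(String[] args) {
        BlobSearchSketch s = new BlobSearchSketch();
        s.store(42L, "Annual report", "revenue grew in 2006", new byte[] {1, 2, 3});
        s.store(43L, "Meeting notes", "discuss search engine", new byte[] {4, 5});
        for (long id : s.search("revenue")) {
            System.out.println("hit id=" + id + ", blob bytes=" + s.fetch(id).length);
        }
    }
}
```

The point of the design is that the index holds only searchable text plus the
DB id, while the database stays the single home of the BLOB itself.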

Search this mail archive for "database" and you'll see extensive discussions
of this topic, and many interesting ideas....

Be warned: Lucene does not support SQL-like constructs. You should use
Lucene for efficient searching. If you need to deal with relations
between documents, you'll need the capabilities of a DB and/or
clever indexing in Lucene. You can always use a hybrid solution....

Best
Erick


Re: Full text searching on documents saved in database as BLOB

Posted by Marcelo Ochoa <ma...@gmail.com>.
Hi Inderjeet:
  I am working on a full-text search implementation for Oracle
databases that runs Lucene on the Oracle JVM.
  The text searching functionality is ready now. You can get the latest
code, uploaded on Tuesday; see the attachment text for the details of
the new functionality included:
http://issues.apache.org/jira/browse/LUCENE-724?page=all
  Now I am optimizing the Oracle Lucene Domain index. This new index
is implemented using the Oracle Data Cartridge API and can be used like
CTXSYS.CONTEXT, including the SQL operators contains(column,'lucene
query string',correlation) and score(correlation).
  The benefit of running Lucene inside the Oracle JVM is that all
access to the data sources you index (currently columns of type
VARCHAR2, CLOB, and XMLType) is local: no network round trips, no
data marshalling, and the cursors are in the same memory space.
  Also, the Oracle JVM can be used as an enterprise container for running
Lucene, because it can scale up to thousands of
concurrent sessions.
   The Oracle OJVMDirectory classes replace the file system with BLOBs
as the storage for the Lucene inverted index, so they inherit all
the benefits of database storage, such as full transactional
support, export and import, recovery, auditing, and others.
   Best regards, Marcelo.



-- 
Marcelo F. Ochoa
http://marcelo.ochoa.googlepages.com/home
______________
Do you Know DBPrism? Look @ DB Prism's Web Site
http://www.dbprism.com.ar/index.html
More info?
Chapter 17 of the book "Programming the Oracle Database using Java &
Web Services"
http://www.amazon.com/gp/product/1555583296/
Chapter 21 of the book "Professional XML Databases" - Wrox Press
http://www.amazon.com/gp/product/1861003587/
Chapter 8 of the book "Oracle & Open Source" - O'Reilly
http://www.oreilly.com/catalog/oracleopen/

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Full text searching on documents saved in database as BLOB

Posted by Chris Lu <ch...@gmail.com>.
Lucene can work for this case, provided you can extract the data as
plain text. Whether your data is in a BLOB or in a file on disk
does not really matter, but you need to detect, or be told, what type of
content the BLOB holds, either from the filename or from the binary format.

Usually when storing BLOB content, there will be another column
storing the filename. Based on the filename extension, you can map to
a corresponding extraction API for PDF, Word documents, etc. The database
search software I am working on, DBSight, uses this method.
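
A minimal sketch of the extension-to-extractor mapping described above, in
plain Java. It is not DBSight's actual code. The class name
ExtractorDispatchSketch and its methods are hypothetical, and only a
plain-text extractor is wired in. Extractors for PDF or Word would be
registered the same way, each wrapping whatever parsing library you choose.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical dispatcher: picks a text extractor from the stored filename's extension. */
class ExtractorDispatchSketch {
    // Lower-case extension -> function turning BLOB bytes into plain text.
    private final Map<String, Function<byte[], String>> extractors = new HashMap<>();

    ExtractorDispatchSketch() {
        // Assumption: .txt BLOBs are UTF-8 encoded.
        extractors.put("txt", bytes -> new String(bytes, StandardCharsets.UTF_8));
    }

    /** Registers an extractor for one more file type, e.g. "pdf" or "doc". */
    void register(String extension, Function<byte[], String> extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the lower-cased extension, or "" when the filename has none. */
    static String extensionOf(String filename) {
        int dot = filename.lastIndexOf('.');
        if (dot < 0 || dot == filename.length() - 1) {
            return "";
        }
        return filename.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Returns the extracted text, or null when no extractor is registered for the type. */
    String extract(String filename, byte[] blob) {
        Function<byte[], String> extractor = extractors.get(extensionOf(filename));
        return extractor == null ? null : extractor.apply(blob);
    }

    public static void main(String[] args) {
        ExtractorDispatchSketch d = new ExtractorDispatchSketch();
        byte[] blob = "hello lucene".getBytes(StandardCharsets.UTF_8);
        System.out.println(d.extract("notes.TXT", blob));  // prints "hello lucene"
        System.out.println(d.extract("report.pdf", blob)); // prints "null": no PDF extractor registered
    }
}
```

The text returned by extract is what would then be handed to the indexer,
together with the row's id and metadata.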

-- 
Chris Lu
-------------------------
Instant Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com


