Posted to java-user@lucene.apache.org by markharw00d <ma...@yahoo.co.uk> on 2005/09/17 02:27:29 UTC

Lucene database bindings

I know there have been some posts discussing how to integrate Lucene 
with Derby recently.

I've added an example project that works with both HSQLDB and Derby 
here: http://issues.apache.org/jira/browse/LUCENE-434

The bindings allow you to use SQL that mixes database and Lucene 
functionality in ways like this:

    select top 10 lucene_score(id) as SCORE,
            lucene_highlight(adText) from ads
               where pricePounds <200 and pricePounds >1
               and lucene_query('"drum kit"',id)>0
            order by SCORE DESC, pricePounds ASC

See the readme.txt in the zip file for details.

Cheers,
Mark







		
___________________________________________________________ 
To help you stay safe and secure online, we've developed the all new Yahoo! Security Centre. http://uk.security.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene database bindings

Posted by Mag Gam <ma...@gmail.com>.
Mark:

Thanks for looking at this. I will try it out!



Re: Lucene database bindings

Posted by mark harwood <ma...@yahoo.co.uk>.
>>does it deal w/ aggregate functions and group by 
>> clauses?

Yes, it is basically *all* the normal SQL
functionality, with the added option of mixing
scores from Lucene queries into the criteria.

From the example code:

    select top 10 count(*) as numAds, pricePounds from ads
        where pricePounds <500 and lucene_query('table',id)>0
        group by pricePounds order by numAds desc

This returns the top 10 most common prices for a table
(as in kitchen table, not SQL table). The database has
classified ad descriptions and prices so there's not
much meaningful to group on. A "category" column 
would be a better example for grouping but there isn't
one in the example data.


Cheers,
Mark



		


Re: Lucene database bindings

Posted by Ray Tsang <sa...@gmail.com>.
I must admit that I have not downloaded the source yet, but a quick
question: does it deal w/ aggregate functions and group by clauses?
Thanks!

Ray,


Re: Lucene database bindings

Posted by markharw00d <ma...@yahoo.co.uk>.
>>Basically your lucene_query function will return a true/false in one
>>of the query predicates for each record.

Almost: it returns a score, which is much more useful than just a
boolean. Scoring is the key difference between a search engine and a
database (partial matching with relevance-ranked scores). The scores
can be used to sort results by relevance.
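
To illustrate the point about scores, here is a minimal Java sketch. It
is not code from the LUCENE-434 zip: the Ad class, its fields and the
matchAndRank method are hypothetical names made up for this example, and
the scores are hard-coded rather than produced by Lucene. It only shows
why a numeric score can act both as the match test (score > 0) and as a
sort key, which a plain true/false could not do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ScoreVersusBoolean {

    /** Hypothetical row: an id, a price and a relevance score (0 means "no match"). */
    static final class Ad {
        final int id;
        final int pricePounds;
        final float score;

        Ad(int id, int pricePounds, float score) {
            this.id = id;
            this.pricePounds = pricePounds;
            this.score = score;
        }
    }

    /** Keeps rows whose score is above zero, best score first, cheapest first on ties. */
    static List<Ad> matchAndRank(List<Ad> ads) {
        List<Ad> hits = new ArrayList<>();
        for (Ad ad : ads) {
            if (ad.score > 0f) {          // plays the role of "lucene_query(...) > 0"
                hits.add(ad);
            }
        }
        // plays the role of "order by SCORE DESC, pricePounds ASC"
        hits.sort(Comparator.comparingDouble((Ad ad) -> ad.score).reversed()
                .thenComparingInt(ad -> ad.pricePounds));
        return hits;
    }

    public static void main(String[] args) {
        List<Ad> ads = Arrays.asList(
                new Ad(1, 150, 0.4f),
                new Ad(2, 90, 0.0f),      // no match, filtered out
                new Ad(3, 120, 0.9f),
                new Ad(4, 60, 0.4f));
        for (Ad ad : matchAndRank(ads)) {
            System.out.println(ad.id + " score=" + ad.score + " price=" + ad.pricePounds);
        }
    }
}
```

With the sample data, the ad with id 3 comes first (highest score), then
ids 4 and 1, which share a score and are ordered by price.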


		


Re: Lucene database bindings

Posted by Mag Gam <ma...@gmail.com>.
Mark:

VERY VERY good post! Please publish this doc and example. 


Re: Lucene database bindings

Posted by Chris Lu <ch...@gmail.com>.

Mark,

This is really good stuff! 
I have been thinking about it for a long while.
Thank you for showing us the door!

Basically your lucene_query function will return a true/false in one
of the query predicates for each record.
This will be very useful when other query predicates can filter out a
lot of records.

Is there any hint that tells the DB to evaluate the lucene_query function last?

Chris Lu
------------------------
Lucene RAD on Any Databases
http://www.dbsight.net



Re: Lucene database bindings

Posted by markharw00d <ma...@yahoo.co.uk>.
Mag Gam wrote:

>Does your example store the index in the Derby db or somewhere else? I was
>thinking of indexing a table in a separate column.

The software is not an org.apache.lucene.store.Directory implementation,
i.e. it is not an FSDirectory alternative for persisting Lucene data in a
relational table.
Instead, the software demonstrates a way to extend SQL syntax to allow
Lucene queries to run as in-line functions during the database's
execution of queries. These hybrid SQL statements can take advantage of
the usual database functions for sorting, grouping, joins, conditions,
indexes etc. but also use Lucene queries and highlighting functions, all
in the one SQL statement.
The Lucene indexes used as part of this can be any standard Directory
implementation (e.g. RAM, FS).

The motivation for creating a Lucene/RDBMS hybrid query tool was to
address issues commonly associated with using just Lucene:
1) Sorting on float/date fields and the associated memory consumption
2) Representing numbers/dates in Lucene (e.g. having to pad with
sufficient leading zeros, which adds to the index's list of terms)
3) Retrieving only certain stored fields from a document (all storage
can be done in the db)
4) Issues to do with updating *volatile* data, e.g. price data used in sorts
5) Manually coding joins with RDBMS content as custom filters
6) Too-many-terms exceptions produced by range queries
7) Grouping results, e.g. by website
8) Boosting docs based on stored content, e.g. date

These are the sorts of things an RDBMS can help with.
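
Item 2 in the list can be sketched as follows. The sketch assumes that
indexed numbers are compared as plain strings, which is how Lucene
treated them at the time of this thread, so that a number only sorts
and ranges correctly once it is padded to a fixed width. The class and
method names (PaddedNumberTerms, pad, sortedAsStrings) are invented for
this illustration and are not part of Lucene or of the bindings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PaddedNumberTerms {

    /** Left-pads a non-negative number with zeros to a fixed width. */
    static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        String digits = Long.toString(value);
        if (digits.length() > width) {
            throw new IllegalArgumentException("value is wider than " + width + " digits");
        }
        StringBuilder sb = new StringBuilder(width);
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    /**
     * Returns the values as strings in lexicographic order.
     * A width of 0 means "do not pad".
     */
    static List<String> sortedAsStrings(long[] values, int width) {
        List<String> terms = new ArrayList<>();
        for (long v : values) {
            terms.add(width > 0 ? pad(v, width) : Long.toString(v));
        }
        Collections.sort(terms);   // plain String ordering, character by character
        return terms;
    }

    public static void main(String[] args) {
        long[] prices = {200, 9, 1500, 45};
        System.out.println(sortedAsStrings(prices, 0)); // [1500, 200, 45, 9]
        System.out.println(sortedAsStrings(prices, 6)); // [000009, 000045, 000200, 001500]
    }
}
```

Without padding the string order is wrong for numbers (1500 sorts before
200). With padding the order is right, at the cost of choosing a width
up front. A numeric database column avoids the problem altogether.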

Cheers
Mark


		


Re: Lucene database bindings

Posted by Mag Gam <ma...@gmail.com>.
Does your example store the index in the Derby db or somewhere else? I was
thinking of indexing a table in a separate column.


