Posted to derby-user@db.apache.org by Mag Gam <ma...@gmail.com> on 2005/07/24 22:44:15 UTC

Full Text Indexing

Is it possible to have a searchable data type with indexed access in
Cloudscape/Derby?

I am looking for something similar to tsearch2
(http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/)

Re: Full Text Indexing

Posted by Mag Gam <ma...@gmail.com>.
Thanks for the replies Rick and Dan!

Dan:
I was not able to reach Lucene's website.


Rick: 
I don't know if this will help, but we could use tsearch2's code if
possible. The code is under a BSD license. I guess we could contact
the authors about porting something like this to Java, but they are
hardcore PostgreSQL people.

I will try to do more research on this.


On 7/25/05, Daniel John Debrunner <dj...@debrunners.com> wrote:
> Rick Hillegas wrote:
> > Hi Mag,
> >
> > Thanks for bringing up this issue. It certainly deserves an enhancement
> > request. I'll log one later on today.
> 
> 
> Going forward integrating Lucene (http://lucene.apache.org/) with Derby
> would be the logical choice for text indexing. I once hacked up
> something similar using a different text search library and Cloudscape's
> VTIs.
> 
> Dan.
> 
>

Re: Full Text Indexing

Posted by Mag Gam <ma...@gmail.com>.
A solution for this may be another DBMS

http://www.daffodildb.com/onedollardb-roadmap.html

They seem to have FTI :-(



Re: Full Text Indexing

Posted by Mag Gam <ma...@gmail.com>.
Has anyone found a solution for this? I have been looking for a month and
can't seem to find a good full text indexing solution similar to tsearch2.



Re: Full Text Indexing

Posted by Mag Gam <ma...@gmail.com>.
Okay. I will ask the people on Lucene-User and let you know what
they have to say.

Thanks for the lead!



Re: Full Text Indexing

Posted by Daniel Noll <da...@nuix.com.au>.
Daniel John Debrunner wrote:

>Rick Hillegas wrote:
>  
>
>>Hi Mag,
>>
>>Thanks for bringing up this issue. It certainly deserves an enhancement
>>request. I'll log one later on today.
>>    
>>
>Going forward integrating Lucene (http://lucene.apache.org/) with Derby
>would be the logical choice for text indexing. I once hacked up
>something similar using a different text search library and Cloudscape's
>VTIs.
>  
>
For work, I maintain an application which needs fast text searching as 
well as database queries for the structured content.  We're using Lucene 
and Derby for exactly these two purposes, and need to tie the two 
together with IDs stored in both so that we can effectively join the two 
together.

And so, whenever some suggestion like integrating the two comes up, I 
just have to say...  Yes, please!
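
The ID-based tie between the two stores described above might be sketched
roughly as follows. This is only an illustration, not the code of the
application mentioned: it assumes the text search has already returned a
list of document IDs, and it builds the Derby-side lookup for those IDs.
The table DOCS and its columns ID, TITLE and AUTHOR are hypothetical names.

```java
import java.util.Collections;

// Sketch only: DOCS, ID, TITLE and AUTHOR are hypothetical names.
// The text index is assumed to store the same ID for each document.
class IdJoinSketch {

    // Builds a parameterized lookup for the IDs returned by the text search.
    // Each '?' is later bound to one ID with PreparedStatement.setInt().
    static String buildLookupSql(int idCount) {
        if (idCount < 1) {
            throw new IllegalArgumentException("need at least one ID");
        }
        String placeholders = String.join(", ", Collections.nCopies(idCount, "?"));
        return "SELECT ID, TITLE, AUTHOR FROM DOCS WHERE ID IN (" + placeholders + ")";
    }
}
```

For large hit lists the IN list would need to be issued in batches, which
is one reason a tighter integration would be attractive.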

Daniel

-- 
Daniel Noll

NUIX Pty Ltd
Level 8, 143 York Street, Sydney 2000
Phone: (02) 9283 9010
Fax:   (02) 9283 9020

This message is intended only for the named recipient. If you are not
the intended recipient you are notified that disclosing, copying,
distributing or taking any action in reliance on the contents of this
message or attachment is strictly prohibited.


Re: Full Text Indexing

Posted by Daniel John Debrunner <dj...@debrunners.com>.
Rick Hillegas wrote:
> Hi Mag,
> 
> Thanks for bringing up this issue. It certainly deserves an enhancement
> request. I'll log one later on today.


Going forward, integrating Lucene (http://lucene.apache.org/) with Derby
would be the logical choice for text indexing. I once hacked up
something similar using a different text search library and Cloudscape's
VTIs.

Dan.


Re: Full Text Indexing

Posted by Rick Hillegas <Ri...@Sun.COM>.
Hi Mag,

Thanks for bringing up this issue. It certainly deserves an enhancement 
request. I'll log one later on today.

Technically, you can build your own poor man's solution to this problem
today:

1) You can build your own inverted index (as a full-fledged table) on 
the text column you need to search.

2) You can maintain rows in that pseudo-index by hiding all 
insert/update/delete access (to the base table) inside some set of 
application-layer methods.

3) Similarly, you can manage searches by an application-layer query 
generator which knows how to build joins between the base table and the 
inverted index.
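
The three steps above could be sketched along the following lines. This is
a minimal illustration under assumptions, not the code I actually tried:
the tables DOCS(ID, BODY) and DOCS_TERMS(TERM, DOC_ID), the class name and
the tokenizing rule are all hypothetical. The sketch only generates the
terms and the SQL; the application layer would run the statements through
JDBC.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical schema (not from this thread):
//   DOCS(ID INT PRIMARY KEY, BODY VARCHAR(32000))   -- base table
//   DOCS_TERMS(TERM VARCHAR(64), DOC_ID INT)        -- inverted index table
class InvertedIndexSketch {

    // Step 1: split text into distinct lower-case terms.
    // Each term becomes one DOCS_TERMS row for the document.
    static List<String> tokenize(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return new ArrayList<>(terms);
    }

    // Step 2: statement the application-layer insert method would run
    // once per term, in the same transaction as the base-table insert.
    static String insertTermSql() {
        return "INSERT INTO DOCS_TERMS (TERM, DOC_ID) VALUES (?, ?)";
    }

    // Step 3: query generator. One join against DOCS_TERMS per search term,
    // so a row qualifies only if it contains every term (AND semantics).
    static String buildSearchSql(int termCount) {
        if (termCount < 1) {
            throw new IllegalArgumentException("need at least one term");
        }
        StringBuilder sql = new StringBuilder("SELECT D.ID, D.BODY FROM DOCS D");
        for (int i = 0; i < termCount; i++) {
            sql.append(" JOIN DOCS_TERMS T").append(i)
               .append(" ON T").append(i).append(".DOC_ID = D.ID")
               .append(" AND T").append(i).append(".TERM = ?");
        }
        return sql.toString();
    }
}
```

An update or delete on the base table would first delete the old
DOCS_TERMS rows for that ID, which is why all writes have to go through
the application-layer methods.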

A couple of years ago I tried something like this with the free-license
version of Cloudscape which IBM exposed. The performance disappointed
me. I didn't look deeply into the performance issue. Instead, I tried
another solution, which outperformed the inverted index: full table
scans on the base (text-bearing) table, supported by a search function
evaluated, per query, on each text column. To date this hack has
performed adequately on a dataset of 12K rows. However, it would be
pretty slow on datasets an order of magnitude larger.
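
A search function of the kind described above might look something like
the sketch below. It is an assumption-laden illustration rather than the
function I used: the class and method names, the matching rule
(case-insensitive substring match on every term) and the DOCS table in
the comments are all hypothetical. Derby can call a public static Java
method from SQL once it is registered with CREATE FUNCTION.

```java
import java.util.Locale;

// Sketch only. To be callable from Derby the class would have to be
// public and on the database classpath. Hypothetical registration and use:
//
//   CREATE FUNCTION CONTAINS_ALL_TERMS(DOC_TEXT VARCHAR(32000),
//                                      QUERY_TEXT VARCHAR(256))
//     RETURNS INTEGER
//     LANGUAGE JAVA PARAMETER STYLE JAVA NO SQL
//     EXTERNAL NAME 'TextSearchFunctions.containsAllTerms'
//
//   SELECT ID FROM DOCS
//    WHERE CONTAINS_ALL_TERMS(BODY, CAST(? AS VARCHAR(256))) = 1
class TextSearchFunctions {

    // Returns 1 if every whitespace-separated term of the query occurs in
    // the text (case-insensitive substring match), otherwise 0.
    // A null argument yields 0; an empty query matches every row.
    public static int containsAllTerms(String text, String query) {
        if (text == null || query == null) {
            return 0;
        }
        String haystack = text.toLowerCase(Locale.ROOT);
        for (String term : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty() && !haystack.contains(term)) {
                return 0;
            }
        }
        return 1;
    }
}
```

Because the function is evaluated on every row, the cost grows linearly
with the table size, which matches the slowdown expected on larger
datasets.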

So...we need to log an enhancement request for text search support.

Regards,
-Rick



Mag Gam wrote:

>Is it possible to have searchable data type with indexed access in
>Cloudscape/Derby?
>
>I am looking for something similar to tsearch2
>(http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/)
>  
>



Re: Full Text Indexing

Posted by Rick Hillegas <Ri...@Sun.COM>.
I have moved the discussion about Full Text Indexing onto a wiki page:

  http://wiki.apache.org/db-derby/LuceneIntegration

Right now this page lists some features and use cases which we might 
want this integration to support. Please feel free to add your own 
feature requests and use cases. After we have collected enough feedback, 
we can use this page to propose phased support for full text search.

Hi Mag Gam,

I would especially appreciate your feedback. I suspect I haven't 
captured your features and use cases yet.

Thanks,
-Rick