Posted to java-user@lucene.apache.org by Charles Patridge <ch...@prodigy.net> on 2007/05/15 17:11:16 UTC
Concept Search
I have looked around the Lucene web site as well as some documentation
but have not found anything about Concept Search.
My definition of Concept Search is as follows:
1. I would have a file (list) of various phrases / N-grams which I
would like Lucene to use as a search basis, without having to type in
all these phrases manually, and have Lucene return the results as it
would if a single search query had been entered.
2. An example would be: find !Wild_Animals!, where the "!" would
indicate that this search should read the phrases from a file (i.e.
Wild_Animals.txt) and search the corpus for those phrases.
3. The contents of Wild_Animals.txt could look like this:
BUFFALO
BEAR
MOOSE
COYOTE
WOLF
MOUNTAIN GOAT
MOUNTAIN SHEEP
DALL SHEEP
DEER
KODIAK BEAR
BROWN BEAR
BLACK BEAR
etc.
4. Is my idea of a Concept Search feasible? If so, can you point me to
any documentation that describes how this could be done within Lucene?
Please send any info you have on this to me -
Charles_S_Patridge@prodigy.net
Thank you in advance for your time and efforts.
Charles S Patridge - PDPC, Ltd.
172 Monce Road - Burlington, CT 06013 USA
Email: Charles_S_Patridge@prodigy.net
Web: http://www.sconsig.com
Web: http://pages.prodigy.net/charles_s_patridge
Web: http://www.munic.state.ct.us/burlington
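A minimal sketch of the !Name! syntax described above, assuming each concept file holds one phrase per line as in Wild_Animals.txt. It rewrites the marker into an OR of quoted phrases in Lucene query-parser syntax, a string that could then be handed to Lucene's QueryParser. The class and method names (ConceptQueryExpander, load, expand) are hypothetical, and opening the actual file on disk is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites !Name! markers into Lucene query-parser syntax.
final class ConceptQueryExpander {
    // Matches markers such as !Wild_Animals! (letters, digits and underscores).
    private static final Pattern MARKER = Pattern.compile("!(\\w+)!");

    private final Map<String, List<String>> concepts = new HashMap<>();

    // Registers a concept from text laid out like Wild_Animals.txt: one phrase per line.
    void load(String name, BufferedReader lines) throws IOException {
        List<String> phrases = new ArrayList<>();
        String line;
        while ((line = lines.readLine()) != null) {
            // Drop stray double quotes so each entry stays a single quoted phrase.
            String phrase = line.replace("\"", "").trim();
            if (!phrase.isEmpty()) {
                phrases.add(phrase);
            }
        }
        concepts.put(name, phrases);
    }

    // Replaces each known !Name! with ("P1" OR "P2" ...); unknown markers are left untouched.
    String expand(String query) {
        Matcher m = MARKER.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            List<String> phrases = concepts.get(m.group(1));
            String replacement = m.group();
            if (phrases != null && !phrases.isEmpty()) {
                List<String> quoted = new ArrayList<>();
                for (String p : phrases) {
                    quoted.add("\"" + p + "\"");
                }
                replacement = "(" + String.join(" OR ", quoted) + ")";
            }
            // quoteReplacement stops '$' or '\' in a phrase from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, after loading BUFFALO, BEAR and MOUNTAIN GOAT under the name Wild_Animals, expand("!Wild_Animals! AND habitat") returns ("BUFFALO" OR "BEAR" OR "MOUNTAIN GOAT") AND habitat.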
Re: Concept Search
Posted by Daniel Noll <da...@nuix.com>.
On Thursday 17 May 2007 09:50:55 Erick Erickson wrote:
> I thought that was the point of the Synonym injection
> example: setting Token.setPositionIncrement(0) for the injected
> token(s). That way, phrase queries work, since each injected token
> shares the position of the original token it accompanies....
>
> But I've been wrong before.
Ah, I see. A feature I haven't toyed with just yet.
That's rather nice. :-)
Daniel
--
Daniel Noll
Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia Ph: +61 2 9280 0699
Web: http://nuix.com/ Fax: +61 2 9212 6902
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Concept Search
Posted by Erick Erickson <er...@gmail.com>.
<<What *would* be tricky is phrase queries, since inserting a new term shifts the positions of the terms after it AFAIK.>>
I thought that was the point of the Synonym injection
example: setting Token.setPositionIncrement(0) for the injected
token(s). That way, phrase queries work, since each injected token
shares the position of the original token it accompanies....
But I've been wrong before.
Erick
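To illustrate the position-increment point, here is a simplified model in plain Java (not the Lucene API): each token carries an increment, the role played by Token.setPositionIncrement(int) in Lucene 2.x, and an exact phrase needs its words at consecutive positions. Injecting the hypothetical concept term WildAnimals$ with an increment of 0 leaves "black bear" intact; an increment of 1 pushes "bear" one position further along and breaks the phrase. PositionDemo and its methods are made-up names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of token positions; not Lucene's API.
final class PositionDemo {
    static final class Tok {
        final String text;
        final int posInc; // plays the role of Token.setPositionIncrement(...)

        Tok(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }
    }

    // Converts increments into absolute positions; the first token lands on 0.
    static int[] positions(List<Tok> stream) {
        int[] pos = new int[stream.size()];
        int current = -1;
        for (int i = 0; i < stream.size(); i++) {
            current += stream.get(i).posInc;
            pos[i] = current;
        }
        return pos;
    }

    // Exact phrase: words[k] must sit at position start + k for some start.
    static boolean phraseMatches(List<Tok> stream, String... words) {
        if (words.length == 0) {
            return false;
        }
        int[] pos = positions(stream);
        for (int i = 0; i < stream.size(); i++) {
            if (!stream.get(i).text.equals(words[0])) {
                continue;
            }
            boolean all = true;
            for (int k = 1; k < words.length && all; k++) {
                all = hasTokenAt(stream, pos, words[k], pos[i] + k);
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    private static boolean hasTokenAt(List<Tok> stream, int[] pos, String text, int p) {
        for (int i = 0; i < stream.size(); i++) {
            if (pos[i] == p && stream.get(i).text.equals(text)) {
                return true;
            }
        }
        return false;
    }

    // "a black bear ran" with WildAnimals$ injected after "black" using the given increment.
    static List<Tok> sample(int injectedPosInc) {
        List<Tok> s = new ArrayList<>();
        s.add(new Tok("a", 1));
        s.add(new Tok("black", 1));
        s.add(new Tok("WildAnimals$", injectedPosInc));
        s.add(new Tok("bear", 1));
        s.add(new Tok("ran", 1));
        return s;
    }
}
```

With sample(0), WildAnimals$ shares position 1 with "black", "bear" stays at position 2, and phraseMatches(sample(0), "black", "bear") is true; with sample(1) the same call is false.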
Re: Concept Search
Posted by Daniel Noll <da...@nuix.com>.
On Wednesday 16 May 2007 23:50:55 Erick Erickson wrote:
> That's interesting. I suppose you could add the "synonym" of
> WildAnimals$ whenever you encountered any of the items in your
> list, then when concept searching is called for, search on
> WildAnimals$.
>
> Highlighting might be tricky, but certainly do-able, especially with
> the capabilities of a MemoryIndex......
I'm not even convinced it would be tricky. I'm fairly sure that if the token
stream returns two terms over the same span, they would have the same
start and end offsets and would highlight identically.
What *would* be tricky is phrase queries, since inserting a new term shifts the
positions of the terms after it AFAIK.
Although, I suppose you could always store the concepts in a different field
and not modify the analyser being used for the text itself.
Daniel
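A rough sketch of the separate-field alternative, assuming a simple lower-casing tokenizer and concept lists already loaded into a map: it computes which concept names occur in a document's text, and those names could then be indexed in their own field (a hypothetical "concepts" field) while the main text field keeps its usual analyzer. Multi-word entries such as KODIAK BEAR only count when the words are adjacent. ConceptField and conceptsFor are illustrative names, not Lucene classes, and adding the field to a Lucene Document is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: finds which concepts a document mentions, for a separate field.
final class ConceptField {
    // Lower-cases and splits on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Names of every concept with at least one entry appearing as adjacent tokens.
    static Set<String> conceptsFor(String text, Map<String, List<String>> concepts) {
        List<String> tokens = tokenize(text);
        Set<String> found = new TreeSet<>();
        for (Map.Entry<String, List<String>> e : concepts.entrySet()) {
            for (String entry : e.getValue()) {
                List<String> words = tokenize(entry);
                // indexOfSubList finds the entry's words as a contiguous run of tokens.
                if (!words.isEmpty() && Collections.indexOfSubList(tokens, words) >= 0) {
                    found.add(e.getKey());
                    break;
                }
            }
        }
        return found;
    }
}
```

For example, if Wild_Animals lists BUFFALO, KODIAK BEAR and DALL SHEEP, "We saw a Kodiak bear near the river." yields {Wild_Animals}, whereas "Kodiak is an island; a bear lives there." yields nothing, because the two words are not adjacent.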
Re: Concept Search
Posted by Steven Rowe <sa...@syr.edu>.
That's not precisely what I was imagining, although it does sound viable.
I was thinking of using standard indexing and then generating concept
instantiations ("synonyms") at query time. - Steve
Erick Erickson wrote:
> That's interesting. I suppose you could add the "synonym" of
> WildAnimals$ whenever you encountered any of the items in your
> list, then when concept searching is called for, search on
> WildAnimals$.
>
> Highlighting might be tricky, but certainly do-able, especially with
> the capabilities of a MemoryIndex......
>
> Erick
Re: Concept Search
Posted by Erick Erickson <er...@gmail.com>.
That's interesting. I suppose you could add the "synonym"
WildAnimals$ whenever you encountered any of the items in your
list; then, when concept searching is called for, search on
WildAnimals$.
Highlighting might be tricky, but it's certainly doable, especially with
the capabilities of a MemoryIndex...
Erick
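A simplified model of this index-time idea, again in plain Java rather than Lucene's TokenFilter API: after the first token of a matched concept entry, it emits the hypothetical term WildAnimals$ with a position increment of 0 and character offsets spanning the whole entry, so a highlighter working from offsets would mark the matched words. It prefers the longest matching entry (KODIAK BEAR over BEAR) and does not inject again inside a span it has already covered. ConceptInjector and its methods are illustrative names; a real implementation would sit in a custom TokenFilter, which is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of index-time concept injection; not Lucene's TokenFilter API.
final class ConceptInjector {
    static final class Tok {
        final String text;
        final int posInc; // position increment relative to the previous token
        final int start;  // start character offset in the original text
        final int end;    // end character offset (exclusive)

        Tok(String text, int posInc, int start, int end) {
            this.text = text;
            this.posInc = posInc;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "+" + posInc + "[" + start + "," + end + ")";
        }
    }

    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    // Lower-cased word tokens with their character offsets, each one position apart.
    static List<Tok> tokenize(String text) {
        List<Tok> out = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            out.add(new Tok(m.group().toLowerCase(Locale.ROOT), 1, m.start(), m.end()));
        }
        return out;
    }

    // Splits a concept entry such as "KODIAK BEAR" into the words a match must contain.
    static List<String> words(String entry) {
        List<String> out = new ArrayList<>();
        for (Tok t : tokenize(entry)) {
            out.add(t.text);
        }
        return out;
    }

    // Emits conceptTerm (increment 0) after the first token of each matched entry,
    // with offsets covering the whole matched span.
    static List<Tok> inject(List<Tok> tokens, List<List<String>> entries, String conceptTerm) {
        List<Tok> out = new ArrayList<>();
        int coveredUntil = 0; // tokens before this index already belong to an injected span
        for (int i = 0; i < tokens.size(); i++) {
            Tok t = tokens.get(i);
            out.add(t);
            int len = longestMatch(tokens, i, entries);
            if (len > 0 && i >= coveredUntil) {
                out.add(new Tok(conceptTerm, 0, t.start, tokens.get(i + len - 1).end));
                coveredUntil = i + len;
            }
        }
        return out;
    }

    // Length (in tokens) of the longest entry that matches starting at index i, or 0.
    private static int longestMatch(List<Tok> tokens, int i, List<List<String>> entries) {
        int best = 0;
        for (List<String> e : entries) {
            if (e.size() <= best || i + e.size() > tokens.size()) {
                continue;
            }
            boolean ok = true;
            for (int k = 0; k < e.size() && ok; k++) {
                ok = tokens.get(i + k).text.equals(e.get(k));
            }
            if (ok) {
                best = e.size();
            }
        }
        return best;
    }
}
```

For "A Kodiak bear and a moose" with the entries BEAR, KODIAK BEAR and MOOSE, the output contains WildAnimals$ twice: once alongside "kodiak" with offsets [2,13) covering "Kodiak bear", and once alongside "moose".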
Re: Concept Search
Posted by Steven Rowe <sa...@syr.edu>.
Hi Charles,
The need presented by your use case sounds very similar to the one served
by the SynonymAnalyzer given in Erik Hatcher and Otis Gospodnetic's
excellent book "Lucene in Action". Take a look:
http://lucenebook.com/
Steve
Re: Concept Search
Posted by Mark Miller <ma...@gmail.com>.
There are quite a few ways to do this. You just read in the file to
create a list of the words, and when your query parser sees the right
keyword, either use a TokenFilter that expands it to each word or just add
each word to a BooleanQuery as a SHOULD clause (or expand it into a proper
Lucene query syntax string). This is basically just a thesaurus, and you
shouldn't find it too difficult to implement.
- Mark
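A sketch of the query-time expansion described here, modelled in plain Java instead of real Lucene queries: each concept entry becomes an optional (SHOULD-style) clause, and a document matches if at least one clause matches, which is how a BooleanQuery made only of SHOULD clauses behaves. Multi-word entries act like phrase clauses. Ranking by the number of matched clauses is a crude stand-in chosen for this example, not Lucene's scoring. In Lucene itself you would build a BooleanQuery with BooleanClause.Occur.SHOULD, using a TermQuery or PhraseQuery per entry; QueryTimeExpansion and search are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Simplified model of query-time concept expansion; not Lucene's query classes.
final class QueryTimeExpansion {
    // Lower-cases and splits on anything that is not a letter or digit.
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Returns ids of documents matching at least one entry (SHOULD-only semantics),
    // ordered by number of matched entries (descending), then by id.
    static List<Integer> search(List<String> docs, List<String> entries) {
        List<List<String>> clauses = new ArrayList<>();
        for (String e : entries) {
            List<String> w = words(e);
            if (!w.isEmpty()) {
                clauses.add(w); // one word acts like a term clause, several like a phrase
            }
        }
        List<int[]> hits = new ArrayList<>(); // each hit is {docId, matchedClauses}
        for (int id = 0; id < docs.size(); id++) {
            List<String> tokens = words(docs.get(id));
            int matched = 0;
            for (List<String> c : clauses) {
                if (Collections.indexOfSubList(tokens, c) >= 0) {
                    matched++;
                }
            }
            if (matched > 0) {
                hits.add(new int[] {id, matched});
            }
        }
        hits.sort((a, b) -> a[1] != b[1] ? Integer.compare(b[1], a[1]) : Integer.compare(a[0], b[0]));
        List<Integer> ids = new ArrayList<>();
        for (int[] h : hits) {
            ids.add(h[0]);
        }
        return ids;
    }
}
```

For instance, with the entries BEAR, BLACK BEAR and MOOSE, a document containing "moose and black bear" matches all three clauses and ranks first, while "a stock market bear" still matches through BEAR alone: a plain word list cannot tell different senses of a word apart.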