Posted to java-user@lucene.apache.org by Charles Patridge <ch...@prodigy.net> on 2007/05/15 17:11:16 UTC

Concept Search

  I have looked around the Lucene web site, as well as in some documentation,
but have not found anything about Concept Search.

  My definition of Concept Search is as follows:

  1.  I would have a file (list) of various phrases / N-grams that I would
like Lucene to use as a search basis, without my having to type in all
these phrases manually, and have Lucene return the results as it would
normally if a single search query had been entered.

  2.  An example would be: find !Wild_Animals!, where the "!" would indicate
that this is a search that should read in the phrases from a file (i.e.
Wild_Animals.txt) and search the corpus for those phrases.

  3.  The contents of Wild_Animals.txt could look like this:
  BUFFALO
  BEAR
  MOOSE
  COYOTE
  WOLF
  MOUNTAIN GOAT
  MOUNTAIN SHEEP
  DALL SHEEP
  DEER
  KODIAK BEAR
  BROWN BEAR
  BLACK BEAR
  etc etc etc
   
  4.  Is my idea of a Concept Search feasible? If so, can you point me to
any documentation that explains how this could be done within Lucene?

  Please send any information you have on this to:
Charles_S_Patridge@prodigy.net

  Thank you in advance for your time and efforts.



Charles S Patridge - PDPC, Ltd.
172 Monce Road - Burlington, CT 06013 USA
Email: Charles_S_Patridge@prodigy.net
Web: http://www.sconsig.com
Web: http://pages.prodigy.net/charles_s_patridge
Web: http://www.munic.state.ct.us/burlington
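Items 2 and 3 describe a small protocol: a !Name! marker names a file, Name.txt, that holds one phrase per line. A minimal sketch of that lookup in plain Java (no Lucene dependency) follows. The class name ConceptFile, the directory layout, and the choices to skip blank lines, collapse runs of whitespace, and lowercase each phrase (assuming a lowercasing analyzer at index time) are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: resolves !Name! markers to the phrase list stored in Name.txt. */
final class ConceptFile {
    // Matches a whole concept marker such as !Wild_Animals! and captures the name.
    static final Pattern MARKER = Pattern.compile("!([A-Za-z0-9_]+)!");

    private ConceptFile() {}

    /** Returns the concept name if the token is a marker like !Wild_Animals!, else null. */
    static String conceptName(String token) {
        Matcher m = MARKER.matcher(token.trim());
        return m.matches() ? m.group(1) : null;
    }

    /** Reads <dir>/<conceptName>.txt, one phrase per line. */
    static List<String> load(Path dir, String conceptName) throws IOException {
        Path file = dir.resolve(conceptName + ".txt");
        return parse(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    /** Trims lines, drops blank ones, collapses inner whitespace, and lowercases (assumption). */
    static List<String> parse(List<String> lines) {
        List<String> phrases = new ArrayList<>();
        for (String line : lines) {
            String phrase = line.trim().replaceAll("\\s+", " ");
            if (!phrase.isEmpty()) {
                phrases.add(phrase.toLowerCase(Locale.ROOT));
            }
        }
        return phrases;
    }
}
```

For example, assuming the files live in a hypothetical concepts directory, ConceptFile.load(Path.of("concepts"), "Wild_Animals") would return entries such as "buffalo" and "mountain goat".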

Re: Concept Search

Posted by Daniel Noll <da...@nuix.com>.
On Thursday 17 May 2007 09:50:55 Erick Erickson wrote:
> I thought that that's the point in the Synonym injection
> example, setting Token.setPositionIncrement(0) for the injected
> token(s). That way, phrase queries work, since all of the
> injected tokens share the same position as the original token.
>
> But I've been wrong before.

Ah, I see.  A feature I haven't toyed with just yet.

That's rather nice. :-)

Daniel


-- 
Daniel Noll
Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
Web: http://nuix.com/                               Fax: +61 2 9212 6902

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Concept Search

Posted by Erick Erickson <er...@gmail.com>.
<<What *would* be tricky is phrase queries, since inserting a new term breaks
the positions AFAIK.>>

I thought that that's the point in the Synonym injection
example, setting Token.setPositionIncrement(0) for the injected
token(s). That way, phrase queries work, since all of the
injected tokens share the same position as the original token.

But I've been wrong before.

Erick
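The position-increment point can be checked with a toy model that needs no Lucene jar. In this hypothetical sketch a token carries a position increment plus start/end character offsets, and the injected concept token copies the original token's offsets (which is why highlighting should line up, as Daniel expects). With an increment of 0 the injected token shares the original's position and later tokens keep theirs; with an increment of 1 every later token shifts, so a phrase spanning the injection point stops matching. Only single-word concept entries are handled here; in Lucene 2.x the real knob is Token.setPositionIncrement inside a TokenFilter.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of index-time synonym injection; not Lucene code. */
final class PositionDemo {
    /** An analyzed token: term text, position increment, and character offsets. */
    static final class Tok {
        final String text;
        final int posInc;
        final int start;
        final int end;

        Tok(String text, int posInc, int start, int end) {
            this.text = text;
            this.posInc = posInc;
            this.start = start;
            this.end = end;
        }
    }

    private PositionDemo() {}

    /**
     * After each token whose text is in the concept list, emits an extra token
     * (the concept tag) with the given position increment and the SAME offsets.
     */
    static List<Tok> inject(List<Tok> in, List<String> concept, String tag, int injectedPosInc) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            out.add(t);
            if (concept.contains(t.text)) {
                out.add(new Tok(tag, injectedPosInc, t.start, t.end));
            }
        }
        return out;
    }

    /** Sums increments starting from -1, so a first token with increment 1 lands at position 0. */
    static int[] positions(List<Tok> toks) {
        int[] pos = new int[toks.size()];
        int p = -1;
        for (int i = 0; i < toks.size(); i++) {
            p += toks.get(i).posInc;
            pos[i] = p;
        }
        return pos;
    }

    /** True if term a sits at some position n and term b at n + 1, i.e. the phrase "a b" matches. */
    static boolean phraseMatches(List<Tok> toks, String a, String b) {
        int[] pos = positions(toks);
        for (int i = 0; i < toks.size(); i++) {
            if (!toks.get(i).text.equals(a)) continue;
            for (int j = 0; j < toks.size(); j++) {
                if (toks.get(j).text.equals(b) && pos[j] == pos[i] + 1) return true;
            }
        }
        return false;
    }
}
```

For the text "a black bear crossed" with bear in the concept list, the phrase "bear crossed" still matches when the tag is injected with increment 0 but not with increment 1, and "black wildanimals$" matches in the first case.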


Re: Concept Search

Posted by Daniel Noll <da...@nuix.com>.
On Wednesday 16 May 2007 23:50:55 Erick Erickson wrote:
> That's interesting. I suppose you could add the "synonym" of
> WildAnimals$ whenever you encountered any of the items in your
> list, then when concept searching is called for, search on
> WildAnimals$.
>
> Highlighting might be tricky, but certainly do-able, especially with
> the capabilities of a MemoryIndex......

I'm not even convinced it would be tricky.  I'm fairly sure that if the token
stream returns two terms over the same span, they would have the same start
offset and end offset and would highlight identically.

What *would* be tricky is phrase queries, since inserting a new term breaks the
positions, AFAIK.

Although I suppose you could always store the concepts in a separate field
and leave the analyser used for the text itself unmodified.

Daniel
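The separate-field alternative could be prepared at index time roughly as in this hypothetical plain-Java sketch. ConceptTagger and its crude tokenizer (lowercase, split on non-alphanumerics) are stand-ins for whatever analyzer is really used, and multi-word entries such as MOUNTAIN GOAT count only when they appear as consecutive words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical index-time tagger: finds which concepts occur in a document's text. */
final class ConceptTagger {
    private ConceptTagger() {}

    /** Returns the (sorted) names of all concepts with at least one phrase in the text. */
    static Set<String> tag(String text, Map<String, List<String>> concepts) {
        List<String> words = tokenize(text);
        Set<String> found = new TreeSet<>();
        for (Map.Entry<String, List<String>> e : concepts.entrySet()) {
            for (String phrase : e.getValue()) {
                if (containsPhrase(words, tokenize(phrase))) {
                    found.add(e.getKey());
                    break; // one matching phrase is enough to tag the concept
                }
            }
        }
        return found;
    }

    /** Lowercases and splits on runs of non-letter/non-digit characters. */
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        for (String w : s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    /** True if the phrase's words appear as a consecutive run in words. */
    static boolean containsPhrase(List<String> words, List<String> phrase) {
        if (phrase.isEmpty()) return false;
        for (int i = 0; i + phrase.size() <= words.size(); i++) {
            if (words.subList(i, i + phrase.size()).equals(phrase)) return true;
        }
        return false;
    }
}
```

Each returned name could then be added to the document as an untokenized value of a separate field (called, hypothetically, concepts), so that !Wild_Animals! becomes a single-term query on that field while phrase queries on the body text are unaffected.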





Re: Concept Search

Posted by Steven Rowe <sa...@syr.edu>.
That's not precisely what I was imagining, although it does sound viable.
I was thinking of using standard indexing and then generating concept
instantiations ("synonyms") at query time.

- Steve

Erick Erickson wrote:
> That's interesting. I suppose you could add the "synonym" of
> WildAnimals$ whenever you encountered any of the items in your
> list, then when concept searching is called for, search on
> WildAnimals$.
> 
> Highlighting might be tricky, but certainly do-able, especially with
> the capabilities of a MemoryIndex......
> 
> Erick


Re: Concept Search

Posted by Erick Erickson <er...@gmail.com>.
That's interesting. I suppose you could add the "synonym" WildAnimals$
at index time whenever you encounter any of the items in your list;
then, when concept searching is called for, search on WildAnimals$.

Highlighting might be tricky, but it is certainly doable, especially with
the capabilities of a MemoryIndex.

Erick


Re: Concept Search

Posted by Steven Rowe <sa...@syr.edu>.
Hi Charles,

The need presented by your use case sounds very similar to the one served
by the SynonymAnalyzer given in Erik Hatcher and Otis Gospodnetic's
excellent book "Lucene in Action"; take a look:

    http://lucenebook.com/

Steve



Re: Concept Search

Posted by Mark Miller <ma...@gmail.com>.
There are quite a few ways to do this. Read in the file to create a list
of the words (or phrases); then, when your query parser sees the right
keyword, either use a TokenFilter that expands it to each word, or just
add each word to a BooleanQuery as a SHOULD clause (or expand it into an
equivalent Lucene query-syntax string). This is basically just a
thesaurus, and you shouldn't find it too difficult to implement.

- Mark
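The "expand to a Lucene query-syntax string" option might look like the minimal sketch below. ConceptQueryExpander is a hypothetical name, the concept map is assumed to have been loaded from the phrase files already, and multi-word entries are quoted so that QueryParser treats them as phrase queries. Unknown or empty concepts raise an exception instead of being passed through, because QueryParser reads "!" as NOT; other QueryParser special characters are not escaped here.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical query-time expander: rewrites !Name! markers into Lucene query syntax. */
final class ConceptQueryExpander {
    // A concept marker such as !Wild_Animals!, capturing the concept name.
    private static final Pattern MARKER = Pattern.compile("!([A-Za-z0-9_]+)!");

    private ConceptQueryExpander() {}

    /** Replaces every !Name! with (p1 OR p2 ...), quoting multi-word phrases. */
    static String expand(String query, Map<String, List<String>> concepts) {
        Matcher m = MARKER.matcher(query);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            List<String> phrases = concepts.get(m.group(1));
            if (phrases == null || phrases.isEmpty()) {
                // "!" means NOT to QueryParser, so never pass an unresolved marker through.
                throw new IllegalArgumentException("Unknown or empty concept: " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(orGroup(phrases)));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Builds (a OR "b c" ...); embedded double quotes are dropped to keep the syntax valid. */
    private static String orGroup(List<String> phrases) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < phrases.size(); i++) {
            if (i > 0) sb.append(" OR ");
            String p = phrases.get(i).replace("\"", "").trim();
            sb.append(p.contains(" ") ? "\"" + p + "\"" : p);
        }
        return sb.append(')').toString();
    }
}
```

With a Wild_Animals entry of buffalo, mountain goat and kodiak bear, expanding river AND !Wild_Animals! gives river AND (buffalo OR "mountain goat" OR "kodiak bear"). When QueryParser parses an OR group like this it builds a BooleanQuery of SHOULD clauses, so the result is close to the BooleanQuery option above; very large concept lists may hit BooleanQuery's clause limit (1024 by default).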
