Posted to solr-user@lucene.apache.org by Monica Skidmore <Mo...@careerbuilder.com> on 2011/10/13 17:37:03 UTC

Filter Question

Our Solr implementation includes a third-party filter that adds several additional term types to the token list (beyond "word", etc.).  Most of the time this is exactly what we want, but we felt we could improve our search results by having different tokens on the index and query side.  Since the filter in question was third-party and we didn't have access to its source code, we wrote our own filter that removes tokens based on their type attribute.

We didn't see another filter available that does this - did we overlook it?  And if not, is this something that would be of value if we contribute it back to the Solr community?

Monica Skidmore
Search Technology Team
CareerBuilder.com


Re: Filter Question

Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,

Interesting feature. See also https://issues.apache.org/jira/browse/LUCENE-3130 for a discussion of using TypeAttribute to (de)boost certain token types such as synonyms. With the ability to remove a token type at search time, we could run many kinds of searches against the same field that currently require separate fields.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 14 Oct 2011, at 09:17, Steven A Rowe wrote:

> Hi Monica,
> 
> AFAIK there is nothing like the filter you've described, and I believe it would be generally useful.  Maybe it could be called StopTermTypesFilter?  (Plural on Types to signify that more than one type of term can be stopped by a single instance of the filter.)  
> 
> Such a filter should have an enablePositionIncrements option like StopFilter.
> 
> Steve
> 


RE: Filter Question

Posted by Monica Skidmore <Mo...@careerbuilder.com>.
Thanks Steven, that's just the kind of feedback I needed.  And thanks also to Jan.  I'll do a little clean-up on my filter and submit it...

  -Monica

-----Original Message-----
From: Steven A Rowe [mailto:sarowe@syr.edu] 
Sent: Friday, October 14, 2011 3:18 AM
To: solr-user@lucene.apache.org
Subject: RE: Filter Question

Hi Monica,

AFAIK there is nothing like the filter you've described, and I believe it would be generally useful.  Maybe it could be called StopTermTypesFilter?  (Plural on Types to signify that more than one type of term can be stopped by a single instance of the filter.)  

Such a filter should have an enablePositionIncrements option like StopFilter.

Steve


RE: Filter Question

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Monica,

AFAIK there is nothing like the filter you've described, and I believe it would be generally useful.  Maybe it could be called StopTermTypesFilter?  (Plural on Types to signify that more than one type of term can be stopped by a single instance of the filter.)  

Such a filter should have an enablePositionIncrements option like StopFilter.

Steve
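
One way to picture the enablePositionIncrements suggestion above is the self-contained Java sketch below. It deliberately does not use the Lucene API: the Token class, the removeTypes method, and the "skill" type name are hypothetical stand-ins made up for illustration. The assumption is that, with the option on, the position increments of removed tokens are carried over to the next surviving token, so position-sensitive queries can still see the gap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of a type-based stop filter; NOT the Lucene TokenFilter API.
class PositionIncrementSketch {

    // Minimal stand-in for a token: term text, type label, position increment.
    static final class Token {
        final String term;
        final String type;
        final int posInc;

        Token(String term, String type, int posInc) {
            this.term = term;
            this.type = type;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return term + "/" + type + "(+" + posInc + ")";
        }
    }

    // Drops every token whose type is in stopTypes. When enablePositionIncrements
    // is true, the increments of dropped tokens are added to the next kept token,
    // so the hole they leave behind stays visible in the position data.
    static List<Token> removeTypes(List<Token> tokens, Set<String> stopTypes,
                                   boolean enablePositionIncrements) {
        List<Token> kept = new ArrayList<>();
        int skipped = 0; // increments of tokens dropped since the last kept token
        for (Token t : tokens) {
            if (stopTypes.contains(t.type)) {
                skipped += t.posInc;
                continue;
            }
            int inc = enablePositionIncrements ? t.posInc + skipped : t.posInc;
            kept.add(new Token(t.term, t.type, inc));
            skipped = 0;
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Token> stream = Arrays.asList(
                new Token("senior", "word", 1),
                new Token("java", "skill", 1), // "skill" is a made-up type name
                new Token("developer", "word", 1));
        Set<String> stopTypes = new HashSet<>(Arrays.asList("skill"));

        System.out.println(removeTypes(stream, stopTypes, true));
        System.out.println(removeTypes(stream, stopTypes, false));
    }
}
```

In this sketch, removing the "skill" token with the option on leaves "developer" with an increment of 2; with the option off it keeps its original increment of 1, as if the removed token had never been there.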

> -----Original Message-----
> From: Monica Skidmore [mailto:Monica.Skidmore@careerbuilder.com]
> Sent: Thursday, October 13, 2011 1:04 PM
> To: solr-user@lucene.apache.org; Otis Gospodnetic
> Subject: RE: Filter Question
> 
> Thanks, Otis - yes, this is different from the synonyms filter, which we
> also use.  For example, if you wanted all tokens that were marked 'lemma'
> to be removed, you could specify that, and all tokens with any type other
> than 'lemma' would still be returned.  You could also choose to remove
> all tokens of types 'lemma' and 'word' (although that would probably be a
> bad idea!), etc.  Normally, if you don't want a token type, you just
> don't include/run the filter that produces that type.  However, we have a
> third-party filter that produces multiple types, and this allows us to
> select a subset of those types.
> 
> I did see the HowToContribute wiki, but I'm relatively new to Solr, and I
> wanted to see if this looked familiar to someone before I started down
> the contribution path.
> 
> Thanks again!
> 
> 	-Monica
> 
> 

RE: Filter Question

Posted by Monica Skidmore <Mo...@careerbuilder.com>.
Thanks, Otis - yes, this is different from the synonyms filter, which we also use.  For example, if you wanted all tokens that were marked 'lemma' to be removed, you could specify that, and all tokens with any type other than 'lemma' would still be returned.  You could also choose to remove all tokens of types 'lemma' and 'word' (although that would probably be a bad idea!), etc.  Normally, if you don't want a token type, you just don't include/run the filter that produces that type.  However, we have a third-party filter that produces multiple types, and this allows us to select a subset of those types.

I did see the HowToContribute wiki, but I'm relatively new to Solr, and I wanted to see if this looked familiar to someone before I started down the contribution path.

Thanks again!

	-Monica
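
The selection behaviour described in this message (remove every token of the listed types, keep everything else) could be sketched roughly as follows. This is a plain-Java illustration under stated assumptions, not the filter discussed in the thread and not the Lucene API: the TypedToken class, the keepTerms method, and the sample terms are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; a real Lucene filter works on attribute streams.
class StopTypesSketch {

    // Minimal stand-in for a token that carries a term and a type label.
    static final class TypedToken {
        final String term;
        final String type;

        TypedToken(String term, String type) {
            this.term = term;
            this.type = type;
        }
    }

    // Returns the terms of all tokens whose type is NOT in typesToRemove,
    // in their original order.
    static List<String> keepTerms(List<TypedToken> tokens, Set<String> typesToRemove) {
        List<String> kept = new ArrayList<>();
        for (TypedToken t : tokens) {
            if (!typesToRemove.contains(t.type)) {
                kept.add(t.term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Pretend an upstream filter emitted both surface words and lemmas.
        List<TypedToken> tokens = Arrays.asList(
                new TypedToken("jobs", "word"),
                new TypedToken("job", "lemma"),
                new TypedToken("posted", "word"),
                new TypedToken("post", "lemma"));

        // Remove only 'lemma' tokens: the 'word' tokens survive.
        System.out.println(keepTerms(tokens, new HashSet<>(Arrays.asList("lemma"))));

        // Remove both 'lemma' and 'word': nothing is left.
        System.out.println(keepTerms(tokens, new HashSet<>(Arrays.asList("lemma", "word"))));
    }
}
```

Taking the set of types as a parameter is what lets one instance stop several types at once, and it would allow the index-side and query-side analysis chains to use different sets.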


-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com] 
Sent: Thursday, October 13, 2011 12:37 PM
To: solr-user@lucene.apache.org
Subject: Re: Filter Question

Monica,

So this is different from using Solr's synonym filter with separate synonym files for index-time and query-time expansion (I'm not sure when you'd want that, but it sounds like you both need and like it), right?  If so, could you describe what your filter does differently and then follow http://wiki.apache.org/solr/HowToContribute?  Thanks in advance! :)

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/



Re: Filter Question

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Monica,

So this is different from using Solr's synonym filter with separate synonym files for index-time and query-time expansion (I'm not sure when you'd want that, but it sounds like you both need and like it), right?  If so, could you describe what your filter does differently and then follow http://wiki.apache.org/solr/HowToContribute?  Thanks in advance! :)

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


>________________________________
>From: Monica Skidmore <Mo...@careerbuilder.com>
>To: "solr-user@lucene.apache.org" <so...@lucene.apache.org>
>Sent: Thursday, October 13, 2011 11:37 AM
>Subject: Filter Question
>
>Our Solr implementation includes a third-party filter that adds additional, multiple term types to the token list (beyond "word", etc.).  Most of the time this is exactly what we want, but we felt we could improve our search results by having different tokens on the index and query side.  Since the filter in question was third-party and we didn't have access to source code, we wrote our own filter that will take out tokens based on their term attribute type.
>
>We didn't see another filter available that does this - did we overlook it?  And if not, is this something that would be of value if we contribute it back to the Solr community?
>
>Monica Skidmore
>Search Technology Team
>CareerBuilder.com