Posted to java-user@lucene.apache.org by Paul Taylor <pa...@fastmail.fm> on 2012/02/21 14:06:50 UTC

Can I just add ShingleFilter to my analyzer used for indexing and searching

I'm trying out ShingleFilter, and its documentation implies that you 
can just add it to your analyzer with no side effects except a larger 
index. However, I have read elsewhere that you have to modify the way 
you parse user queries. Could anyone confirm or deny this?

Also, is there an easy way to use a ShingleFilter only for common stop 
words, or is that pointless?

Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Can I just add ShingleFilter to my analyzer used for indexing and searching

Posted by Paul Taylor <pa...@fastmail.fm>.
On 21/02/2012 14:37, Steven A Rowe wrote:
> Hi Paul,
>
> Lucene QueryParser splits on whitespace and then sends individual words one-by-one to be analyzed.  All analysis components that do their work based on more than one word, including ShingleFilter and SynonymFilter, are borked by this.  (There is a JIRA issue open for the QueryParser problem: <https://issues.apache.org/jira/browse/LUCENE-2605>).
>
> There is a workaround involving PositionFilter described on the Solr wiki: <http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory>.  Essentially, include PositionFilter after ShingleFilter in your analyzer, then wrap queries in quotes before sending them to QueryParser.
>
> CommonGramsFilter does the emit-only-shingles-containing-stopwords thing, but in Lucene/Solr 3.x, it's in Solr (solr-core-3.X.jar, to be exact), not Lucene; you can use it in your application by including the solr-core jar as a dependency.  In trunk, which will be released as Lucene/Solr 4.0, CommonGramsFilter has been moved to the analyzers-common module.
>
> Steve
>
>
Thanks Steve. As our user interface allows access to the full Lucene 
query syntax, I'll hold off on this for now.

Paul



RE: Can I just add ShingleFilter to my analyzer used for indexing and searching

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Paul,

Lucene QueryParser splits on whitespace and then sends individual words one-by-one to be analyzed.  All analysis components that do their work based on more than one word, including ShingleFilter and SynonymFilter, are borked by this.  (There is a JIRA issue open for the QueryParser problem: <https://issues.apache.org/jira/browse/LUCENE-2605>).  
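
To illustrate the point above, here is a minimal sketch in plain Java. It does not use Lucene: `analyze` is a hypothetical toy stand-in for an analyzer with a bigram shingle step, and `analyzeWordByWord` imitates a parser that splits on whitespace before analysis. The names and the bigram-only behavior are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class ShingleSketch {

    // Toy stand-in for a bigram shingle step: emits each word plus every
    // adjacent word pair. This is NOT Lucene's ShingleFilter.
    public static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            out.add(words[i]);
            if (i + 1 < words.length) {
                out.add(words[i] + " " + words[i + 1]);
            }
        }
        return out;
    }

    // Imitates a parser that splits on whitespace first and then hands
    // each word to the analyzer separately, so no pair is ever seen.
    public static List<String> analyzeWordByWord(String query) {
        List<String> out = new ArrayList<>();
        for (String word : query.trim().split("\\s+")) {
            out.addAll(analyze(word));
        }
        return out;
    }

    public static void main(String[] args) {
        // Whole text analyzed at once: single words and bigrams.
        System.out.println(analyze("please divide this"));
        // Word-by-word analysis: single words only, the bigrams are lost.
        System.out.println(analyzeWordByWord("please divide this"));
    }
}
```

In this sketch the indexed text would contain "please divide" and "divide this", while the word-by-word query side never produces them, which is the mismatch described above.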

There is a workaround involving PositionFilter described on the Solr wiki: <http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory>.  Essentially, include PositionFilter after ShingleFilter in your analyzer, then wrap queries in quotes before sending them to QueryParser.

CommonGramsFilter does the emit-only-shingles-containing-stopwords thing, but in Lucene/Solr 3.x, it's in Solr (solr-core-3.X.jar, to be exact), not Lucene; you can use it in your application by including the solr-core jar as a dependency.  In trunk, which will be released as Lucene/Solr 4.0, CommonGramsFilter has been moved to the analyzers-common module.
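
As a rough illustration of the emit-only-shingles-containing-stopwords idea, here is a plain-Java sketch. It is not the Solr CommonGramsFilter implementation: the class name, the `commonGrams` method, the lowercasing, and the `_` separator are all assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CommonGramsSketch {

    // Emits every word, plus a bigram for each adjacent pair in which at
    // least one of the two words is in the common-word set.
    // Hypothetical helper; NOT the real CommonGramsFilter.
    public static List<String> commonGrams(String text, Set<String> commonWords) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        for (int i = 0; i < words.length; i++) {
            out.add(words[i]);
            boolean hasNext = i + 1 < words.length;
            if (hasNext && (commonWords.contains(words[i])
                    || commonWords.contains(words[i + 1]))) {
                out.add(words[i] + "_" + words[i + 1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("the", "of"));
        // Only the pair containing "the" becomes a bigram.
        System.out.println(commonGrams("the quick brown fox", common));
        // Both pairs contain "of", so both become bigrams.
        System.out.println(commonGrams("king of spain", common));
    }
}
```

Compared with shingling every adjacent pair, this keeps index growth limited to pairs that involve a common word.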

Steve
