Posted to solr-user@lucene.apache.org by Pradeep Pujari <Pr...@rocketmail.com> on 2011/07/28 19:43:00 UTC
ShingleFilterFactory class error
Hi,
I am trying to create shingles with minShingleSize = 10, but the filter also returns bigrams. Here is my schema definition:
<filter class="solr.ShingleFilterFactory" minShingleSize="10" maxShingleSize="25"
outputUnigrams="false" outputUnigramsIfNoShingles="false" tokenSeparator=" "/>
For the input string "Apple - iPad 3G Wi-Fi - 32GB", it produces:
"Apple -"
"- iPad"
My understanding is that it should produce 10-gram tokens.
Is this a bug, or is there some configuration I need to add?
Thank you in advance.
Pradeep
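To make the expectation concrete, here is a minimal Python sketch of word-level shingling between a minimum and a maximum size. It assumes whitespace tokenization (the thread does not show the tokenizer, though "-" appearing as its own token is consistent with it), and `shingles` is a hypothetical helper, not Lucene's ShingleFilter. It shows that this input has only 7 tokens, so no shingle of 10 or more tokens can be formed from it.

```python
# Minimal sketch of word-level shingling, assuming whitespace tokenization.
# The tokenizer is an assumption; this is not Lucene's implementation.
def shingles(tokens, min_size, max_size, sep=" "):
    """Return every run of min_size..max_size consecutive tokens, grouped by start position."""
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size > len(tokens):
                break  # not enough tokens left for this size or any larger one
            out.append(sep.join(tokens[start:start + size]))
    return out


tokens = "Apple - iPad 3G Wi-Fi - 32GB".split()
print(len(tokens))                 # 7
print(shingles(tokens, 2, 3)[:4])  # ['Apple -', 'Apple - iPad', '- iPad', '- iPad 3G']
print(shingles(tokens, 10, 25))    # [] : a 10-token shingle needs at least 10 tokens
```

Under this model, a filter that honored minShingleSize="10" together with outputUnigrams="false" and outputUnigramsIfNoShingles="false" would emit nothing at all for this input, so the bigrams suggest the minimum size is not being applied.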
RE: ShingleFilterFactory class error
Posted by Steven A Rowe <sa...@syr.edu>.
Pradeep,
As indicated on the wiki <http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory>, the minShingleSize option is not available in Solr versions prior to 3.1.
What version of Solr are you using?
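The version dependency can be sketched in Python. This assumes, rather than documents, that a pre-3.1 factory silently ignores the minShingleSize attribute and produces shingles starting at two tokens; that assumption is consistent with the bigrams in the original report. `effective_min_shingle_size` is a hypothetical helper, and the version tuples (1, 4) and (3, 1) are illustrative.

```python
# Sketch, assuming a pre-3.1 ShingleFilterFactory ignores the minShingleSize
# attribute and falls back to a fixed minimum of 2 tokens.
# effective_min_shingle_size is a hypothetical helper, not a Solr API.
def effective_min_shingle_size(solr_version, configured_min):
    """Return the smallest shingle size actually produced for a given Solr version."""
    if solr_version >= (3, 1):  # minShingleSize is supported from Solr 3.1 on
        return configured_min
    return 2  # earlier versions start at 2-token shingles


tokens = "Apple - iPad 3G Wi-Fi - 32GB".split()  # whitespace tokenization assumed
for version in [(1, 4), (3, 1)]:
    size = effective_min_shingle_size(version, 10)
    # All shingles of the smallest effective size, in start-position order.
    smallest = [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]
    print(version, size, smallest[:2])
# (1, 4) 2 ['Apple -', '- iPad']
# (3, 1) 10 []
```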
(By the way, I am replying only on the solr-user@lucene.apache.org mailing list. The dev@lucene.apache.org mailing list is for the development of Lucene/Solr, not for questions about using the products; if you think you have found a bug, please ask first on solr-user@lucene.apache.org. If you don't get an answer in a day or two, then it makes sense to escalate to dev@lucene.apache.org.)
Steve