Posted to solr-user@lucene.apache.org by Pradeep Pujari <Pr...@rocketmail.com> on 2011/07/28 19:43:00 UTC

ShingleFilterFactory class error

Hi,

I am trying to create shingles with minShingleSize = 10, but bi-grams are returned as well. Here is my schema definition:

            <filter class="solr.ShingleFilterFactory" minShingleSize="10" maxShingleSize="25"
                outputUnigrams="false" outputUnigramsIfNoShingles="false" tokenSeparator=" "/>


For the input String "Apple - iPad 3G Wi-Fi - 32GB", it breaks into
    "Apple -"
    "- iPad "   

My understanding is that it should produce 10-gram tokens.

Is this a bug, or does some configuration need to be added?

Thank you in advance.
Pradeep

RE: ShingleFilterFactory class error

Posted by Steven A Rowe <sa...@syr.edu>.
Pradeep,

As indicated on the wiki <http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory>, the minShingleSize option is not available in Solr versions prior to 3.1.

What version of Solr are you using?

(By the way, I am replying only on the solr-user@lucene.apache.org mailing list. The dev@lucene.apache.org mailing list is for the development of Lucene/Solr, not for questions about using the products. If you think you have found a bug, please ask first on solr-user@lucene.apache.org; if you don't get an answer in a day or two, then it makes sense to escalate to dev@lucene.apache.org.)

Steve
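
To illustrate the point about minShingleSize, here is a rough Python sketch of how shingle sizes interact with the example input. It is a simplified model, not Lucene's ShingleFilter code. The function name `shingles` is hypothetical, and two things are assumed: that the input is split on whitespace (the tokenizer in use was not posted), and that a version without minShingleSize support ignores the attribute and falls back to a minimum size of 2.

```python
def shingles(tokens, min_size=2, max_size=2, sep=" "):
    """Return every run of min_size..max_size consecutive tokens, joined by sep.

    Simplified model for illustration only; not Lucene's implementation.
    """
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size > len(tokens):
                break  # not enough tokens left for this size or any larger one
            out.append(sep.join(tokens[start:start + size]))
    return out


# Assumption: whitespace tokenization, which yields 7 tokens for this input.
tokens = "Apple - iPad 3G Wi-Fi - 32GB".split()

# Assumed pre-3.1 behavior: minShingleSize is ignored, so the minimum is 2
# and bi-grams such as "Apple -" and "- iPad" are produced.
with_ignored_min = shingles(tokens, min_size=2, max_size=25)

# With minShingleSize honored: 7 tokens cannot form a 10-gram, so nothing is produced.
with_min_10 = shingles(tokens, min_size=10, max_size=25)

print(with_ignored_min[:2])
print(with_min_10)
```

Under these assumptions, the sample input is too short to produce any 10-gram even where minShingleSize is supported, so a longer input would be needed to check the setting.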

