Posted to solr-user@lucene.apache.org by Sam Lee <sk...@gmail.com> on 2013/05/22 17:37:57 UTC

filter query by string length or word count?

I have the following in schema.xml:
<field name="body" type="text_en_html" indexed="true" stored="true" omitNorms="true"/>
...
<fieldType name="text_en_html" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords_en.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords_en.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
</fieldType>


How can I query for documents whose body has more than 80 words (or 80 characters)?

Re: filter query by string length or word count?

Posted by Jason Hellman <jh...@innoventsolutions.com>.
Sam,

I would strongly suggest counting the words in your external indexing pipeline and sending that value to Solr as its own field. You can then filter on it with a simple range query:

wordcount:{80 TO *]

(Note the { next to 80: it makes the lower bound exclusive, so the query matches only documents with more than 80 words.)
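
As a rough illustration of the pipeline-side counting, here is a minimal Python sketch. It assumes the body arrives as HTML, strips tags with the standard-library parser, and counts runs of word characters. The helper names (count_words, build_doc) and the "id" field are hypothetical, and the count is only an approximation: it will not match Solr's own tokenization exactly (no stop-word removal, and "don't" counts as two words).

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops tags (a rough stand-in for HTML stripping)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def count_words(html_body):
    """Approximate word count of an HTML body: runs of word characters."""
    parser = _TextExtractor()
    parser.feed(html_body)
    parser.close()
    text = " ".join(parser.parts)
    return len(re.findall(r"\w+", text))


def build_doc(doc_id, html_body):
    """Build the document to index; 'wordcount' is the extra field queried above."""
    return {
        "id": doc_id,  # hypothetical unique key field
        "body": html_body,
        "wordcount": count_words(html_body),
    }
```

The resulting dictionary would be sent to Solr by whatever client the pipeline already uses, and the schema would need a numeric wordcount field for the range query to work.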

Jason

On May 22, 2013, at 11:37 AM, Sam Lee <sk...@gmail.com> wrote:

> how can I query docs whose body has more than 80 words (or 80 characters) ?


Re: filter query by string length or word count?

Posted by Sandeep Mestry <sa...@gmail.com>.
I doubt there is an out-of-the-box feature that supports this
requirement; you will probably need to handle it at index time.
You can also experiment with Function Queries
(http://wiki.apache.org/solr/FunctionQuery) to see whether they cover your case.
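
One way the two ideas could combine, sketched in Python under some assumptions: the word count has already been stored at index time in a single-valued numeric field (called wordcount here, which is not part of the original schema), and the filter is expressed with Solr's function range parser ({!frange}). The helper name min_wordcount_params is hypothetical, and this only builds request parameters; it does not talk to Solr.

```python
from urllib.parse import urlencode


def min_wordcount_params(user_query, min_words, field="wordcount"):
    """Return Solr request parameters restricting results to docs whose
    precomputed count field is strictly greater than min_words."""
    return {
        "q": user_query,
        # l = lower bound, incl=false makes that bound exclusive
        "fq": "{!frange l=%d incl=false}%s" % (min_words, field),
    }


params = min_wordcount_params("body:solr", 80)
query_string = urlencode(params)  # URL-encoded form for a /select request
```

A plain range query on the same field does the same job; frange is mainly of interest when the value being filtered is the result of a function rather than a stored number.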



On 22 May 2013 16:37, Sam Lee <sk...@gmail.com> wrote:

> how can I query docs whose body has more than 80 words (or 80 characters) ?
>