Posted to solr-user@lucene.apache.org by Jie Gao <j....@sheffield.ac.uk> on 2015/09/10 11:25:33 UTC

Can solr ttf functionQuery support ngram (n>2) ?

Hi,

I'm wondering whether the Solr ttf function query supports n-grams
(compound words) with n>2?

I'm using
http://localhost:8983/solr/collection1/select?q=*:*&fl=ttf(content,%22apple%20banana%22)&rows=1
to query the total term frequency of a bigram token in the "content" field
across the whole index.
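
As a minimal sketch, the request above could be assembled in Python with the standard library. The helper name ttf_url is hypothetical; the host, core, and field are taken from the URL in this message. It only builds the URL and does not contact Solr.

```python
from urllib.parse import urlencode, quote


def ttf_url(base, core, field, term):
    """Build a select URL that asks for ttf(field, "term") on a single row."""
    # The term is wrapped in double quotes so the space stays inside one argument.
    params = {"q": "*:*", "fl": f'ttf({field},"{term}")', "rows": 1}
    # quote (rather than quote_plus) encodes the space as %20;
    # the extra safe characters keep the function call readable in the URL.
    return f"{base}/{core}/select?" + urlencode(params, quote_via=quote, safe="*:(),")


print(ttf_url("http://localhost:8983/solr", "collection1", "content", "apple banana"))
```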

However, the result (20) is not consistent with the result of the phrase
query
http://localhost:8983/solr/tatasteel/select?q=content:%22apple%20banana%22
When I checked manually, the actual number of occurrences was 15.

What is the actual behaviour of the ttf function query (I'm using Solr
5.3.0)? The reference guide does not explain the details.

Does it perform a full-text query against the index for this field, or does
it rely on the tf values stored by the tvComponent?

I have configured the content field with the following textField type:

<fieldType name="text_tr_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="5"
                outputUnigrams="true" outputUnigramsIfNoShingles="false" tokenSeparator=" "/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.ASCIIFoldingFilterFactory"/>
    </analyzer>
</fieldType>
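
To make the index-time shingle settings above concrete, here is a rough Python emulation of the terms such a filter would emit for an already tokenized, lowercased input. The function shingles is hypothetical and simplified: it ignores stop-word position gaps and the filler tokens the real ShingleFilter can insert, so treat it as an illustration rather than the filter's exact behaviour.

```python
def shingles(tokens, min_size=2, max_size=5, output_unigrams=True, sep=" "):
    """Return unigrams plus word n-grams of min_size..max_size tokens."""
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        # Emit every shingle that starts at position i and still fits in the input.
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(sep.join(tokens[i:i + n]))
    return out


print(shingles(["apple", "banana", "cake"]))
# ['apple', 'apple banana', 'apple banana cake', 'banana', 'banana cake', 'cake']
```

Under these assumptions "apple banana" ends up in the index as a single term, which is the kind of term a per-term frequency lookup can count.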

Any ideas?

Thanks,
Jerry

Re: Can solr ttf functionQuery support ngram (n>2) ?

Posted by Jie Gao <j....@sheffield.ac.uk>.

Please ignore this question.

The ttf function query is working fine.

I made a mistake when manually checking the tf result. For the phrase query,
the rows parameter should be set higher than the default (10):
http://localhost:8983/solr/collection1/select?q=content:%22apple%20banana%22&rows=100
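
The effect can be sketched in Python with toy data. The documents below are made up for illustration and are not from the real index, and count_phrase is a hypothetical helper that mimics counting occurrences by hand in the returned rows only.

```python
def count_phrase(docs, phrase, rows=10):
    """Count occurrences of phrase in the first `rows` documents only."""
    needle = phrase.lower()
    return sum(doc.lower().count(needle) for doc in docs[:rows])


# Hypothetical toy data: 12 documents, each containing the phrase once.
docs = [f"doc {i}: apple banana smoothie" for i in range(12)]

print(count_phrase(docs, "apple banana"))            # only the default 10 rows are inspected
print(count_phrase(docs, "apple banana", rows=100))  # all 12 documents are inspected
```

With the default page size the manual count stops at the tenth document, so it comes out lower than an index-wide total.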

Thanks,
Jerry

Jie Gao,
Research Assistant,
Department of Computer Science, The University of Sheffield,
Regent Court, 211 Portobello, S1 4DP, Sheffield, UK

Re: Can solr ttf functionQuery support ngram (n>2) ?

Posted by Jie Gao <j....@sheffield.ac.uk>.

A typo is fixed in the following query url.

On 10 September 2015 at 10:25, Jie Gao <j....@sheffield.ac.uk> wrote:

> Hi,
>
> I'm wondering whether solr ttf functionQuery support (compound words)
> ngram (n>2) ?
>
> I'm using "
> http://localhost:8983/solr/collection1/select?q=*:*&fl=ttf(content,%22apple%20banana%22)&rows=1"
> to query total term frequency of bigram tokens in "content" field in the
> whole index.
>
> However, the result (returned with 20) is not consistent with the result
> queried via
> http://localhost:8983/solr/collection1/select?q=content:%22apple%20banana%22.
> I manually checked the actual occurrence is 15.
>
> What is the actual behaviour of the ttf function query (i'm using solr
> 5.3.0)? The reference guide does not explain the details.
>
> Does it perform full text index query on this field ? or it relies on the
> tf values stored by tvComponent?
>
> I have configured the content field with the following textField type:
>
> <fieldType name="text_tr_general" class="solr.TextField" positionIncrementGap="100">
>         <analyzer type="index">
>             <tokenizer class="solr.StandardTokenizerFactory" />
>             <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
>             <filter class="solr.LowerCaseFilterFactory" />
>             <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="5"
>                     outputUnigrams="true" outputUnigramsIfNoShingles="false" tokenSeparator=" "/>
>         </analyzer>
>         <analyzer type="query">
>             <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>             <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
>             <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
>             <filter class="solr.LowerCaseFilterFactory" />
>             <filter class="solr.ASCIIFoldingFilterFactory"/>
>         </analyzer>
>     </fieldType>
>
> Any ideas ?
>
> Thanks,
> Jerry
>