Posted to solr-user@lucene.apache.org by MC <vi...@gmail.com> on 2013/10/15 15:14:13 UTC

field "title_ngram" was indexed without position data; cannot run PhraseQuery

Hello,

Could someone explain (or perhaps provide a documentation link) what 
the following error means:
"field "title_ngram" was indexed without position data; cannot run 
PhraseQuery"

I'll do some more searching online; I was just wondering whether anyone 
has encountered this error before and what the possible solution might 
be. I recently upgraded Solr from 3.6.0 to 4.5.0, and I'm not sure 
whether this has any bearing on it.
Thanks,

M

Re: field "title_ngram" was indexed without position data; cannot run PhraseQuery

Posted by MC <vi...@gmail.com>.

Hello,
Thank you all for your help. There was indeed a property that was not 
set correctly in schema.xml:
omitTermFreqAndPositions="true"
After changing it to false, phrase lookups started working.
Thanks,

M
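For anyone who hits the same error, one quick check is to look for fields whose effective omitTermFreqAndPositions value is "true". The Python below is a hypothetical helper, not a Solr tool: SAMPLE_SCHEMA (including its text_ngram type), effective_flag, and fields_without_positions are made up for illustration. It assumes a Solr 4-style schema.xml with <fieldType> and <field> elements, lets a field-level attribute override the one on its fieldType, and only sees values set explicitly in the file, not any defaults Solr applies when an attribute is absent.

```python
import xml.etree.ElementTree as ET

# Hypothetical Solr 4-style schema fragment, for illustration only.
SAMPLE_SCHEMA = """<schema name="example" version="1.5">
  <types>
    <fieldType name="text_ngram" class="solr.TextField"
               autoGeneratePhraseQueries="true">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="title_ngram" type="text_ngram" indexed="true" stored="false"
           omitTermFreqAndPositions="true"/>
  </fields>
</schema>"""


def effective_flag(field, ftype, attr):
    """A field-level attribute wins; otherwise use the fieldType's; unset -> False."""
    value = field.get(attr)
    if value is None and ftype is not None:
        value = ftype.get(attr)
    return value == "true"


def fields_without_positions(schema_xml):
    """Return (field name, autoGeneratePhraseQueries?) for fields that omit positions."""
    root = ET.fromstring(schema_xml)
    types = {t.get("name"): t for t in root.iter("fieldType")}
    found = []
    for field in root.iter("field"):
        ftype = types.get(field.get("type"))
        if effective_flag(field, ftype, "omitTermFreqAndPositions"):
            found.append((field.get("name"),
                          effective_flag(field, ftype, "autoGeneratePhraseQueries")))
    return found


print(fields_without_positions(SAMPLE_SCHEMA))  # [('title_ngram', True)]

# After the fix described above (the attribute set to false), nothing is reported.
fixed = SAMPLE_SCHEMA.replace('omitTermFreqAndPositions="true"',
                              'omitTermFreqAndPositions="false"')
print(fields_without_positions(fixed))  # []
```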


On 10/15/13 12:01 PM, Jack Krupansky wrote:
> Show us the field and field type from your schema.
>
> Likely you are "omitting" position info for the field, and the field 
> type has autoGeneratePhraseQueries="true". The ngram analyzer 
> generates a sequence of terms for a single source term, and the query 
> parser then generates a PhraseQuery for that sequence. A PhraseQuery 
> requires position info in the index, but you have omitted it. That's 
> one theory.
>
> So, if that theory is correct, either retain position info by getting 
> rid of the "omit", or remove autoGeneratePhraseQueries.
>
> -- Jack Krupansky

Re: field "title_ngram" was indexed without position data; cannot run PhraseQuery

Posted by Jack Krupansky <ja...@basetechnology.com>.

Show us the field and field type from your schema.

Likely you are "omitting" position info for the field, and the field type 
has autoGeneratePhraseQueries="true". The ngram analyzer generates a 
sequence of terms for a single source term, and the query parser then 
generates a PhraseQuery for that sequence. A PhraseQuery requires position 
info in the index, but you have omitted it. That's one theory.

So, if that theory is correct, either retain position info by getting rid of 
the "omit", or remove autoGeneratePhraseQueries.

-- Jack Krupansky
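To make the theory concrete, here is a toy in-memory model in Python. It is not Lucene's code: ToyNgramField, PositionDataMissing, and the auto_phrase switch (standing in for autoGeneratePhraseQueries) are hypothetical names, and only the error text is taken from the thread. It assumes each bigram gets its own consecutive position, matching the "sequence of terms" described above; whether a real Solr n-gram analyzer assigns positions that way depends on the tokenizer or filter in use. It also simplifies by AND-ing query words together. With positions omitted, a query word that expands into several grams becomes a phrase and fails; either suggested fix lets the query run.

```python
from collections import defaultdict


class PositionDataMissing(Exception):
    """Toy stand-in for the error Solr/Lucene reports in this situation."""


def bigrams(word):
    """Lowercased, overlapping 2-character grams of one word."""
    word = word.lower()
    return [word[i:i + 2] for i in range(len(word) - 1)]


class ToyNgramField:
    """Hypothetical model of one indexed n-gram field.

    Assumption: every gram gets its own consecutive position.
    """

    def __init__(self, name, omit_positions):
        self.name = name
        self.omit_positions = omit_positions
        # term -> {doc_id: [positions]}; position lists stay empty when omitted
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        pos = 0
        for word in text.split():
            for gram in bigrams(word):
                plist = self.postings[gram].setdefault(doc_id, [])
                if not self.omit_positions:
                    plist.append(pos)
                pos += 1

    def term_query(self, term):
        return set(self.postings.get(term, {}))

    def phrase_query(self, terms):
        if self.omit_positions:
            raise PositionDataMissing(
                f'field "{self.name}" was indexed without position data; '
                "cannot run PhraseQuery")
        candidates = set.intersection(*(self.term_query(t) for t in terms))
        hits = set()
        for doc in candidates:
            # a match needs terms[i] at position start + i for some start
            if any(all(start + i in self.postings[t][doc]
                       for i, t in enumerate(terms))
                   for start in self.postings[terms[0]][doc]):
                hits.add(doc)
        return hits

    def search(self, query, auto_phrase):
        """AND over query words; auto_phrase mimics autoGeneratePhraseQueries."""
        result = None
        for word in query.split():
            grams = bigrams(word)
            if auto_phrase and len(grams) > 1:
                hits = self.phrase_query(grams)  # grams must be adjacent
            else:
                hits = set().union(*(self.term_query(g) for g in grams))
            result = hits if result is None else result & hits
        return result if result is not None else set()


broken = ToyNgramField("title_ngram", omit_positions=True)
broken.add(1, "White iPod")
try:
    broken.search("ipod", auto_phrase=True)
except PositionDataMissing as err:
    print(err)

# Option: drop autoGeneratePhraseQueries, so the grams are simply OR-ed
print(broken.search("ipod", auto_phrase=False))  # {1}

# Option: keep positions (no omitTermFreqAndPositions="true")
kept = ToyNgramField("title_ngram", omit_positions=False)
kept.add(1, "White iPod")
print(kept.search("ipod", auto_phrase=True))  # {1}
```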

-----Original Message----- 
From: Jason Hellman
Sent: Tuesday, October 15, 2013 11:19 AM
To: solr-user@lucene.apache.org
Subject: Re: field "title_ngram" was indexed without position data; cannot 
run PhraseQuery

If you consider what n-grams do, this should make sense to you. Consider the 
following piece of data:

White iPod

If the field is fed through a bigram filter (an n-gram filter with a gram 
size of 2), the resulting token stream would look like this:

wh hi it te
ip po od

The usual purpose of n-grams is to match those partial tokens, which gives 
you a great deal of power to create partial matches without wildcards. How 
you use this is up to your imagination, but one easy use is partial matching 
in autosuggest features.

I can't speak for the intent behind the way it's coded, but it makes a great 
deal of sense to me that positional data would be seen as unnecessary, since 
the intent of n-grams typically doesn't overlap with phrase searches. If you 
need both behaviors, it's far better to use copyField and have one field 
dedicated to standard tokenization and token filters, and another field for 
n-grams.

I hope that's useful to you.

Re: field "title_ngram" was indexed without position data; cannot run PhraseQuery

Posted by Jason Hellman <jh...@innoventsolutions.com>.

If you consider what n-grams do, this should make sense to you. Consider the following piece of data:

White iPod

If the field is fed through a bigram filter (an n-gram filter with a gram size of 2), the resulting token stream would look like this:

wh hi it te
ip po od
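The bigram stream above can be reproduced with a few lines of Python. This is a sketch that approximates a whitespace tokenizer, a lowercase filter, and an n-gram filter with a fixed gram size of 2; char_ngrams and bigram_stream are hypothetical helper names, and a real Solr analyzer chain also tracks positions and offsets and may be configured with other gram sizes.

```python
def char_ngrams(token, n=2):
    """Overlapping character n-grams of a single token."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def bigram_stream(text):
    """Split on whitespace, lowercase, then emit the bigrams of each word."""
    return [char_ngrams(word.lower(), 2) for word in text.split()]


for grams in bigram_stream("White iPod"):
    print(" ".join(grams))
# wh hi it te
# ip po od
```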

The usual purpose of n-grams is to match those partial tokens, which gives you a great deal of power to create partial matches without wildcards. How you use this is up to your imagination, but one easy use is partial matching in autosuggest features.

I can't speak for the intent behind the way it's coded, but it makes a great deal of sense to me that positional data would be seen as unnecessary, since the intent of n-grams typically doesn't overlap with phrase searches. If you need both behaviors, it's far better to use copyField and have one field dedicated to standard tokenization and token filters, and another field for n-grams.
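A rough model of the copyField approach, in Python: the same title value feeds a positional "title" field used for phrase queries and a "title_ngram" field that only records which documents contain each bigram. TwoFieldIndex and the field name "title" are hypothetical, and partial() simply intersects the fragment's bigrams, which ignores gram order and can over-match. It is meant to show the division of labor between the two fields, not how Solr actually matches or scores documents.

```python
from collections import defaultdict


def words(text):
    """Stand-in for standard tokenization: lowercase + whitespace split."""
    return text.lower().split()


def bigrams(word):
    return [word[i:i + 2] for i in range(len(word) - 1)]


class TwoFieldIndex:
    """Hypothetical model of copyField: one source value, two indexed fields."""

    def __init__(self):
        self.title = defaultdict(dict)       # word -> {doc_id: [positions]}
        self.title_ngram = defaultdict(set)  # gram -> {doc_ids}, no positions

    def add(self, doc_id, title):
        for pos, word in enumerate(words(title)):
            self.title[word].setdefault(doc_id, []).append(pos)
            # the copyField target gets the same value, analyzed into bigrams
            for gram in bigrams(word):
                self.title_ngram[gram].add(doc_id)

    def phrase(self, text):
        """Exact phrase match on the positional "title" field."""
        ws = words(text)
        if not ws:
            return set()
        docs = set.intersection(*(set(self.title.get(w, {})) for w in ws))
        return {d for d in docs
                if any(all(start + i in self.title[w][d] for i, w in enumerate(ws))
                       for start in self.title[ws[0]][d])}

    def partial(self, fragment):
        """Docs containing every bigram of the fragment; order is not checked."""
        grams = bigrams(fragment.lower())
        if not grams:
            return set()
        return set.intersection(*(self.title_ngram.get(g, set()) for g in grams))


idx = TwoFieldIndex()
idx.add(1, "White iPod")
idx.add(2, "iPod White")
print(idx.phrase("white ipod"))  # {1}
print(idx.partial("pod"))        # {1, 2}
```

The phrase "white ipod" matches only the document with the words in that order, because the "title" field keeps positions, while the fragment "pod" matches both documents through the n-gram field, which never needs them.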

I hope that's useful to you.

On Oct 15, 2013, at 6:14 AM, MC <vi...@gmail.com> wrote:

> Hello,
> 
> Could someone explain (or perhaps provide a documentation link) what the following error means:
> "field "title_ngram" was indexed without position data; cannot run PhraseQuery"
> 
> I'll do some more searching online; I was just wondering whether anyone has encountered this error before and what the possible solution might be. I recently upgraded Solr from 3.6.0 to 4.5.0, and I'm not sure whether this has any bearing on it.
> Thanks,
> 
> M
>