Posted to java-user@lucene.apache.org by Kumaran Ramasubramanian <ku...@gmail.com> on 2016/11/18 11:00:58 UTC

Re: indexing analyzed and not_analyzed values in same field

Hi All,

   Can anyone tell me whether it is advisable to have an index with both
analyzed and not_analyzed values in one field?

Use case: I have custom fields in my product which can be configured
differently (ANALYZED and NOT_ANALYZED) in different modules.

--
Kumaran R

On Wed, Oct 26, 2016 at 12:07 AM, Kumaran Ramasubramanian <kums.134@gmail.com> wrote:

>
> Hi All,
>
> I have indexed 4 documents in an index where the BANKNAME field is analyzed
> in two documents and not_analyzed in the other two. I have listed the
> search cases below, where I am able to search using both analyzed
> (using ClassicAnalyzer) and not_analyzed (using KeywordAnalyzer)
> terms. But is it right to have an index with both analyzed and
> not_analyzed values in one field?
>
> Output:
>
> BANKNAME field of these two documents is analyzed
>
> using classic analyzer
>  query : BANKNAME:"swiss bank"
> total hits:2
>
> DocId:0  DocScore:1.6096026
> [stored,indexed,tokenized<BANKNAME:swiss  bank>,
> stored,indexed,tokenized<PLACENAME:swissland>,
> stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<company:goog>]
>
> DocId:2  DocScore:1.6096026
> [stored,indexed,tokenized<BANKNAME:swiss  bank>,
> stored,indexed,tokenized<PLACENAME:swissland>,
> stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<company:goog>]
>
> BANKNAME field of these two documents is not analyzed
>
> using keyword analyzer
> rrsk query : BANKNAME:swiss bank
> total hits:2
>
> DocId:1  DocScore:1.287682
> [stored,indexed,tokenized<BANKNAME:swiss bank>,
> stored,indexed,tokenized<PLACENAME:swiss>,
> stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<company:goog>]
>
> DocId:3  DocScore:1.287682
> [stored,indexed,tokenized<BANKNAME:swiss bank>,
> stored,indexed,tokenized<PLACENAME:swiss>,
> stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<company:goog>]
>
> --
> Kumaran R
>
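To make the setup above concrete, here is a minimal Java sketch of what ends up in the term dictionary of a single BANKNAME field when two documents are analyzed and two are not. It is a toy model, not Lucene's API: `MixedField`, `MixedFieldDemo`, `analyzed()` and `keyword()` are hypothetical names, and `analyzed()` only approximates ClassicAnalyzer (lowercase, split on non-alphanumerics, no stop-word removal). The phrase lookup mimics a position-aware phrase query.

```java
import java.util.*;

// Toy model of ONE inverted-index field (e.g. BANKNAME) that receives both
// analyzed and not_analyzed values. Hypothetical code, NOT Lucene's API.
class MixedField {
    // term -> (docId -> positions of that term in the doc)
    final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    // Simplified stand-in for ClassicAnalyzer: lowercase, split on non-alphanumerics.
    // (The real analyzer also removes stop words and has special token rules.)
    static List<String> analyzed(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stand-in for KeywordAnalyzer: the whole value is one unchanged token.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    // Record each token of a document under its position in the field.
    void add(int docId, List<String> tokens) {
        for (int p = 0; p < tokens.size(); p++) {
            postings.computeIfAbsent(tokens.get(p), k -> new TreeMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(p);
        }
    }

    // Phrase match: tokens must occur at consecutive positions in the same doc.
    // A one-token phrase degenerates to a plain term lookup.
    Set<Integer> phrase(List<String> tokens) {
        Set<Integer> hits = new TreeSet<>();
        if (tokens.isEmpty()) {
            return hits;
        }
        Map<Integer, List<Integer>> first =
                postings.getOrDefault(tokens.get(0), Collections.emptyMap());
        for (Map.Entry<Integer, List<Integer>> e : first.entrySet()) {
            int doc = e.getKey();
            for (int start : e.getValue()) {
                boolean ok = true;
                for (int i = 1; i < tokens.size() && ok; i++) {
                    List<Integer> pos = postings
                            .getOrDefault(tokens.get(i), Collections.emptyMap())
                            .getOrDefault(doc, Collections.emptyList());
                    ok = pos.contains(start + i);
                }
                if (ok) {
                    hits.add(doc);
                    break;
                }
            }
        }
        return hits;
    }
}

class MixedFieldDemo {
    // Docs 0 and 2 analyzed, docs 1 and 3 not_analyzed, all in the same field.
    static MixedField buildIndex() {
        MixedField bankName = new MixedField();
        bankName.add(0, MixedField.analyzed("swiss bank"));
        bankName.add(1, MixedField.keyword("swiss bank"));
        bankName.add(2, MixedField.analyzed("swiss bank"));
        bankName.add(3, MixedField.keyword("swiss bank"));
        return bankName;
    }

    public static void main(String[] args) {
        MixedField f = buildIndex();
        System.out.println(f.postings.keySet());                         // [bank, swiss, swiss bank]
        System.out.println(f.phrase(MixedField.analyzed("swiss bank"))); // [0, 2]
        System.out.println(f.phrase(MixedField.keyword("swiss bank")));  // [1, 3]
    }
}
```

Each query form gets two hits, as in the output above, because it only ever meets the tokens produced by the same kind of analysis; the field itself holds both `swiss`/`bank` and the single token `swiss bank`.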

Re: indexing analyzed and not_analyzed values in same field

Posted by Michael McCandless <lu...@mikemccandless.com>.
So when a query arrives, you know the query is only allowed to match
either module:1 (analyzed terms) or module:2 (not analyzed) but never
both?  If so, you should be fine.
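A rough sketch of the condition described here: every query is pinned to one module and its text is analyzed the same way that module's values were indexed. This is a plain-Java toy, not Lucene code; `ModuleScopedSearch`, `MODULE_MODE` and the module ids "1"/"2" are hypothetical. In Lucene the module restriction would typically be a required clause (for example `BooleanClause.Occur.FILTER` in Lucene 5.1 and later) on a `TermQuery` for the module field, next to the BANKNAME query.

```java
import java.util.*;
import java.util.stream.Collectors;

// Toy model (not Lucene's API): each document belongs to one module, and each
// module decides how its BANKNAME values are analyzed. All names are hypothetical.
class ModuleScopedSearch {
    enum Mode { ANALYZED, NOT_ANALYZED }

    // Hypothetical per-module configuration: module id -> how BANKNAME is indexed.
    static final Map<String, Mode> MODULE_MODE =
            Map.of("1", Mode.ANALYZED, "2", Mode.NOT_ANALYZED);

    static final class Doc {
        final int id;
        final String module;
        final Set<String> bankNameTerms; // terms this doc contributes to BANKNAME

        Doc(int id, String module, String bankName) {
            this.id = id;
            this.module = module;
            this.bankNameTerms = new HashSet<>(terms(bankName, MODULE_MODE.get(module)));
        }
    }

    // Simplified analysis: lowercase + split for ANALYZED, whole value for NOT_ANALYZED.
    static List<String> terms(String text, Mode mode) {
        if (mode == Mode.NOT_ANALYZED) {
            return List.of(text);
        }
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // The query text is analyzed with the module's own mode AND restricted to that
    // module, so documents of the other module can never match.
    // (Toy: all query terms must be present; positions are ignored.)
    static List<Integer> search(List<Doc> docs, String module, String userText) {
        List<String> queryTerms = terms(userText, MODULE_MODE.get(module));
        List<Integer> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (d.module.equals(module) && !queryTerms.isEmpty()
                    && d.bankNameTerms.containsAll(queryTerms)) {
                hits.add(d.id);
            }
        }
        return hits;
    }

    static List<Doc> sampleDocs() {
        return List.of(new Doc(0, "1", "swiss bank"), new Doc(1, "2", "swiss bank"),
                       new Doc(2, "1", "swiss bank"), new Doc(3, "2", "swiss bank"));
    }

    public static void main(String[] args) {
        List<Doc> docs = sampleDocs();
        System.out.println(search(docs, "1", "Swiss Bank")); // [0, 2]
        System.out.println(search(docs, "2", "swiss bank")); // [1, 3]
        System.out.println(search(docs, "2", "swiss"));      // []
    }
}
```

Within one module the behavior is consistent (fuzzy-ish matching for module 1, exact matching for module 2), which is why the mixed field is workable as long as no query spans both modules.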

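Though relevance will be sort of wonky, in case that matters, because
you are polluting the unique term space: both kinds of values share one
field's term and field statistics (for example each term's document
frequency, and the field's average length used by BM25). You would get
different relevance results if you used two separate fields instead.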
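To see why relevance shifts, the toy below compares one shared BANKNAME field with two hypothetical separate fields (one analyzed-only, one not_analyzed-only). It is plain Java, not Lucene; `FieldStats` and `FieldStatsDemo` are made-up names, and the IDF is the BM25 form log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), assumed here to match what Lucene's `BM25Similarity` computes.

```java
import java.util.*;

// Toy per-field statistics (hypothetical, not Lucene's API).
class FieldStats {
    final Map<String, Integer> docFreq = new HashMap<>(); // term -> #docs containing it
    int docCount;       // #docs that have this field
    long sumTermCount;  // total tokens in this field across all docs

    void addDoc(List<String> tokens) {
        docCount++;
        sumTermCount += tokens.size();
        for (String t : new HashSet<>(tokens)) { // count each term once per doc
            docFreq.merge(t, 1, Integer::sum);
        }
    }

    // BM25-style idf: log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
    double idf(String term) {
        int df = docFreq.getOrDefault(term, 0);
        return Math.log(1 + (docCount - df + 0.5) / (df + 0.5));
    }

    double avgFieldLength() {
        return docCount == 0 ? 0 : (double) sumTermCount / docCount;
    }
}

class FieldStatsDemo {
    public static void main(String[] args) {
        List<String> analyzed = List.of("swiss", "bank"); // tokens from analyzed docs
        List<String> exact = List.of("swiss bank");       // token from not_analyzed docs

        FieldStats shared = new FieldStats();       // one BANKNAME for both modules
        FieldStats analyzedOnly = new FieldStats(); // hypothetical separate fields
        FieldStats exactOnly = new FieldStats();
        for (int i = 0; i < 2; i++) {
            shared.addDoc(analyzed);
            analyzedOnly.addDoc(analyzed);
            shared.addDoc(exact);
            exactOnly.addDoc(exact);
        }
        System.out.printf("idf(swiss): shared=%.4f separate=%.4f%n",
                shared.idf("swiss"), analyzedOnly.idf("swiss"));
        System.out.printf("avg length: shared=%.2f analyzedOnly=%.2f exactOnly=%.2f%n",
                shared.avgFieldLength(), analyzedOnly.avgFieldLength(),
                exactOnly.avgFieldLength());
    }
}
```

With two documents of each kind, idf("swiss") is ln 2 (about 0.69) in the shared field but ln 1.2 (about 0.18) in an analyzed-only field, and the average field length is 1.5 tokens instead of 2 or 1, so a document's score depends on which other documents share its field.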

Mike McCandless

http://blog.mikemccandless.com

On Fri, Nov 18, 2016 at 8:50 AM, Kumaran Ramasubramanian
<ku...@gmail.com> wrote:
>
> Yes. But I am going to provide search for my single module alone, so a
> query will match only one type of document in any case.
>
> Here is how I use it: I append module:1 or module:2 to every query I make,
> so documents that match module:1 will have only analyzed terms and documents
> that match module:2 will have only not_analyzed terms.
>
> --
> Kumaran R
>
> On Fri, Nov 18, 2016 at 7:04 PM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>>
>> You can do this, Lucene will let you, but it's typically a bad idea
>> for search relevance because some documents will return only if you
>> search for precisely the same whole token, others if you search for an
>> analyzed token, giving the user a broken experience.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: indexing analyzed and not_analyzed values in same field

Posted by Michael McCandless <lu...@mikemccandless.com>.
You can do this, Lucene will let you, but it's typically a bad idea
for search relevance because some documents will return only if you
search for precisely the same whole token, others if you search for an
analyzed token, giving the user a broken experience.
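A small plain-Java toy of that broken experience (not Lucene; the class name, the sample values and the simplified lowercase-and-split analysis are hypothetical). Even if the user's text is run through both analyses and the hits are unioned, the not_analyzed documents are reachable only when the input equals the indexed value exactly.

```java
import java.util.*;

// Hypothetical toy: one BANKNAME field; docs 0 and 2 were indexed analyzed,
// docs 1 and 3 as a single keyword token. Not Lucene's API.
class MixedFieldReachability {
    static final List<Set<String>> DOC_TERMS = List.of(
            Set.of("swiss", "bank"),  // doc 0, analyzed
            Set.of("swiss bank"),     // doc 1, not_analyzed
            Set.of("swiss", "bank"),  // doc 2, analyzed
            Set.of("swiss bank"));    // doc 3, not_analyzed

    // Simplified analyzed form of the query: lowercase, split on non-alphanumerics.
    static Set<String> analyzedTerms(String text) {
        Set<String> out = new LinkedHashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Union of two query-time interpretations of the same user input.
    static SortedSet<Integer> hits(String userText) {
        SortedSet<Integer> hits = new TreeSet<>();
        Set<String> analyzed = analyzedTerms(userText);
        for (int doc = 0; doc < DOC_TERMS.size(); doc++) {
            Set<String> terms = DOC_TERMS.get(doc);
            boolean analyzedMatch = !analyzed.isEmpty() && terms.containsAll(analyzed);
            boolean keywordMatch = terms.contains(userText); // whole input, unchanged
            if (analyzedMatch || keywordMatch) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        for (String q : List.of("swiss bank", "Swiss Bank", "swiss", "bank")) {
            System.out.println(q + " -> " + hits(q));
        }
    }
}
```

Only `swiss bank` reaches documents 1 and 3; `Swiss Bank`, `swiss` and `bank` return just the analyzed documents 0 and 2, so the same user sees different result sets depending on how precisely they type the value.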

Mike McCandless

http://blog.mikemccandless.com


