
Posted to java-user@lucene.apache.org by "Francisco A. Lozano" <fl...@gmail.com> on 2012/04/27 12:17:58 UTC

Storing same field twice (analyzed+not-analyzed), sorting

Hi,

I'm storing a field twice, once analyzed and once non-analyzed, so that
I can query both for individual terms and for the exact keyword:

    // Analyzed version
    d.add(new Field(key, value, Store.NO, Index.ANALYZED, TermVector.YES));
    // Not-analyzed version
    d.add(new Field(key, value, Store.NO, Index.NOT_ANALYZED));

My first question is whether this is supposed to cause problems or
whether it's OK.

The problem is that I'm getting strange results when sorting: most of
the documents seem to be sorted correctly, but some of them appear at
the end. Am I doing something wrong?

Francisco A. Lozano

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Storing same field twice (analyzed+not-analyzed), sorting

Posted by "Francisco A. Lozano" <fl...@gmail.com>.

I cannot do that: I need to query specific fields, both for the whole
value as a single term (keyword) and with fuzzy/phrase queries.

For sorting I will probably take Erick Erickson's suggestion: use a
separate non-analyzed field for sorting. Makes sense.
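A minimal plain-Java sketch (no Lucene classes) of that idea: each document keeps its analyzed tokens for searching plus exactly one untokenized value reserved for sorting. The class name, the `titleSort` name, the simple tokenizer, and lowercasing the sort value are all assumptions made for illustration, not Lucene API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class SortFieldModel {
    // Toy document: tokens for searching plus ONE untokenized sort value.
    static final class Doc {
        final String title;             // original value, as displayed
        final List<String> titleTokens; // roughly what an analyzed field would hold
        final String titleSort;         // hypothetical single-valued sort field

        Doc(String title) {
            this.title = title;
            List<String> tokens = new ArrayList<>();
            for (String t : title.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!t.isEmpty()) tokens.add(t);
            }
            this.titleTokens = List.copyOf(tokens);
            // Lowercased so that letter case does not change the order (an assumption).
            this.titleSort = title.toLowerCase(Locale.ROOT);
        }
    }

    // Sort documents using only the single-valued sort field.
    public static List<String> sortedTitles(List<String> titles) {
        List<Doc> docs = new ArrayList<>();
        for (String t : titles) docs.add(new Doc(t));
        docs.sort(Comparator.comparing((Doc d) -> d.titleSort));
        List<String> result = new ArrayList<>();
        for (Doc d : docs) result.add(d.title);
        return result;
    }

    public static void main(String[] args) {
        // prints [Apple zest, Banana, Cherry]
        System.out.println(sortedTitles(List.of("Banana", "Apple zest", "Cherry")));
    }
}
```

Because each document contributes exactly one sort value, its position depends only on that value and not on which of several indexed terms happens to be picked.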

The other problem (querying both for the whole value as a single term
and with fuzzy/phrase queries)... I guess it would be solvable if I
could use a StandardAnalyzer that also emitted the whole input as a
token, in addition to the tokens it already generates, but I haven't
managed to build one. Any suggestions?
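A plain-Java sketch of the token output being asked for: the usual tokens followed by the untouched input as one extra token. It does not use Lucene; `standardLikeTokens` is a hypothetical, much simpler stand-in for StandardAnalyzer (lowercase, split on anything that is not a letter or digit), and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class WholeValuePlusTokens {
    // Hypothetical stand-in for StandardAnalyzer: lowercase, split on
    // anything that is not a letter or digit, drop empty pieces.
    public static List<String> standardLikeTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // The requested behaviour: the usual tokens, then the whole input as one
    // extra token, so an exact-value term lookup can hit the same field.
    public static List<String> tokensWithWholeValue(String value) {
        List<String> out = standardLikeTokens(value);
        out.add(value);
        return out;
    }

    public static void main(String[] args) {
        // prints [new, york, New York]
        System.out.println(tokensWithWholeValue("New York"));
    }
}
```

Note that such a field still holds several terms per document, so it is no more sortable than before. A commonly used alternative to writing a custom filter is to keep the exact value in its own NOT_ANALYZED field and, when parsing queries, map that field to `KeywordAnalyzer` through `PerFieldAnalyzerWrapper`.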


Francisco A. Lozano


On Fri, Apr 27, 2012 at 14:12, Vinaya Kumar Thimmappa
<vt...@ariba.com> wrote:
> Why don't you store the keyword-related data in a separate keywords field, which can be analyzed, and keep the other field as it is now?
> In other words, move every field that needs keyword search into the keywords field.
>
> -v


RE: Storing same field twice (analyzed+not-analyzed), sorting

Posted by Vinaya Kumar Thimmappa <vt...@ariba.com>.

Why don't you store the keyword-related data in a separate keywords field, which can be analyzed, and keep the other field as it is now?
In other words, move every field that needs keyword search into the keywords field.

-v

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Friday, April 27, 2012 5:38 PM
To: java-user@lucene.apache.org
Subject: Re: Storing same field twice (analyzed+not-analyzed), sorting

Hmmm, putting analyzed and unanalyzed values in
the same field seems like it'd be difficult to get right. In
the Solr world, two separate fields are usually used.


Sorting is right out; the results are unpredictable. What does
it mean to sort on a field with multiple tokens? For a doc
with "aardvark" and "zebra", where should it fall in the
result list?

If you're sorting, it's best to use a single value per doc.

Best
Erick


Re: Storing same field twice (analyzed+not-analyzed), sorting

Posted by Erick Erickson <er...@gmail.com>.

Hmmm, putting analyzed and unanalyzed values in
the same field seems like it'd be difficult to get right. In
the Solr world, two separate fields are usually used.
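A toy plain-Java inverted index (not Lucene's implementation) illustrating the two-field layout: the analyzed tokens go under one field name and the whole value under a second, distinct name. The `_exact` suffix, the simplified tokenizer, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TwoFieldIndex {
    // field name -> term -> ids of documents containing that term
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    // Simplified analyzer stand-in: lowercase, split on non-alphanumerics.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    private void addTerm(String field, String term, int docId) {
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new TreeSet<>())
                .add(docId);
    }

    // Index one value under two distinct field names: analyzed and exact.
    public void addDocument(int docId, String field, String value) {
        for (String token : analyze(value)) addTerm(field, token, docId);
        addTerm(field + "_exact", value, docId);
    }

    // Exact term lookup in one field, analogous to a single-term query.
    public Set<Integer> termQuery(String field, String term) {
        return Collections.unmodifiableSet(
                postings.getOrDefault(field, Map.of()).getOrDefault(term, Set.of()));
    }

    public static void main(String[] args) {
        TwoFieldIndex index = new TwoFieldIndex();
        index.addDocument(1, "city", "New York");
        index.addDocument(2, "city", "York");
        System.out.println(index.termQuery("city", "york"));           // [1, 2]
        System.out.println(index.termQuery("city_exact", "New York")); // [1]
        System.out.println(index.termQuery("city_exact", "York"));     // [2]
    }
}
```

Keeping the names distinct means a term lookup on the exact field can never be confused with a token produced by the analyzer, and the exact field holds exactly one value per document.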


Sorting is right out; the results are unpredictable. What does
it mean to sort on a field with multiple tokens? For a doc
with "aardvark" and "zebra", where should it fall in the
result list?

If you're sorting, it's best to use a single value per doc.
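One plausible explanation for "some documents appear at the end", as a simplified plain-Java model rather than Lucene code: assume the sort cache keeps one term per document and, while walking the field's terms in sorted order, overwrites that slot each time, so the greatest term wins (roughly how the Lucene 3.x field cache behaves; treat this as an assumption). With analyzed tokens and the raw value under the same name, a document can end up keyed by a late-sorting token. The tokenizer and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

final class MultiTermSortModel {
    // Simplified analyzer stand-in: lowercase, split on non-alphanumerics.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Terms one document contributes when the same field name holds both
    // the analyzed tokens and the NOT_ANALYZED copy of the value.
    static SortedSet<String> termsForDoc(String value) {
        SortedSet<String> terms = new TreeSet<>(analyze(value));
        terms.add(value);
        return terms;
    }

    // Assumed cache behaviour: the last (greatest) term in sorted order wins.
    public static String sortKey(String value) {
        return termsForDoc(value).last();
    }

    public static List<String> sortByMixedField(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparing(MultiTermSortModel::sortKey));
        return sorted;
    }

    public static void main(String[] args) {
        // "Apple zest" is keyed by "zest", so it lands last.
        // prints [Banana, Cherry, Apple zest]
        System.out.println(sortByMixedField(List.of("Banana", "Apple zest", "Cherry")));
    }
}
```

Uppercase letters sort before lowercase ones, so the raw value rarely wins against its own lowercased tokens; documents whose last token sorts late drift toward the end, which is the symptom described in the original question.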

Best
Erick
