Posted to java-user@lucene.apache.org by Mindaugas Žakšauskas <mi...@gmail.com> on 2010/06/01 13:54:22 UTC

NumericField API

Hi,

I have recently been put in charge of converting code that used the
pre-3.0 API to be compatible with the 3.0 API.

There was a piece of code which was storing a date field:

String date = "20091231131415"; // yyyyMMddHHmmss
new Field("creationDate", date, Field.Store.YES, Field.Index.UN_TOKENIZED);

After some documents had been indexed, the following query would
retrieve all documents created in 2009:

new ConstantScoreRangeQuery("creationDate", "20090101000000",
"20100101000000", true, true);

Query results would be sortable by simply adding this sort:

boolean descending = ...;   // either true or false
new Sort(new SortField("creationDate", SortField.STRING, descending));
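The string-based range query and sort above work because fixed-width yyyyMMddHHmmss strings order lexicographically in the same way as the dates they spell. A minimal sketch of that property in plain Java follows; the class and method names (TimestampOrderSketch, sameOrder, inRange) are hypothetical and not part of Lucene:

```java
import java.util.Arrays;

final class TimestampOrderSketch {
    // True when String ordering and numeric ordering agree for the two values.
    static boolean sameOrder(String a, String b) {
        int lexical = Integer.signum(a.compareTo(b));
        int numeric = Integer.signum(Long.compare(Long.parseLong(a), Long.parseLong(b)));
        return lexical == numeric;
    }

    // Inclusive range check on plain strings, mirroring the (true, true) flags above.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        String[] dates = {"20100101000000", "20091231131415", "20090101000000"};
        Arrays.sort(dates); // plain String ordering, no date parsing involved
        System.out.println(Arrays.toString(dates));
        System.out.println(inRange("20091231131415", "20090101000000", "20100101000000"));
        // The equivalence only holds for fixed-width values: "9" vs "10" disagrees.
        System.out.println(sameOrder("9", "10"));
    }
}
```

Note that with both bounds inclusive, a document stamped exactly "20100101000000" also falls inside the range.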

Unfortunately this sequence doesn't work in 3.0.
ConstantScoreRangeQuery, for example, is gone and replaced with
NumericRangeQuery. With this in mind, Field creation should now become
as follows:

long date = 20091231131415L; // same format but different type
NumericField nf = new NumericField("creationDate", Field.Store.YES, true);
nf.setLongValue(date);

And the range query now looks like this:

NumericRangeQuery.newLongRange(
   "creationDate",
   20090101000000L,
   20100101000000L,
   true,
   true
)

This works, but the above sort doesn't. The exception says: "there are
more terms than documents in field "creationDate", but it's impossible
to sort on tokenized fields".

In order to get rid of this exception, I had to change one of the following:
- SortField must be changed from SortField.STRING to SortField.LONG
- NumericField constructor must use false for its "index" (last) parameter.

This is a bit weird. So, here are my questions:

1) I thought the difference between SortField.LONG and
SortField.STRING should only be as in numeric sorting VS
lexicographical sorting, right? Why would changing to SortField.LONG
prevent the exception?
2) How does that relate to passing index=true VS index=false in
NumericField constructor? Which of the two is preferred, assuming I
need the data to be stored and indexed as well as being able to run
range queries?
3) NumericField API is marked as experimental and volatile
(http://lucene.apache.org/java/3_0_1/api/core/index.html). Is there
any other "stable" API I can rely on in Lucene 3.0? If not, what would
be possible NumericField replacement I could use now?

Thanks in advance.

Regards,
Mindaugas

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: NumericField API

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

> >> 3) NumericField API is marked as experimental and volatile
> >> (http://lucene.apache.org/java/3_0_1/api/core/index.html). Is there
> >> any other "stable" API I can rely on in Lucene 3.0? If not, what
> >> would be possible NumericField replacement I could use now?
> >
> > "Experimental" in Lucene's API *only* means that the API (method
> > signatures, classes) may change suddenly. The features are tested and
> > working.
>
> My point was - I totally understand that a piece of API could have been
> made deprecated and replaced with something else. That's the life we're
> living. But would it not then make sense to replace it with something
> else which is also reasonably stable (in terms of API)?
>
> Because developers aren't left with many options now - they have to
> convert from using one API which is unavailable to another which is
> likely to change rather sooner than later. It's just an early
> observation as historically Lucene has been doing an amazing job in
> terms of API stability.

There are two problems:
- You can go back to your old code; you don't need to move to
NumericField at all. As noted before: the *replacement* for RangeQuery and
ConstantScoreRangeQuery is TermRangeQuery. NumericRangeQuery is a new API
and is totally different. If you use it, you have to get rid of the old code
and the old usage patterns - sorry :-)
- 3.0 breaks backwards compatibility, so you cannot use any legacy APIs
anymore

Uwe




Re: NumericField API

Posted by Mark Miller <ma...@gmail.com>.
On 6/1/10 9:34 AM, Mindaugas Žakšauskas wrote:
> It's just an early
> observation as historically Lucene has been doing an amazing job in
> terms of API stability.

Yes it has :)

Get ready for even more change in that area though :)

-- 
- Mark

http://www.lucidimagination.com



Re: NumericField API

Posted by Mindaugas Žakšauskas <mi...@gmail.com>.
Hi,

Thanks for your reply Uwe. Just a couple of notes:

>> In order to get rid of this exception, I had to change one of the
>> following:
>> - SortField must be changed from SortField.STRING to SortField.LONG
>
> This does the trick and is *not* weird. You are using *numeric* fields, so
> you cannot sort as lexical *terms/strings*

Fair enough. I don't know if this is easily doable, but maybe it would
be worth changing the exception text to be a bit more readable? I spent
a fair amount of time digging in the "creationDate field has become
tokenized? How could that happen for numeric fields?" direction.
Judging from your answer, all that matters in this case is the sort
type rather than the indexing.

> The above exception may be caused by something different: Can it be that you
> have an old index that already had non-NumericField documents in it? If this
> is so, you have a mixed field contents and then behavior of range query and
> sort is wrong.

I have completely reindexed all documents so this clearly isn't the case.

>
>> 3) NumericField API is marked as experimental and volatile
>> (http://lucene.apache.org/java/3_0_1/api/core/index.html). Is there any
>> other "stable" API I can rely on in Lucene 3.0? If not, what would be
>> possible NumericField replacement I could use now?
>
> "Experimental" in Lucene's API *only* means that the API (method signatures,
> classes) may change suddenly. The features are tested and working.

My point was - I totally understand that a piece of API could have
been deprecated and replaced with something else. That's the life
we're living. But would it not then make sense to replace it with
something else which is also reasonably stable (in terms of API)?

Because developers aren't left with many options now - they have to
convert from one API which is unavailable to another which is likely
to change sooner rather than later. It's just an early observation, as
historically Lucene has been doing an amazing job in terms of API
stability.

Regards,
Mindaugas



RE: NumericField API

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

> I have recently been in charge of converting code that was using
> pre-3.0 API to be compatible with 3.0 API.
> 
> There was a piece of code which was storing a date field:
> 
> String date = "20091231131415"; // yyyyMMddHHmmss
> new Field("creationDate", date, Field.Store.YES, Field.Index.UN_TOKENIZED);
> 
> After some documents being indexed, the following query would retrieve all
> documents created in 2009:
> 
> new ConstantScoreRangeQuery("creationDate", "20090101000000",
> "20100101000000", true, true);
> 
> Query results would be sortable by simply adding this sort:
> 
> boolean descending = ...;   // either true or false
> new Sort(new SortField("creationDate", SortField.STRING, descending));
> 
> Unfortunately this sequence doesn't work in 3.0.
> ConstantScoreRangeQuery, for example, is gone and replaced with
> NumericRangeQuery. 

If you want the old behavior (and not native numeric ranges), you can use
TermRangeQuery - then your code is exactly the same as before, only
RangeQuery/ConstantScoreRangeQuery is replaced by TermRangeQuery. But this
is less efficient than real numeric queries, which are optimized in Lucene
2.9 and later. So your guess is right, you should use NumericField and
NumericRangeQuery.

> With this in mind, Field creation should now become as
> follows:
> 
> long date = 20091231131415L; // same format but different type
> NumericField nf = new NumericField("creationDate", Field.Store.YES, true);
> nf.setLongValue(date);
> 

Correct.

> And the range query now looks like this:
> 
> NumericRangeQuery.newLongRange(
>    "creationDate",
>    20090101000000L,
>    20100101000000L,
>    true,
>    true
> )

Correct.

Alternatively, there is no need to use that type of number; you can encode
the date in any variant. The simplest is Date.getTime() (milliseconds since
epoch).
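As a rough illustration of the milliseconds-since-epoch encoding, the sketch below converts the yyyyMMddHHmmss strings from this thread into long values using java.util.Date.getTime(). It assumes the timestamps are in UTC, which the thread does not state, and the class and method names (EpochMillisSketch, toEpochMillis) are hypothetical:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

final class EpochMillisSketch {
    // Parses a yyyyMMddHHmmss timestamp and returns milliseconds since the epoch.
    // Assumption: timestamps are UTC; adjust the time zone if yours differ.
    static long toEpochMillis(String timestamp) {
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMddHHmmss");
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        format.setLenient(false); // reject impossible dates such as month 13
        try {
            Date parsed = format.parse(timestamp);
            return parsed.getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a yyyyMMddHHmmss timestamp: " + timestamp, e);
        }
    }

    public static void main(String[] args) {
        long lower = toEpochMillis("20090101000000");
        long upper = toEpochMillis("20100101000000");
        long created = toEpochMillis("20091231131415");
        // The same inclusive range check now works on plain long values.
        System.out.println(lower <= created && created <= upper);
    }
}
```

The resulting long could be passed to setLongValue() at index time, with the range bounds converted the same way at query time.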

> This does work, but the above sort isn't. Exception says: "there are more
> terms than documents in field "creationDate", but it's impossible to sort
> on tokenized fields".
> 
> In order to get rid of this exception, I had to change one of the
> following:
> - SortField must be changed from SortField.STRING to SortField.LONG

This does the trick and is *not* weird. You are using *numeric* fields, so
you cannot sort as lexical *terms/strings*

> - NumericField constructor must use false for its "index" (last)
> parameter.

That's incorrect (see below).

> This is a bit weird. So, here are my questions:
> 
> 1) I thought the difference between SortField.LONG and SortField.STRING
> should only be as in numeric sorting VS lexicographical sorting, right?
> Why would changing to SortField.LONG prevent the exception?

It is a *numeric* field, so you *cannot* sort by lexicographical order.
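One plausible reading of the "more terms than documents" message, sketched under assumptions: a NumericField indexes each value at several precisions (one term every precisionStep bits, 4 by default), so a single long yields many terms, and a STRING sort that expects at most one term per document trips over them. The class below is hypothetical and uses a simplified "shift:prefix" label rather than Lucene's real term encoding; it only shows how many terms one value produces:

```java
import java.util.ArrayList;
import java.util.List;

final class TrieTermsSketch {
    // Returns one label per precision level of a 64-bit value.
    // Simplified illustration only; this is not Lucene's actual term format.
    static List<String> termsFor(long value, int precisionStep) {
        List<String> terms = new ArrayList<String>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            // Each level drops the lowest `shift` bits and keeps the remaining prefix.
            terms.add(shift + ":" + Long.toHexString(value >>> shift));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> terms = termsFor(20091231131415L, 4);
        // One document, many terms: more than a one-term-per-document sort expects.
        System.out.println(terms.size() + " terms for a single value");
        System.out.println(terms);
    }
}
```

Sorting with SortField.LONG avoids the problem because it reads the field as numbers rather than as one string term per document.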

> 2) How does that relate to passing index=true VS index=false in
> NumericField constructor? Which of the two is preferred, assuming I need
> the data to be stored and indexed as well as being able to run range
> queries?

This is incorrect. If you want to sort, you must turn on indexing; without
it, sorting is not possible.

The above exception may be caused by something different: could it be that
you have an old index that already had non-NumericField documents in it? If
so, you have mixed field contents, and the behavior of the range query and
the sort is then wrong.

> 3) NumericField API is marked as experimental and volatile
> (http://lucene.apache.org/java/3_0_1/api/core/index.html). Is there any
> other "stable" API I can rely on in Lucene 3.0? If not, what would be
> possible NumericField replacement I could use now?

"Experimental" in Lucene's API *only* means that the API (method signatures,
classes) may change suddenly. The features are tested and working.

Uwe

