Posted to java-user@lucene.apache.org by Sergey Kabashnyuk <ks...@gmail.com> on 2008/11/20 15:10:29 UTC

BigDecimal values

Hello

I would like to ask the community for advice:
what is the best way to index and search java.math.BigDecimal values in
Lucene 2.4?

Any code snippets are welcome.

Sergey Kabashnyuk
eXo Platform SAS

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: BigDecimal values

Posted by Yonik Seeley <yo...@apache.org>.
On Thu, Nov 20, 2008 at 9:30 AM, Sergey Kabashnyuk <ks...@gmail.com> wrote:
> Thanks Ian
>
> Unfortunately, I have to index any possible java.math.BigDecimal value.
> I can rephrase my question this way:
>
> How can I convert java.math.BigDecimal numbers into strings
> so that they can be stored in lexicographical order?

Some early work I did in Solr handles this for integers (it's pretty
much unused code now though).  The format was designed to support
decimals also, but I never got around to doing the code.

See BCDIntField, and BCDUtils, specifically
BCDUtils.base10toBase10kSortableInt

-Yonik



Re: BigDecimal values

Posted by Michael Ludwig <ml...@as-guides.com>.
Michael Ludwig wrote:
> I assume what you mean is formatting the number so that the
> lexicographical order of any possible sequence of acceptable numbers
> is the same as its numerical order.
>
> You must find a canonical representation such as scientific notation
> and then tweak it as follows:
>
> * "N" for negative and "P" for positive numbers ("N" sorts before "P")
> * fixed-width zero-padded exponent first, like "E0000000003", base 10
> * the leading mantissa digit with a marker, like "N2"
> * the remaining mantissa digits, right-padded with zeros to a fixed
>   width, with a marker, like "D008000000000"
>
> For example, 2008 (2.008E3) is encoded as "PE0000000003N2D008000000000".
> YMMV, of course.

This notation falls short of the goal of making lexicographical
order coincide with numerical order. First, negative numbers won't sort
in ascending order: -1 will come before -2, although -2 is the smaller
number. Second, negative exponents (nonzero numbers between -1 and 1)
aren't accounted for at all. Third, there are probably other problems.

Take a look at Steven Rowe's post in this thread for better
thought-out ideas.
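Both problems listed above (negative numbers sorting in reverse, and negative exponents) can be handled. The following is a minimal sketch of one such encoding; it is not Steven Rowe's approach (his post is not part of this excerpt) and not a Lucene or Solr API. The class name, the "0"/"1"/"2" sign prefixes, the exponent bias and the ':' terminator are all assumptions made for illustration.

```java
import java.math.BigDecimal;
import java.util.Locale;

// Hypothetical helper, not from this thread: encodes any BigDecimal as a string
// whose lexicographical (char-by-char) order matches numerical order.
final class SortableDecimal {
    // The adjusted exponent of a BigDecimal lies roughly in [-2^31, 2^32], so adding
    // this bias keeps it positive and it always fits in 11 decimal digits.
    private static final long EXPONENT_BIAS = 10_000_000_000L;
    private static final long EXPONENT_MAX = 99_999_999_999L; // largest 11-digit value

    static String encode(BigDecimal value) {
        if (value.signum() == 0) {
            return "1"; // zero sorts between negatives ("0...") and positives ("2...")
        }
        BigDecimal abs = value.abs().stripTrailingZeros();
        // Significant digits without trailing zeros: 2008 -> "2008", 0.05 -> "5"
        String digits = abs.unscaledValue().toString();
        // Exponent in scientific notation: 2008 -> 3, 0.05 -> -2
        long exponent = (long) abs.precision() - 1 - abs.scale();
        long biased = exponent + EXPONENT_BIAS;

        if (value.signum() > 0) {
            // Larger exponent -> larger string; equal exponents compare digit by digit
            return "2" + String.format(Locale.ROOT, "%011d", biased) + digits;
        }
        // Negative: invert the exponent and complement each digit (d -> 9 - d)
        // so that larger magnitudes produce smaller strings.
        StringBuilder sb = new StringBuilder("0");
        sb.append(String.format(Locale.ROOT, "%011d", EXPONENT_MAX - biased));
        for (char c : digits.toCharArray()) {
            sb.append((char) ('9' - c + '0'));
        }
        // ':' sorts after '9', so -2.5 ("...74:") sorts before -2 ("...7:")
        sb.append(':');
        return sb.toString();
    }
}
```

Complementing the digits of negative numbers makes larger magnitudes sort first, and the ':' terminator keeps a shorter mantissa such as -2 after a longer one such as -2.5. Because trailing zeros are stripped, values that differ only in scale (2.5 and 2.50) encode to the same string, so the same function can be applied to indexed values and to query bounds.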

Michael Ludwig



Re: BigDecimal values

Posted by Michael Ludwig <ml...@as-guides.com>.
Sergey Kabashnyuk wrote:
> Unfortunately, I have to index any possible
> java.math.BigDecimal value

Hi Sergey,

BigDecimal can represent an enormous range of numbers, so the range
must be bounded somehow.

Let's first draw the line at the point where, for a given BigDecimal bd,
the result of bd.toString(), which since Java 1.5 returns a "standard
canonical string form", can no longer be fed back to the BigDecimal(String)
constructor. Whenever that reconstruction fails, the value is out of range
for you.

### 9.999E2147483647 still works
9.999E+2147483647 - toString()
99.99E+2147483646 - toEngineeringString()
Reconstruction via toString():            works
Reconstruction via toEngineeringString(): works

### 10.001E2147483647 too big, does not work
1.0001E+2147483648 - toString()
100.01E+2147483646 - toEngineeringString()
Reconstruction via toString():            NumberFormatException
Reconstruction via toEngineeringString(): works
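The check behind the output above can be reproduced in a few lines. This is a sketch assuming Java 5 or later (for toEngineeringString()); the class and method names are made up for illustration, and "works" here only means that the BigDecimal(String) constructor does not throw. As far as I know, that constructor requires the exponent to fit in an int, which is why "E+2147483648" is rejected while "E+2147483646" is accepted.

```java
import java.math.BigDecimal;

// Hypothetical helper: does a string form survive a round trip
// through the BigDecimal(String) constructor?
final class RoundTripCheck {
    static boolean roundTrips(String s) {
        try {
            new BigDecimal(s);
            return true;
        } catch (NumberFormatException e) {
            return false; // e.g. exponent outside the int range accepted by the parser
        }
    }

    public static void main(String[] args) {
        for (String input : new String[] {"9.999E2147483647", "10.001E2147483647"}) {
            BigDecimal bd = new BigDecimal(input);
            System.out.println("### " + input);
            System.out.println(bd.toString() + " - toString(): "
                    + (roundTrips(bd.toString()) ? "works" : "NumberFormatException"));
            System.out.println(bd.toEngineeringString() + " - toEngineeringString(): "
                    + (roundTrips(bd.toEngineeringString()) ? "works" : "NumberFormatException"));
        }
    }
}
```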

Next, unlimited precision is a problem. Do you need a precision of two
billion digits? Probably not. In practice, precision is constrained by
available memory. So you must rephrase your requirement to
accommodate real-world conditions.
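One way to turn "unlimited" into a bounded requirement is to round every value to a fixed number of significant digits before indexing it. A sketch, assuming 20 significant digits and HALF_EVEN rounding are acceptable (both arbitrary choices, as is the class name); MathContext and BigDecimal.round() are standard java.math API.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

// Hypothetical helper: caps precision before a value is encoded for the index.
final class PrecisionCap {
    // Assumed limit; pick whatever the application actually needs.
    static final MathContext INDEX_PRECISION = new MathContext(20, RoundingMode.HALF_EVEN);

    static BigDecimal bound(BigDecimal value) {
        // round() keeps at most 20 significant digits; only the trailing digits
        // are rounded away, the magnitude of the value is preserved
        return value.round(INDEX_PRECISION);
    }
}
```

Rounding can carry into the next power of ten (a run of nines becomes 10...0), so the exponent may grow by one; the bounded value, not the original, is what should be encoded and queried.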

> I can rephrase my question this way:
> How can I convert java.math.BigDecimal numbers into strings
> so that they can be stored in lexicographical order?

I assume what you mean is formatting the number so that the
lexicographical order of any possible sequence of acceptable numbers
is the same as its numerical order.

You must find a canonical representation such as scientific notation
and then tweak it as follows:

* "N" for negative and "P" for positive numbers ("N" sorts before "P")
* fixed-width zero-padded exponent first, like "E0000000003", base 10
* the leading mantissa digit with a marker, like "N2"
* the remaining mantissa digits, right-padded with zeros to a fixed
  width, with a marker, like "D008000000000"

For example, 2008 (2.008E3) is encoded as "PE0000000003N2D008000000000".
YMMV, of course.
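A minimal sketch of the format exactly as listed, with the widths (10 exponent digits, 12 further mantissa digits) read off the 2008 example. The class name is hypothetical, and negative exponents are rejected because the format has no way to express them. As Michael points out in his follow-up in this thread, negative numbers also sort in the wrong order under this format; the sketch keeps that behavior so the format can be inspected as proposed.

```java
import java.math.BigDecimal;
import java.util.Locale;

// Literal reading of the proposed format: sign marker, "E" + 10-digit exponent,
// "N" + leading mantissa digit, "D" + 12 further digits (right-padded with zeros).
final class ProposedFormat {
    static final int EXP_WIDTH = 10;
    static final int FRAC_WIDTH = 12;

    static String encode(BigDecimal value) {
        String sign = value.signum() < 0 ? "N" : "P";
        BigDecimal abs = value.abs().stripTrailingZeros();
        String digits = abs.unscaledValue().toString();          // 2008 -> "2008"
        long exponent = (long) abs.precision() - 1 - abs.scale(); // 2008 -> 3
        if (exponent < 0) {
            throw new IllegalArgumentException("negative exponent not covered by the format: " + value);
        }
        String fraction = digits.substring(1); // mantissa digits after the leading one
        if (fraction.length() > FRAC_WIDTH) {
            throw new IllegalArgumentException("more digits than the fixed width allows: " + value);
        }
        StringBuilder padded = new StringBuilder(fraction);
        while (padded.length() < FRAC_WIDTH) {
            padded.append('0'); // pad on the right: "008" -> "008000000000"
        }
        return sign + "E" + String.format(Locale.ROOT, "%0" + EXP_WIDTH + "d", exponent)
                + "N" + digits.charAt(0) + "D" + padded;
    }
}
```

With this sketch, encode(new BigDecimal("2008")) yields "PE0000000003N2D008000000000", while the encoding of -1 sorts before that of -2.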

I hope this helps.

Michael Ludwig



Re: BigDecimal values

Posted by Sergey Kabashnyuk <ks...@gmail.com>.
Thanks, Ian.

Unfortunately, I have to index any possible java.math.BigDecimal value.
I can rephrase my question this way:

How can I convert java.math.BigDecimal numbers into strings
so that they can be stored in lexicographical order?

Sergey Kabashnyuk
eXo Platform SAS

> Hi
>
>
> Lucene only indexes strings.  The standard advice for numeric values is to
> pad them to the desired width with leading zeros if they are likely to be
> used in range searches.  How varied are the numbers you're going to be
> working with?  I only work with values that have 2 decimal places, and I
> tend to drop the decimal point, e.g.
>
> 2.22 would be indexed as 000222
> 0.99 would be indexed as 000099
>
> And of course go through the same conversion when searching.
>
>
> But if you've got variable numbers of decimal places it might get more
> interesting.
>
>
> --
> Ian.
>
>
> On Thu, Nov 20, 2008 at 2:10 PM, Sergey Kabashnyuk <ks...@gmail.com> wrote:
>> Hello
>>
>> I would like to ask the community for advice:
>> what is the best way to index and search java.math.BigDecimal values in
>> Lucene 2.4?
>>
>> Any code snippets are welcome.
>>
>> Sergey Kabashnyuk
>> eXo Platform SAS
>






Re: BigDecimal values

Posted by Ian Lea <ia...@gmail.com>.
Hi


Lucene only indexes strings.  The standard advice for numeric values is to
pad them to the desired width with leading zeros if they are likely to be
used in range searches.  How varied are the numbers you're going to be
working with?  I only work with values that have 2 decimal places, and I
tend to drop the decimal point, e.g.

2.22 would be indexed as 000222
0.99 would be indexed as 000099

And of course go through the same conversion when searching.
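A minimal sketch of this padding scheme, assuming (as in the examples) non-negative values with at most two decimal places that fit into six digits. The class name and the six-digit width are illustrative choices, not a Lucene API.

```java
import java.math.BigDecimal;
import java.util.Locale;

// Assumptions (illustrative): values are non-negative, have at most two
// decimal places, and are below 10000, so they fit into six padded digits.
final class FixedPointPadding {
    static final int DECIMALS = 2;
    static final int WIDTH = 6;

    static String encode(BigDecimal value) {
        if (value.signum() < 0) {
            throw new IllegalArgumentException("negative values are not supported: " + value);
        }
        // Drop the decimal point: 2.22 -> 222, 5 -> 500.
        // setScale(0) throws ArithmeticException if there were more than two decimals.
        long scaled = value.movePointRight(DECIMALS).setScale(0).longValueExact();
        String padded = String.format(Locale.ROOT, "%0" + WIDTH + "d", scaled);
        if (padded.length() > WIDTH) {
            throw new IllegalArgumentException("value too large for " + WIDTH + " digits: " + value);
        }
        return padded;
    }
}
```

The same encode() call would be applied to both bounds of a range query, so a search between 0.99 and 2.22 becomes a term range from "000099" to "000222".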


But if you've got variable numbers of decimal places it might get more
interesting.


--
Ian.


On Thu, Nov 20, 2008 at 2:10 PM, Sergey Kabashnyuk <ks...@gmail.com> wrote:
> Hello
>
> I would like to ask the community for advice:
> what is the best way to index and search java.math.BigDecimal values in
> Lucene 2.4?
>
> Any code snippets are welcome.
>
> Sergey Kabashnyuk
> eXo Platform SAS
