Posted to user@spark.apache.org by Efe Selcuk <ef...@gmail.com> on 2016/10/25 02:03:41 UTC

[Spark 2] BigDecimal and 0

I’m trying to track down what seems to be a very slight imprecision in
our Spark application: two of our columns, which should net out to
exactly zero, are instead coming up with very small non-zero values.
The only thing I’ve found out of place is that a case class field we
populate with BigDecimal("0") comes back as 0E-18 after it goes through
a Dataset, and I don’t know if there’s any appreciable difference
between that and a plain BigDecimal zero. Here’s a contrived example:

scala> case class Data(num: BigDecimal)
defined class Data

scala> val x = Data(0)
x: Data = Data(0)

scala> x.num
res9: BigDecimal = 0

scala> val y = Seq(x, x.copy()).toDS.reduce( (a,b) => a.copy(a.num + b.num))
y: Data = Data(0E-18)

scala> y.num
res12: BigDecimal = 0E-18

scala> BigDecimal("1") - 1
res15: scala.math.BigDecimal = 0

Am I looking at anything valuable?

Efe

Re: [Spark 2] BigDecimal and 0

Posted by Efe Selcuk <ef...@gmail.com>.
I should have noted that I understand the 0E-18 notation (exponential
form, I think) and that in the normal case it is no different from 0; I
just wanted to make sure there wasn't something tricky going on, since
the representation was seemingly changing.

Michael, that's a fair point. I keep operating under the assumption
that BigDecimal guarantees exact results, but I realize there is
probably some math happening that produces results that can't be
represented exactly.
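
For example, rounding at a fixed scale can leave exactly this kind of
residue. Here's a plain-JVM sketch (no Spark involved; it assumes a
scale of 18 like Spark's default, and the rounding mode is just
illustrative):

    // 1/3 rounded at scale 18; summing three of them no longer gives
    // exactly 1, leaving a 1E-18 residue.
    import java.math.{BigDecimal => JBD, RoundingMode}

    val third = new JBD(1).divide(new JBD(3), 18, RoundingMode.HALF_UP)
    val sum   = third.add(third).add(third)
    println(sum)                      // 0.999999999999999999
    println(new JBD(1).subtract(sum)) // 1E-18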

Thanks guys. I'm good now.

Re: [Spark 2] BigDecimal and 0

Posted by Jakob Odersky <ja...@odersky.com>.
Yes, thanks for elaborating, Michael.
The other thing that I wanted to highlight was that in this specific
case the value is actually exactly zero (0E-18 = 0*10^(-18) = 0).
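
A quick way to convince yourself, in plain Scala with no Spark needed:

    val z = BigDecimal("0E-18")
    z == BigDecimal(0) // true: scala.math.BigDecimal equality is value-based
    z.signum           // 0: the unscaled value itself is zero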

Re: [Spark 2] BigDecimal and 0

Posted by Michael Matsko <ms...@gwmail.gwu.edu>.
Efe,

I think Jakob's point is that there is no problem. When you deal with real numbers, you don't get exact representations; there is always some slop in the representation, and things don't ever cancel out exactly. Testing reals for equality to zero will almost never work.

Look at Goldberg's paper https://ece.uwaterloo.ca/~dwharder/NumericalAnalysis/02Numerics/Double/paper.pdf for a quick intro.
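
As a sketch of the standard workaround, compare against a tolerance
instead of testing for exact equality (the epsilon below is
illustrative; pick one suited to the magnitude of your data):

    def approxZero(x: Double, eps: Double = 1e-9): Boolean =
      math.abs(x) < eps

    0.1 + 0.2 - 0.3             // 5.551115123125783E-17, not 0
    approxZero(0.1 + 0.2 - 0.3) // true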

Mike

Re: [Spark 2] BigDecimal and 0

Posted by Efe Selcuk <ef...@gmail.com>.
Okay, so this isn't contributing to any kind of imprecision. I suppose I
need to go digging further then. Thanks for the quick help.

Re: [Spark 2] BigDecimal and 0

Posted by Jakob Odersky <ja...@odersky.com>.
What you're seeing is merely a strange representation; 0E-18 is zero.
The E-18 reflects the scale (18 digits after the decimal point) that
Spark uses to store the decimal.
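
You can see where the scale comes from in the Dataset's schema (a
sketch, assuming a Spark 2.x shell where the session is in scope as
spark; BigDecimal fields default to decimal(38,18)):

    import spark.implicits._
    case class Data(num: BigDecimal)
    Seq(Data(BigDecimal(0))).toDS.printSchema()
    // root
    //  |-- num: decimal(38,18) (nullable = true)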

Re: [Spark 2] BigDecimal and 0

Posted by Jakob Odersky <ja...@odersky.com>.
An even smaller example that demonstrates the same behaviour:

    Seq(Data(BigDecimal(0))).toDS.head
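    // evaluates to Data(0E-18): the zero comes back with scale 18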
