Posted to dev@spark.apache.org by Akhil Das <ak...@sigmoidanalytics.com> on 2015/12/03 07:32:25 UTC

Re: Multiplication on decimals in a dataframe query

Not quite sure what's happening, but I guess it's not an issue with
multiplication, as the following query worked for me:

trades.select(trades("price")*9.5).show
+-------------+
|(price * 9.5)|
+-------------+
|        199.5|
|        228.0|
|        190.0|
|        199.5|
|        190.0|
|        256.5|
|        218.5|
|        275.5|
|        218.5|
......
......


Could it be the precision? CCing the dev list; maybe you can open a
JIRA for this, as it seems to be a bug.
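
To make the precision question concrete, here is a minimal sketch in plain
Scala (no Spark required). Everything in it is an assumption rather than
something confirmed in this thread: the DecType class and the helper names are
hypothetical, the result-type rules are the Hive-style ones (for a * b,
precision p1 + p2 + 1 and scale s1 + s2, capped at 38), and a Decimal
case-class field is taken to map to decimal(38, 18), which is at least
consistent with the 18 fractional digits in the output quoted below.

```scala
// Sketch only: models decimal result types with plain Scala, no Spark needed.
// The rules are assumed (Hive-style decimal arithmetic), not taken from Spark's source.
object DecimalTypeSketch {
  val MaxPrecision = 38

  // Hypothetical stand-in for a SQL decimal type: total digits and fraction digits.
  final case class DecType(precision: Int, scale: Int) {
    // Digits left over for the integer part of a value.
    def integerDigits: Int = precision - scale
  }

  // Cap both precision and scale at the maximum supported precision.
  def bounded(precision: Int, scale: Int): DecType =
    DecType(math.min(precision, MaxPrecision), math.min(scale, MaxPrecision))

  // Assumed rule for a * b: precision p1 + p2 + 1, scale s1 + s2.
  def multiplyType(a: DecType, b: DecType): DecType =
    bounded(a.precision + b.precision + 1, a.scale + b.scale)

  // Assumed rule for a + b: the larger scale, the larger integer part, one carry digit.
  def addType(a: DecType, b: DecType): DecType = {
    val scale = math.max(a.scale, b.scale)
    val intDigits = math.max(a.integerDigits, b.integerDigits)
    bounded(scale + intDigits + 1, scale)
  }

  def main(args: Array[String]): Unit = {
    val col = DecType(38, 18) // assumed type of a Decimal case-class field
    val product = multiplyType(col, col)
    val sum = addType(col, col)
    println(s"product: $product, integer digits: ${product.integerDigits}")
    println(s"sum: $sum, integer digits: ${sum.integerDigits}")
  }
}
```

Under these assumptions the product type is capped at decimal(38, 36), which
leaves two integer digits, while the sum keeps twenty. A product such as
20 * 20 = 400 needs three integer digits, so an overflow that surfaces as null
would be one plausible explanation for multiplication failing while addition
works.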

Thanks
Best Regards

On Mon, Nov 30, 2015 at 12:41 AM, Philip Dodds <ph...@gmail.com>
wrote:

> I hit a weird issue when I tried to multiply two decimals in a select
> (either in Scala or as SQL), and I'm assuming I must be missing the point.
>
> The issue is fairly easy to recreate with something like the following:
>
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.implicits._
> import org.apache.spark.sql.types.Decimal
>
> case class Trade(quantity: Decimal, price: Decimal)
>
> val data = Seq.fill(100) {
>   val price = Decimal(20 + scala.util.Random.nextInt(10))
>   val quantity = Decimal(20 + scala.util.Random.nextInt(10))
>
>   Trade(quantity, price)
> }
>
> val trades = sc.parallelize(data).toDF()
> trades.registerTempTable("trades")
>
> trades.select(trades("price")*trades("quantity")).show
>
> sqlContext.sql("select price/quantity,price*quantity,price+quantity,price-quantity from trades").show
>
> The odd part is that if you run it, you will see that addition, division,
> and subtraction work, but multiplication returns null.
>
> Tested on 1.5.1/1.5.2 (Scala 2.10 and 2.11)
>
> i.e.
>
> +------------------+
> |(price * quantity)|
> +------------------+
> |              null|
> |              null|
> |              null|
> |              null|
> |              null|
> +------------------+
>
> +--------------------+----+--------------------+--------------------+
> |                 _c0| _c1|                 _c2|                 _c3|
> +--------------------+----+--------------------+--------------------+
> |0.952380952380952381|null|41.00000000000000...|-1.00000000000000...|
> |1.380952380952380952|null|50.00000000000000...|8.000000000000000000|
> |1.272727272727272727|null|50.00000000000000...|6.000000000000000000|
> |0.833333333333333333|null|44.00000000000000...|-4.00000000000000...|
> |1.000000000000000000|null|58.00000000000000...|               0E-18|
> +--------------------+----+--------------------+--------------------+
>
> Just keen to know what I did wrong?
>
>
> Cheers
>
> P
>
> --
> Philip Dodds
>
>
>

Re: Multiplication on decimals in a dataframe query

Posted by Philip Dodds <ph...@gmail.com>.
Did a little more digging, and it appears it was just the way I constructed
the Decimal.

It works if you do:

val data = Seq.fill(5) {
  Trade(Decimal(BigDecimal(5), 38, 20), Decimal(BigDecimal(5), 38, 20))
}
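
One possible reading of this workaround, not confirmed anywhere in this thread:
the snippet also changes the values from 20..29 to 5, so the product may
simply be small enough to fit the result type. The sketch below uses only
java.math.BigDecimal and assumes, hypothetically, that multiplying two
decimal(38, 18) columns yields decimal(38, 36). The helper name fits is made
up for illustration.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object FitsCheck {
  // True if `value`, rescaled to `scale` fraction digits, needs at most
  // `precision` digits in total.
  def fits(value: JBigDecimal, precision: Int, scale: Int): Boolean =
    value.setScale(scale, RoundingMode.HALF_UP).precision() <= precision

  def main(args: Array[String]): Unit = {
    // decimal(38, 36) is an assumed result type, not taken from Spark's source.
    println(fits(new JBigDecimal(5 * 5), 38, 36))   // product of the workaround values
    println(fits(new JBigDecimal(20 * 20), 38, 36)) // smallest product in the original repro
  }
}
```

With 36 fractional digits, 25 needs 38 digits in total and 400 needs 39, so
only the first fits in a precision of 38. If that assumption holds, rerunning
the workaround with values of 20 or more would be a quick way to tell whether
the constructor arguments or the magnitude of the values made the difference.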


On Thu, Dec 3, 2015 at 8:58 AM, Philip Dodds <ph...@gmail.com> wrote:

> Opened https://issues.apache.org/jira/browse/SPARK-12128
>
> Thanks
>
> P
>


-- 
Philip Dodds

philip.dodds@gmail.com
@philipdodds

Re: Multiplication on decimals in a dataframe query

Posted by Philip Dodds <ph...@gmail.com>.
Opened https://issues.apache.org/jira/browse/SPARK-12128

Thanks

P

On Thu, Dec 3, 2015 at 8:51 AM, Philip Dodds <ph...@gmail.com> wrote:

> I'll open up a JIRA for it. It appears to work when you use a literal
> number, but not when the other operand comes from the same DataFrame.
>
> Thanks!
>
> P
>


Re: Multiplication on decimals in a dataframe query

Posted by Philip Dodds <ph...@gmail.com>.
I'll open up a JIRA for it. It appears to work when you use a literal
number, but not when the other operand comes from the same DataFrame.

Thanks!

P

On Thu, Dec 3, 2015 at 1:52 AM, Sahil Sareen <sa...@gmail.com> wrote:

> +1 looks like a bug
>
> I think referencing trades() twice in multiplication is broken:
>
> scala> trades.select(trades("quantity")*trades("quantity")).show
>
> +---------------------+
> |(quantity * quantity)|
> +---------------------+
> |                 null|
> |                 null|
>
> scala> sqlContext.sql("select price*price as PP from trades").show
>
> +----+
> |  PP|
> +----+
> |null|
> |null|
>
>
> -Sahil
>


Re: Multiplication on decimals in a dataframe query

Posted by Sahil Sareen <sa...@gmail.com>.
+1 looks like a bug

I think referencing trades() twice in multiplication is broken:

scala> trades.select(trades("quantity")*trades("quantity")).show

+---------------------+
|(quantity * quantity)|
+---------------------+
|                 null|
|                 null|

scala> sqlContext.sql("select price*price as PP from trades").show

+----+
|  PP|
+----+
|null|
|null|


-Sahil

On Thu, Dec 3, 2015 at 12:02 PM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> Not quite sure what's happening, but I guess it's not an issue with
> multiplication, as the following query worked for me:
>
> trades.select(trades("price")*9.5).show
> +-------------+
> |(price * 9.5)|
> +-------------+
> |        199.5|
> |        228.0|
> |        190.0|
> |        199.5|
> |        190.0|
> |        256.5|
> |        218.5|
> |        275.5|
> |        218.5|
> ......
> ......
>
>
> Could it be the precision? CCing the dev list; maybe you can open a
> JIRA for this, as it seems to be a bug.
>
> Thanks
> Best Regards
>