Posted to dev@spark.apache.org by lonikar <lo...@gmail.com> on 2015/09/09 21:31:29 UTC

Spark 1.5: How to trigger expression execution through UnsafeRow/TungstenProject

The Tungsten, codegen, etc. options are enabled by default, but I am not
able to get execution to go through UnsafeRow/TungstenProject. It still
executes using InternalRow/Project.

I see this comment in SparkStrategies.scala: "If unsafe mode is enabled and
we support these data types in Unsafe, use the tungsten project. Otherwise
use the normal project."

Can someone give example code that triggers this? I tried some of the
primitive types, but that did not work.



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-1-5-How-to-trigger-expression-execution-through-UnsafeRow-TungstenProject-tp14026.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org

Re: Spark 1.5: How to trigger expression execution through UnsafeRow/TungstenProject

Posted by lonikar <lo...@gmail.com>.

Thanks, that worked.



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-1-5-How-to-trigger-expression-execution-through-UnsafeRow-TungstenProject-tp14026p14053.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.


Re: Spark 1.5: How to trigger expression execution through UnsafeRow/TungstenProject

Posted by Ted Yu <yu...@gmail.com>.

Here is the example from Reynold (http://search-hadoop.com/m/q3RTtfvs1P1YDK8d):

scala> val data = sc.parallelize(1 to size, 5).map(x =>
     |   (util.Random.nextInt(size / repetitions), util.Random.nextDouble)).toDF("key", "value")
data: org.apache.spark.sql.DataFrame = [key: int, value: double]

scala> data.explain
== Physical Plan ==
TungstenProject [_1#0 AS key#2,_2#1 AS value#3]
 Scan PhysicalRDD[_1#0,_2#1]

...
scala> val res = df.groupBy("key").agg(sum("value"))
res: org.apache.spark.sql.DataFrame = [key: int, sum(value): double]

scala> res.explain
15/09/09 14:17:26 INFO MemoryStore: ensureFreeSpace(88456) called with curMem=84037, maxMem=556038881
15/09/09 14:17:26 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 86.4 KB, free 530.1 MB)
15/09/09 14:17:26 INFO MemoryStore: ensureFreeSpace(19788) called with curMem=172493, maxMem=556038881
15/09/09 14:17:26 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 19.3 KB, free 530.1 MB)
15/09/09 14:17:26 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:42098 (size: 19.3 KB, free: 530.2 MB)
15/09/09 14:17:26 INFO SparkContext: Created broadcast 2 from explain at <console>:27
== Physical Plan ==
TungstenAggregate(key=[key#19], functions=[(sum(value#20),mode=Final,isDistinct=false)], output=[key#19,sum(value)#21])
 TungstenExchange hashpartitioning(key#19)
  TungstenAggregate(key=[key#19], functions=[(sum(value#20),mode=Partial,isDistinct=false)], output=[key#19,currentSum#25])
   Scan ParquetRelation[file:/tmp/data][key#19,value#20]

FYI
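
For anyone trying to reproduce the transcript above, here is a rough,
self-contained sketch for a Spark 1.5 spark-shell (where sc and sqlContext
are predefined). The values of size and repetitions are hypothetical, since
the transcript does not show them. The Parquet write/read step is also an
assumption, inferred only from the ParquetRelation[file:/tmp/data] scan in
the second plan; the elided part of the transcript may have done something
different.

```scala
// Sketch for a Spark 1.5 spark-shell; `sc` and `sqlContext` come from the shell.
import sqlContext.implicits._                 // enables .toDF on RDDs of tuples
import org.apache.spark.sql.functions.sum

// Hypothetical values: the original transcript does not show them.
val size = 1000000
val repetitions = 1000

// Build a two-column DataFrame of (int, double) pairs.
val data = sc.parallelize(1 to size, 5).
  map(x => (util.Random.nextInt(size / repetitions), util.Random.nextDouble)).
  toDF("key", "value")

// Print the physical plan and check whether it contains TungstenProject.
data.explain()

// Assumption: the elided step wrote the data to Parquet and read it back as `df`.
data.write.mode("overwrite").parquet("/tmp/data")
val df = sqlContext.read.parquet("/tmp/data")

// Print the aggregation plan and check whether it contains TungstenAggregate.
df.groupBy("key").agg(sum("value")).explain()
```

Whether the plan actually shows the Tungsten operators depends on the Spark
version and on the column types involved, so treat the explain output as the
thing to check rather than a guaranteed result.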
