Posted to user@spark.apache.org by Takeshi Yamamuro <li...@gmail.com> on 2016/10/25 02:54:11 UTC

Re: Get size of intermediate results

-dev +user

Hi,

Have you tried this?
scala> val df = Seq((1, 0), (2, 0), (3, 0), (4, 0)).toDF.cache  // cache is lazy
scala> df.queryExecution.executedPlan(0).execute().foreach(_ => ())  // force evaluation to populate the cache
scala> df.rdd.toDebugString  // cached sizes appear in the lineage dump

res4: String =
(4) MapPartitionsRDD[13] at rdd at <console>:26 []
 |  MapPartitionsRDD[12] at rdd at <console>:26 []
 |  MapPartitionsRDD[11] at rdd at <console>:26 []
 |  LocalTableScan [_1#41, _2#42]
 |  MapPartitionsRDD[9] at cache at <console>:23 []
 |      CachedPartitions: 4; MemorySize: 1104.0 B; ExternalBlockStoreSize: 0.0 B; DiskSize: 0.0 B
 |  MapPartitionsRDD[8] at cache at <console>:23 []
 |  ParallelCollectionRDD[7] at cache at <console>:23 []
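
If you would rather read those numbers programmatically than parse the debug
string, a rough sketch using the developer API SparkContext.getRDDStorageInfo
(field names as in the RDDInfo class) is:

scala> sc.getRDDStorageInfo.foreach { info =>
     |   println(s"RDD ${info.id} (${info.name}): " +
     |     s"${info.numCachedPartitions}/${info.numPartitions} partitions cached, " +
     |     s"mem=${info.memSize} B, disk=${info.diskSize} B")
     | }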

// maropu

On Fri, Oct 21, 2016 at 10:18 AM, Egor Pahomov <pa...@gmail.com>
wrote:

> I needed the same thing for debugging, and I just added a "count" action in
> debug mode for every step I was interested in. It is very time-consuming, but
> I don't debug very often.
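>
> A minimal sketch of what I mean (the helper and the debug flag are my own
> convention, not anything built into Spark):
>
> def debugCount(label: String, df: org.apache.spark.sql.DataFrame, debug: Boolean): Unit =
>   if (debug) println(s"$label: ${df.count()} rows")  // count() forces full evaluation
>
> I call it after every transformation I care about, e.g.
> debugCount("after join", joined, debug = true).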
>
> 2016-10-20 2:17 GMT-07:00 Andreas Hechenberger <in...@hechenberger.me>:
>
>> Hey awesome Spark devs :)
>>
>> I am new to Spark and I have read a lot, but now I am stuck :( so please be
>> kind if I ask silly questions.
>>
>> I want to analyze some algorithms and strategies in Spark, and for one
>> experiment I want to know the size of the intermediate results between
>> iterations/jobs. Some of them are written to disk and some are in the
>> cache, I guess. I am not afraid of looking into the code (I already did),
>> but it is complex and I have no clue where to start :( It would be nice if
>> someone could point me in the right direction, or to where I can find more
>> information about the structure of Spark core development :)
>>
>> I have already set up the development environment and I can compile Spark.
>> It was really awesome how smoothly the setup went :) Thanks for that.
>>
>> Servus
>> Andy
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>
>>
>
>
> --
>
>
> *Sincerely yours,*
> *Egor Pakhomov*
>



-- 
---
Takeshi Yamamuro