Posted to user@spark.apache.org by Takeshi Yamamuro <li...@gmail.com> on 2016/10/25 02:54:11 UTC
Re: Get size of intermediate results
-dev +user
Hi,
Have you tried this?
scala> val df = Seq((1, 0), (2, 0), (3, 0), (4, 0)).toDF.cache
scala> // force evaluation so the cached blocks are actually materialized
scala> df.queryExecution.executedPlan(0).execute().foreach(_ => ())
scala> df.rdd.toDebugString
res4: String =
(4) MapPartitionsRDD[13] at rdd at <console>:26 []
| MapPartitionsRDD[12] at rdd at <console>:26 []
| MapPartitionsRDD[11] at rdd at <console>:26 []
 |  LocalTableScan [_1#41, _2#42] MapPartitionsRDD[9] at cache at <console>:23 []
 |      CachedPartitions: 4; MemorySize: 1104.0 B; ExternalBlockStoreSize: 0.0 B; DiskSize: 0.0 B
 |  MapPartitionsRDD[8] at cache at <console>:23 []
 |  ParallelCollectionRDD[7] at cache at <console>:23 []
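If you prefer to read the same numbers programmatically instead of parsing
toDebugString, one option is SparkContext.getRDDStorageInfo (a DeveloperApi),
which reports the persisted RDDs only. A sketch for the spark-shell, assuming
the same `df` as above and the usual `spark` session in scope:

```scala
scala> df.count()  // materialize the cache first
scala> spark.sparkContext.getRDDStorageInfo.foreach { info =>
     |   // each RDDInfo describes one persisted RDD
     |   println(s"${info.name}: ${info.numCachedPartitions} cached partitions, " +
     |     s"${info.memSize} B in memory, ${info.diskSize} B on disk")
     | }
```

The Storage tab of the web UI shows the same per-RDD memory/disk sizes.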
// maropu
On Fri, Oct 21, 2016 at 10:18 AM, Egor Pahomov <pa...@gmail.com>
wrote:
> I needed the same thing for debugging, and I just added a "count" action in
> debug mode for every step I was interested in. It's very time-consuming, but
> I don't debug very often.
>
> 2016-10-20 2:17 GMT-07:00 Andreas Hechenberger <in...@hechenberger.me>:
>
>> Hey awesome Spark-Dev's :)
>>
>> I am new to Spark and I have read a lot, but now I am stuck :( so please
>> be kind if I ask silly questions.
>>
>> I want to analyze some algorithms and strategies in Spark, and for one
>> experiment I want to know the size of the intermediate results between
>> iterations/jobs. Some of them are written to disk and some are in the
>> cache, I guess. I am not afraid of looking into the code (I already did),
>> but it's complex and I have no clue where to start :( It would be nice if
>> someone could point me in the right direction, or tell me where I can find
>> more information about the structure of Spark core development :)
>>
>> I have already set up the development environment and I can compile Spark.
>> It was really awesome how smooth the setup was :) Thanks for that.
>>
>> Servus
>> Andy
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>
>>
>
>
> --
>
>
> *Sincerely yours,*
> *Egor Pakhomov*
>
--
---
Takeshi Yamamuro