Posted to user@spark.apache.org by Anirudh Perugu <an...@stonybrook.edu> on 2016/11/13 01:56:02 UTC

toDebugString is clipped

Hello all,

I am trying to understand how GraphX works internally.

I created a small program in GraphX:
1. I create a new graph:
val graph: Graph[(String, Double), Int] = Graph(vertexRDD, edgeRDD)
2. Now I want to see how my vertices were created, so I use:
scala> graph.vertices.toDebugString
res11: String =
(48) VertexRDDImpl[11] at RDD at VertexRDD.scala:57 []
 |   VertexRDD, VertexRDD ZippedPartitionsRDD2[9] at zipPartitions at VertexRDD.scala:322 []
 |       CachedPartitions: 48; MemorySize: 328.0 KB; ExternalBlockStoreSize: 0.0 B; DiskSize: 0.0 B
 |   ShuffledRDD[5] at partitionBy at VertexRDD.scala:319 []
 +-(48) ParallelCollectionRDD[0] at parallelize at <console>:45 []
 |   MapPartitionsRDD[8] at mapPartitions at VertexRDD.scala:361 []
 |   ShuffledRDD[7] at partitionBy at VertexRDD.scala:361 []
 +-(48) VertexRDD.createRoutingTables - vid2pid (aggregation) MapPartitionsRDD[6] at mapPartitions at VertexRDD.scala:356 []
    |   EdgeRDD, EdgeRDD MapPartitionsRDD[2] at mapPartitionsWithIndex at EdgeRDD.scala:105 []
    |   ParallelCollectionRDD[1] at parallelize at <cons...
scala>
But this doesn't give me the whole picture: as you can see, the output is
clipped (I guess 10 lines is the default).
(a) Is there an option to increase this limit so that I can see the whole
output?
(b) I know that indentation indicates a shuffle boundary and the parentheses
indicate the parallelism at each step of this physical plan. Does this mean
the above can be drawn as a picture like this:
RDD A (VertexRDD.cre..) [48 partitions]
                                          \
                                             --- RDD C (VertexRDD, VertexRDD Zipped...) [48 partitions]
                                          /
RDD B (ParallelCollecti..) [48 partitions]

I am fairly new to Spark, so please feel free to correct me!

Thanks
Anirudh

Re: toDebugString is clipped

Posted by Sean Owen <so...@cloudera.com>.

I believe it's the shell (Scala shell) that's cropping the output. See
http://blog.ssanj.net/posts/2016-10-16-output-in-scala-repl-is-truncated.html
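
A minimal sketch of one way around this, assuming you are in spark-shell
(where `sc` is predefined): pass the string to println instead of letting the
REPL echo it as a result. The cropping applies only to how the REPL displays
result values, so printed output should come through in full. The sample
vertex and edge data below are hypothetical stand-ins, since the original
vertexRDD and edgeRDD were not shown.

```scala
// Run inside spark-shell, where the SparkContext `sc` is already defined.
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.rdd.RDD

// Hypothetical sample data (assumption: the original RDDs were not posted).
val vertexRDD: RDD[(Long, (String, Double))] =
  sc.parallelize(Seq((1L, ("a", 1.0)), (2L, ("b", 2.0))))
val edgeRDD: RDD[Edge[Int]] =
  sc.parallelize(Seq(Edge(1L, 2L, 1)))

val graph: Graph[(String, Double), Int] = Graph(vertexRDD, edgeRDD)

// println writes the String straight to stdout, so the REPL's
// result-display truncation is not involved.
println(graph.vertices.toDebugString)
```

The lineage printed for this toy graph will differ in partition counts and
RDD ids from the output in the original post, but the same approach applies
to any RDD's toDebugString.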
