Posted to user@beam.apache.org by 吴亭 <wu...@gmail.com> on 2020/06/22 10:53:21 UTC

Strange graph generated by beam

Hi,

We are using Beam to read a stream from Kafka with Spark as the runner, and
I found that our Spark graph looks very strange.

For each batch, it generates 3 stages. One of them is our actual work,
which I can understand.

The other two look like duplicates; you can take a look at the attached
graphs:
[image: image.png]


[image: image.png]
You will see that the second graph actually includes the first one; I have
no idea why it displays two.

Is this correct?

Br,
Tim

Re: Strange graph generated by beam

Posted by 吴亭 <wu...@gmail.com>.
Hi Luke,

Thanks for your reply.

I don't know whether we can customize the labels for the graphs; they are
auto-generated based on what the Beam Spark runner submits.

Anyway, here is more detailed info that I forgot to describe in my post.

As I said, there are two weird graphs for every DStream batch, and both
stages have exactly the same stack trace:

org.apache.spark.streaming.dstream.DStream.<init>(DStream.scala:117)
org.apache.beam.runners.spark.io.SparkUnboundedSource$ReadReportDStream.<init>(SparkUnboundedSource.java:172)
org.apache.beam.runners.spark.io.SparkUnboundedSource.read(SparkUnboundedSource.java:117)
org.apache.beam.runners.spark.translation.streaming.StreamingTransformTranslator$2.evaluate(StreamingTransformTranslator.java:119)
org.apache.beam.runners.spark.translation.streaming.StreamingTransformTranslator$2.evaluate(StreamingTransformTranslator.java:113)
org.apache.beam.runners.spark.SparkRunner$Evaluator.doVisitTransform(SparkRunner.java:443)
org.apache.beam.runners.spark.SparkRunner$Evaluator.visitPrimitiveTransform(SparkRunner.java:429)
org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:649)
org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:649)
org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:649)
org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:311)
org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:245)
org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:458)
org.apache.beam.runners.spark.translation.streaming.SparkRunnerStreamingContextFactory.call(SparkRunnerStreamingContextFactory.java:89)
org.apache.beam.runners.spark.translation.streaming.SparkRunnerStreamingContextFactory.call(SparkRunnerStreamingContextFactory.java:47)
org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$7.apply(JavaStreamingContext.scala:627)
org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$7.apply(JavaStreamingContext.scala:626)
scala.Option.getOrElse(Option.scala:121)
org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:828)

If you go to the code of SparkUnboundedSource, here are the two places
which generate those two graphs:

(Graph that contains the MapPartitionsRDD)

Line 114:

    JavaDStream<Metadata> metadataDStream =
        mapWithStateDStream.map(new Tuple2MetadataFunction());

(Graph without the MapPartitionsRDD)

Line 104:

    JavaMapWithStateDStream<
            Source<T>, CheckpointMarkT, Tuple2<byte[], Instant>,
            Tuple2<Iterable<byte[]>, Metadata>>
        mapWithStateDStream =
            inputDStream.mapWithState(
                StateSpec.function(
                        StateSpecFunctions.<T, CheckpointMarkT>mapSourceFunction(rc, stepName))
                    .numPartitions(sourceDStream.getNumPartitions()));

I am confused about why it generates two graphs instead of one.
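
For reference, here is a minimal, self-contained Spark Streaming sketch
(plain Spark, not Beam; the socket source, checkpoint path, and class name
are placeholders I made up) that pairs mapWithState with a plain map over
its output, roughly mirroring lines 104 and 114 above. My understanding,
which may be wrong, is that mapWithState registers its own internally
checkpointed MapWithStateDStream, so the UI can end up rendering one DAG
for the stateful stream and a second, larger DAG for the downstream map
that contains it:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.Optional;
    import org.apache.spark.api.java.function.Function3;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.State;
    import org.apache.spark.streaming.StateSpec;
    import org.apache.spark.streaming.api.java.*;
    import scala.Tuple2;

    public class MapWithStateGraphDemo {
      public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("map-with-state-demo");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));
        // mapWithState needs a checkpoint directory for its state RDDs.
        jssc.checkpoint("/tmp/map-with-state-demo");

        JavaDStream<String> lines = jssc.socketTextStream("localhost", 9999);
        JavaPairDStream<String, Integer> ones = lines.mapToPair(w -> new Tuple2<>(w, 1));

        // Stateful mapping function: keep a running count per key.
        Function3<String, Optional<Integer>, State<Integer>, Tuple2<String, Integer>> mappingFunc =
            (word, one, state) -> {
              int sum = one.orElse(0) + (state.exists() ? state.get() : 0);
              state.update(sum);
              return new Tuple2<>(word, sum);
            };

        // Roughly analogous to line 104: the stateful DStream, checkpointed on its own.
        JavaMapWithStateDStream<String, Integer, Integer, Tuple2<String, Integer>> counts =
            ones.mapWithState(StateSpec.function(mappingFunc));

        // Roughly analogous to line 114: a plain map over the stateful output.
        JavaDStream<String> rendered = counts.map(t -> t._1() + "=" + t._2());
        rendered.print();

        jssc.start();
        jssc.awaitTermination();
      }
    }

This is only meant as a way to check, outside of Beam, whether the
overlapping DAGs come from mapWithState itself rather than from the runner.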

Br,
Tim




Luke Cwik <lc...@google.com> wrote on Monday, June 22, 2020 at 5:06 PM:

> It would be helpful if the names of the transforms were visible on the
> graph; otherwise it is really hard to understand what each stage and step
> do.
>
> On Mon, Jun 22, 2020 at 3:53 AM 吴亭 <wu...@gmail.com> wrote:
>
>> Hi,
>>
>> We are using Beam to read a stream from Kafka with Spark as the runner,
>> and I found that our Spark graph looks very strange.
>>
>> For each batch, it generates 3 stages. One of them is our actual work,
>> which I can understand.
>>
>> The other two look like duplicates; you can take a look at the attached
>> graphs:
>> [image: image.png]
>>
>>
>> [image: image.png]
>> You will see that the second graph actually includes the first one; I
>> have no idea why it displays two.
>>
>> Is this correct?
>>
>> Br,
>> Tim
>>
>

Re: Strange graph generated by beam

Posted by Luke Cwik <lc...@google.com>.
It would be helpful if the names of the transforms were visible on the
graph; otherwise it is really hard to understand what each stage and step do.
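
For example, something along these lines (untested, and the broker address,
topic, and class name below are placeholders) gives each step an explicit
name when it is applied, which is what the runner has available to label
things with. I am not sure how much of that naming the Spark UI's DAG view
actually surfaces, though:

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.TypeDescriptors;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class NamedTransformsExample {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        p.apply("ReadFromKafka",                         // explicit step name
                KafkaIO.<String, String>read()
                    .withBootstrapServers("kafka:9092")  // placeholder broker
                    .withTopic("events")                 // placeholder topic
                    .withKeyDeserializer(StringDeserializer.class)
                    .withValueDeserializer(StringDeserializer.class)
                    .withoutMetadata())
         .apply("FormatRecords",                         // explicit step name
                MapElements.into(TypeDescriptors.strings())
                    .via((KV<String, String> kv) -> kv.getKey() + ":" + kv.getValue()));

        p.run().waitUntilFinish();
      }
    }

With names like these, stages should at least be traceable back to specific
transforms wherever the runner propagates them.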

On Mon, Jun 22, 2020 at 3:53 AM 吴亭 <wu...@gmail.com> wrote:

> Hi,
>
> We are using Beam to read a stream from Kafka with Spark as the runner,
> and I found that our Spark graph looks very strange.
>
> For each batch, it generates 3 stages. One of them is our actual work,
> which I can understand.
>
> The other two look like duplicates; you can take a look at the attached
> graphs:
> [image: image.png]
>
>
> [image: image.png]
> You will see that the second graph actually includes the first one; I
> have no idea why it displays two.
>
> Is this correct?
>
> Br,
> Tim
>