Posted to commits@beam.apache.org by "Aviem Zur (JIRA)" <ji...@apache.org> on 2016/12/13 15:36:58 UTC

[jira] [Comment Edited] (BEAM-1146) Decrease spark runner startup overhead

    [ https://issues.apache.org/jira/browse/BEAM-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745427#comment-15745427 ] 

Aviem Zur edited comment on BEAM-1146 at 12/13/16 3:36 PM:
-----------------------------------------------------------

Possible solutions:
# Limiting the reflections search criteria to a few specific packages cuts the scan time down to 1 second. However, custom user {{Coder}} and {{Source}} implementations may not fall within these packages, and pipelines using them could encounter Kryo serialization errors.
# Adding an annotation to all {{Coder}} and {{Source}} implementations, so we can register them specifically. Similar solutions to similar problems exist in Beam code today and use ServiceLoader and annotations (for example, to find implementations of {{IOChannelFactoryRegistrar}}). However, users will have to know to add these annotations to their custom {{Coder}} and {{Source}} implementations as well.
# Some combination of the previous 2 solutions.
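
To make the trade-off between these options concrete, here is a minimal sketch of the combined approach: a class is selected for registration if it lives in one of a few known packages, or if it carries an opt-in annotation. Everything in it is an assumption for illustration only. {{RegisterWithKryo}}, {{RegistrationFilter}}, {{AnnotatedUserCoder}} and {{PlainUserCoder}} are hypothetical names, not Beam or Kryo APIs, and the sketch takes its candidate classes as input rather than discovering them (discovery would still need a scoped scan or ServiceLoader).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical opt-in marker for user classes outside the scanned packages. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface RegisterWithKryo {}

/** Hypothetical helper: decides which candidate classes should be registered. */
final class RegistrationFilter {
    private final List<String> packagePrefixes;

    RegistrationFilter(List<String> packagePrefixes) {
        this.packagePrefixes = new ArrayList<>(packagePrefixes);
    }

    /** True if the class is in a known package or carries the opt-in annotation. */
    boolean shouldRegister(Class<?> candidate) {
        if (candidate.isAnnotationPresent(RegisterWithKryo.class)) {
            return true;
        }
        String name = candidate.getName();
        for (String prefix : packagePrefixes) {
            // Append "." so that "a.b" does not accidentally match "a.bc.Foo".
            if (name.startsWith(prefix + ".")) {
                return true;
            }
        }
        return false;
    }

    /** Returns the subset of candidates a real registrator would hand to Kryo. */
    List<Class<?>> select(List<Class<?>> candidates) {
        List<Class<?>> selected = new ArrayList<>();
        for (Class<?> candidate : candidates) {
            if (shouldRegister(candidate)) {
                selected.add(candidate);
            }
        }
        return selected;
    }
}

/** Hypothetical user class that opted in via the annotation. */
@RegisterWithKryo
final class AnnotatedUserCoder {}

/** Hypothetical user class outside the known packages and without the annotation. */
final class PlainUserCoder {}
```

With a filter built for a single known package, {{AnnotatedUserCoder}} is selected and {{PlainUserCoder}} is not. The second case is the one that could hit Kryo serialization errors under option 1, and the one where users would have to know about the annotation under option 2.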


was (Author: aviemzur):
Possible solutions:
# Limiting reflections search criteria to a few specific packages cuts the time down to 1 second. However, custom user Coders and Sources may not fall within these packages, and could encounter Kryo serialization errors.
# Adding an annotation to all Coders and sources, so we can register them specifically, similar solutions to similar problems exist in beam code today and utilize ServiceLoader and annotations (For example to find implementations of {{IOChannelFactoryRegistrar}}).
# Some combination of the previous 2 solutions.

> Decrease spark runner startup overhead
> --------------------------------------
>
>                 Key: BEAM-1146
>                 URL: https://issues.apache.org/jira/browse/BEAM-1146
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-spark
>            Reporter: Aviem Zur
>            Assignee: Amit Sela
>
> BEAM-921 introduced a lazy singleton, instantiated once on each machine (driver & executors), which uses reflection to find all subclasses of Source and Coder.
> While this is beneficial in its own right, the change added about one minute of overhead to Spark runner startup time (which causes the first job/stage to take up to a minute).
> The change is in class {{BeamSparkRunnerRegistrator}}.
> The reason reflection (specifically the Reflections library) was used here is that there is currently no way of knowing all the source and coder classes at runtime.
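
The description above explains why the cost lands on the first job/stage: a lazy singleton defers its expensive work to first access, then reuses the result for the lifetime of the JVM. The following is a rough sketch of that pattern using the initialization-on-demand holder idiom. It is not the actual {{BeamSparkRunnerRegistrator}} code: {{LazyClassIndex}} is a hypothetical name, and the {{scan()}} method stands in for the real classpath scan with a fixed list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a per-JVM index of Source and Coder classes. */
final class LazyClassIndex {
    /** Counts how often the expensive scan actually ran (for illustration). */
    static final AtomicInteger SCAN_COUNT = new AtomicInteger();

    private LazyClassIndex() {}

    /** The JVM initializes Holder only on first access, and only once. */
    private static final class Holder {
        static final List<String> CLASSES = scan();
    }

    /** Placeholder for the costly reflection scan; returns fixed example names. */
    private static List<String> scan() {
        SCAN_COUNT.incrementAndGet();
        return Collections.unmodifiableList(Arrays.asList("ExampleCoder", "ExampleSource"));
    }

    /** The first call pays for the scan; later calls return the cached result. */
    static List<String> classes() {
        return Holder.CLASSES;
    }
}
```

Because each driver and executor is a separate JVM, each would pay the scan cost once on its own first access, which is consistent with the overhead showing up in the first job/stage rather than at process launch.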



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)