You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/08/12 22:00:22 UTC

[jira] [Commented] (BEAM-549) SparkRunner should support Beam's KafkaIO instead of providing it's own.

    [ https://issues.apache.org/jira/browse/BEAM-549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419570#comment-15419570 ] 

ASF GitHub Bot commented on BEAM-549:
-------------------------------------

GitHub user amitsela opened a pull request:

    https://github.com/apache/incubator-beam/pull/822

    [BEAM-549] SparkRunner should support Beam's KafkaIO instead of providing it's own.

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:
    
     - [ ] Make sure the PR title is formatted like:
       `[BEAM-<Jira issue #>] Description of pull request`
     - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
           Travis-CI on your fork and ensure the whole test matrix passes).
     - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
           number, if there is one.
     - [ ] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).
    
    ---


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/amitsela/incubator-beam BEAM-549

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/822.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #822
    
----
commit 960b4e9847df3d3fd5b9ab99bcfbeae6b2c24f01
Author: Sela <an...@paypal.com>
Date:   2016-08-12T21:40:09Z

    Spark Kafka io support via an adapting PTransform.

commit be567246ff88f9049d1aa206b7d7b556a3a8d565
Author: Sela <an...@paypal.com>
Date:   2016-08-12T21:41:34Z

    Expose relevant fields as public, they are final anyway.

commit 9414d760fcb1953f55eb99dfce78da56c464c15f
Author: Sela <an...@paypal.com>
Date:   2016-08-12T21:42:26Z

    Remove SparkRunner's implementation of KafkaIO.

commit 0f822ac80dfdca83cc9204e8784561d5f066b43e
Author: Sela <an...@paypal.com>
Date:   2016-08-12T21:43:07Z

    Dependency on Beam's KafkaIO.

commit 89f52131b044e6e99c96f9bb5741050233422ebc
Author: Sela <an...@paypal.com>
Date:   2016-08-12T21:44:23Z

    Translate reading of Kafka using Beam's KafkaIO, let the runner override with it's own PTransform.

commit c1965659d5b477fe33c2ab396e9067ce3c2967a7
Author: Sela <an...@paypal.com>
Date:   2016-08-12T21:45:52Z

    Adapt tests.

----


> SparkRunner should support Beam's KafkaIO instead of providing it's own.
> ------------------------------------------------------------------------
>
>                 Key: BEAM-549
>                 URL: https://issues.apache.org/jira/browse/BEAM-549
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Amit Sela
>            Assignee: Amit Sela
>
> For portability, and in the spirit of Apache Beam, the SparkRunner should use the Beam implementation of KafkaIO instead of it's own.
> Having said that, the runner will translate the KafkaIO as defined in the pipeline into it's own internal implementation, but should still map the properties the user defined in the pipeline in a way that the IO behaves the same - i.e., brokers, topic, etc.
> Eventually, the SparkRunner will implement reading from Kafka using Spark's KafakUtils.createDirectStream() as described here: http://spark.apache.org/docs/1.6.2/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)