Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2015/12/08 11:06:14 UTC

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/10197

    [SPARK-12203][STREAMING] Add KafkaDirectInputDStream

    JIRA: https://issues.apache.org/jira/browse/SPARK-12203
    
    Currently, we have DirectKafkaInputDStream, which pulls messages directly from Kafka brokers without any receivers, and KafkaInputDStream, which pulls messages from a Kafka broker using a receiver with ZooKeeper.
    
    As we observed, because DirectKafkaInputDStream retrieves messages from Kafka after each batch finishes, it incurs extra latency compared with KafkaInputDStream, which keeps pulling messages during each batch window.
    
    So we propose adding KafkaDirectInputDStream, which pulls messages directly from Kafka brokers like DirectKafkaInputDStream but uses receivers like KafkaInputDStream, pulling messages as blocks during each batch window.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 kafka-direct-receiver

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10197.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10197
    
----
commit 43351d134d007769f36b097601d46eb1cba62715
Author: Liang-Chi Hsieh <vi...@appier.com>
Date:   2015-12-08T09:43:57Z

    Add KafkaDirectInputDStream.

commit 3e3e1b821093e4cdf120493d22d2ce0d517f5263
Author: Liang-Chi Hsieh <vi...@appier.com>
Date:   2015-12-08T09:58:23Z

    Remove unnecessary spaces.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-163067069
  
    **[Test build #47376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47376/consoleFull)** for PR 10197 at commit [`a397662`](https://github.com/apache/spark/commit/a39766272356474eed82407f4c84fa31b0734c33).


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-163088878
  
    @jerryshao Yes. We would like to directly pull messages from Kafka like DirectKafkaInputDStream, but also use receivers like KafkaInputDStream.


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-167704274
  
    I would like to close this now. However, the latency is likely to be a problem in real use cases. See this [benchmarking result](http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at).


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-163096778
  
    I really doubt that changing to the low-level API could guarantee exactly-once semantics without any other changes.
    
    CC @koeninger


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164228478
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164173864
  
    **[Test build #47614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47614/consoleFull)** for PR 10197 at commit [`c590f5e`](https://github.com/apache/spark/commit/c590f5e3760b6ff1109da920766691615e8dc514).


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-162840417
  
    **[Test build #47329 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47329/consoleFull)** for PR 10197 at commit [`3e3e1b8`](https://github.com/apache/spark/commit/3e3e1b821093e4cdf120493d22d2ce0d517f5263).


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by viirya <gi...@git.apache.org>.
Github user viirya closed the pull request at:

    https://github.com/apache/spark/pull/10197


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164212137
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47619/
    Test FAILed.


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-163078191
  
    Hi @viirya, I really don't see any difference compared to the receiver-based Kafka stream. The only difference is that you changed the high-level consumer API to the low-level `SimpleConsumer`, am I right?


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164212136
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164204637
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47617/
    Test FAILed.


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by koeninger <gi...@git.apache.org>.
Github user koeninger commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-167808178
  
    That Yahoo benchmark has a lot of issues; they've already been contacted by
    me and others about some obvious errors they made in their Spark job.
    
    On Mon, Dec 28, 2015 at 8:57 PM, Liang-Chi Hsieh <no...@github.com>
    wrote:
    
    > Closed #10197 <https://github.com/apache/spark/pull/10197>.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/10197#event-501830369>.
    >



[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-162840627
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164204167
  
    Forgot to commit new file...


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by koeninger <gi...@git.apache.org>.
Github user koeninger commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-163302006
  
    The reason the direct stream has some latency is that it is figuring out, in advance, on the driver, which offsets are in each partition.  That means that all relevant state is on the driver (so it can be checkpointed easily), and RDDs are immutable once defined (so they can be freely retried in the case of an executor failure).
    
    Even that latency should be minimized if you have a batch size that's tuned down close to what your hardware can support.  I've done sub-second batches.
    
    Maybe I'm confused, but...
    
    - Why is this largely just a cut and paste of existing code?
    - How does this handle multiple receivers?
    - How does this handle errors?  You copied the error handling code from the direct stream, but that assumes it's in a task that will be retried, not a receiver.  Looks like you're just silently catching all exceptions.
    - Why does this still mention checkpoints, when it doesn't seem to have any interaction with checkpoints?
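
The driver-side planning described in this comment can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: it does not use Spark or Kafka, and `OffsetRange`, `plan_batch`, `read_range`, and the in-memory `log` are hypothetical names invented for illustration, not the actual Spark API. It shows why fixing offset ranges before reading makes a retried read return exactly the same messages.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical, immutable description of what one task must read."""
    partition: int
    from_offset: int   # inclusive
    until_offset: int  # exclusive


def plan_batch(consumed: Dict[int, int], latest: Dict[int, int]) -> List[OffsetRange]:
    """Driver side: fix the per-partition ranges before any data is read."""
    return [OffsetRange(p, consumed.get(p, 0), latest[p]) for p in sorted(latest)]


def read_range(log: Dict[int, List[str]], r: OffsetRange) -> List[str]:
    """Executor side: the result depends only on the range, so a retry is safe."""
    return log[r.partition][r.from_offset:r.until_offset]


# In-memory stand-in for a topic with two partitions (assumption, not Kafka).
log = {0: ["a0", "a1", "a2"], 1: ["b0", "b1"]}
consumed = {0: 1, 1: 0}                              # offsets already processed
latest = {p: len(msgs) for p, msgs in log.items()}   # snapshot taken on the driver

batch = plan_batch(consumed, latest)
first = [read_range(log, r) for r in batch]

log[0].append("a3")                                  # arrives after planning
retry = [read_range(log, r) for r in batch]          # simulated task retry

# The only state needed to resume is the end of each planned range.
next_consumed = {r.partition: r.until_offset for r in batch}
```

In this sketch the message that arrives after planning is left for the next batch, which is the source of the latency discussed above; in exchange, the state needed for recovery is just the small `next_consumed` map held by the driver.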


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164215968
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-162934829
  
    **[Test build #47340 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47340/consoleFull)** for PR 10197 at commit [`a397662`](https://github.com/apache/spark/commit/a39766272356474eed82407f4c84fa31b0734c33).


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-163073031
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164228455
  
    **[Test build #47621 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47621/consoleFull)** for PR 10197 at commit [`e5d0a2b`](https://github.com/apache/spark/commit/e5d0a2b47e5930ded9b728bcb994872a579f9823).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `trait KafkaDirect extends Logging`


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164204635
  
    **[Test build #47617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47617/consoleFull)** for PR 10197 at commit [`e69b791`](https://github.com/apache/spark/commit/e69b791a06cc4499620ed0b5df82c8d7f48c7f1d).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `trait KafkaDirect extends Logging`


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-163072940
  
    **[Test build #47376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47376/consoleFull)** for PR 10197 at commit [`a397662`](https://github.com/apache/spark/commit/a39766272356474eed82407f4c84fa31b0734c33).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164215971
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47620/
    Test FAILed.


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-163073033
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47376/
    Test PASSed.


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-163009606
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47340/
    Test FAILed.


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-163009506
  
    **[Test build #47340 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47340/consoleFull)** for PR 10197 at commit [`a397662`](https://github.com/apache/spark/commit/a39766272356474eed82407f4c84fa31b0734c33).
     * This patch **fails from timeout after a configured wait of `250m`**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164215964
  
    **[Test build #47620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47620/consoleFull)** for PR 10197 at commit [`7059ab8`](https://github.com/apache/spark/commit/7059ab86c9eca800473da56c2bc75c1cac7d94d1).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `trait KafkaDirect extends Logging`


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-163009604
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164215857
  
    **[Test build #47620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47620/consoleFull)** for PR 10197 at commit [`7059ab8`](https://github.com/apache/spark/commit/7059ab86c9eca800473da56c2bc75c1cac7d94d1).


[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-162931179
  
    retest this please.




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164174224
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47614/
    Test FAILed.




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-163102160
  
    Hmm, I think its exactly-once semantics should be the same as those of DirectKafkaInputDStream.




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-163064821
  
    retest this please.




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164174221
  
    **[Test build #47614 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47614/consoleFull)** for PR 10197 at commit [`c590f5e`](https://github.com/apache/spark/commit/c590f5e3760b6ff1109da920766691615e8dc514).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `trait KafkaDirect extends Logging`




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164174223
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164212134
  
    **[Test build #47619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47619/consoleFull)** for PR 10197 at commit [`b68003c`](https://github.com/apache/spark/commit/b68003cc8088a1f2d2112c1c8332c545ab0bbc17).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `trait KafkaDirect extends Logging`




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164172702
  
    I refactored it to reuse most of the existing code.




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-162904331
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164204585
  
    **[Test build #47617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47617/consoleFull)** for PR 10197 at commit [`e69b791`](https://github.com/apache/spark/commit/e69b791a06cc4499620ed0b5df82c8d7f48c7f1d).




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by viirya <gi...@git.apache.org>.
GitHub user viirya reopened a pull request:

    https://github.com/apache/spark/pull/10197

    [SPARK-12203][STREAMING] Add KafkaDirectInputDStream

    JIRA: https://issues.apache.org/jira/browse/SPARK-12203
    
    Currently, we have DirectKafkaInputDStream, which pulls messages directly from Kafka brokers without any receivers, and KafkaInputDStream, which pulls messages from a Kafka broker using receivers and ZooKeeper.
    
    As we observed, because DirectKafkaInputDStream retrieves messages from Kafka only after each batch window finishes, it adds latency compared with KafkaInputDStream, which keeps pulling messages during each batch window.
    
    So we propose adding KafkaDirectInputDStream, which pulls messages directly from Kafka brokers as DirectKafkaInputDStream does, but uses receivers as KafkaInputDStream does and pulls messages as blocks during each batch window.


----
commit 43351d134d007769f36b097601d46eb1cba62715
Author: Liang-Chi Hsieh <vi...@appier.com>
Date:   2015-12-08T09:43:57Z

    Add KafkaDirectInputDStream.

commit 3e3e1b821093e4cdf120493d22d2ce0d517f5263
Author: Liang-Chi Hsieh <vi...@appier.com>
Date:   2015-12-08T09:58:23Z

    Remove unnecessary spaces.

commit a39766272356474eed82407f4c84fa31b0734c33
Author: Liang-Chi Hsieh <vi...@appier.com>
Date:   2015-12-08T14:28:57Z

    Fix scala style.

commit c590f5e3760b6ff1109da920766691615e8dc514
Author: Liang-Chi Hsieh <vi...@appier.com>
Date:   2015-12-12T18:00:13Z

    Refactor it to reuse most of codes.

----




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-163089826
  
    So what is the advantage compared to using the high-level API? In terms of pulling messages from Kafka, I think there's no difference between the high-level API and `SimpleConsumer` except for offset-related things.




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164204636
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-162904333
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47338/
    Test FAILed.




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164218581
  
    **[Test build #47621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47621/consoleFull)** for PR 10197 at commit [`e5d0a2b`](https://github.com/apache/spark/commit/e5d0a2b47e5930ded9b728bcb994872a579f9823).




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164210138
  
    **[Test build #47619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47619/consoleFull)** for PR 10197 at commit [`b68003c`](https://github.com/apache/spark/commit/b68003cc8088a1f2d2112c1c8332c545ab0bbc17).




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-162840628
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47329/
    Test FAILed.




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-162840624
  
    **[Test build #47329 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47329/consoleFull)** for PR 10197 at commit [`3e3e1b8`](https://github.com/apache/spark/commit/3e3e1b821093e4cdf120493d22d2ce0d517f5263).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `public class JavaBinarizerExample`
       * `public class JavaBucketizerExample`
       * `public class JavaDCTExample`
       * `public class JavaElementwiseProductExample`
       * `public class JavaMinMaxScalerExample`
       * `public class JavaNGramExample`
       * `public class JavaNormalizerExample`
       * `public class JavaOneHotEncoderExample`
       * `public class JavaPCAExample`
       * `public class JavaPolynomialExpansionExample`
       * `public class JavaRFormulaExample`
       * `public class JavaSQLTransformerExample`
       * `public class JavaStandardScalerExample`
       * `public class JavaStopWordsRemoverExample`
       * `public class JavaStringIndexerExample`
       * `public class JavaTokenizerExample`
       * `public class JavaVectorAssemblerExample`
       * `public class JavaVectorIndexerExample`
       * `public class JavaVectorSlicerExample`
       * `final class DecisionTreeClassifier @Since("1.4.0") (`
       * `final class GBTClassifier @Since("1.4.0") (`
       * `class LogisticRegression @Since("1.2.0") (`
       * `class MultilayerPerceptronClassifier @Since("1.5.0") (`
       * `class NaiveBayes @Since("1.5.0") (`
       * `final class OneVsRest @Since("1.4.0") (`
       * `final class RandomForestClassifier @Since("1.4.0") (`




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-163092418
  
    We need the exactly-once feature of DirectKafkaInputDStream, but we observed that its implementation introduces latency compared with KafkaInputDStream.




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-164228480
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47621/
    Test PASSed.




[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/10197#issuecomment-163313509
  
    As far as I can see from the implementation, the direct stream has some latency because it generates the RDD only after each batch window finishes. So it introduces extra latency compared with the receiver-based KafkaInputDStream, which continuously produces blocks that later form the basis of the RDD. The latency grows as you increase the batch duration, because it takes longer to generate the RDD.
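
A toy model of the latency argument in the paragraph above, written in plain Python rather than against any Spark API. The function name, its parameters, and the assumption that receiver-side fetching overlaps completely with the batch window are all hypothetical simplifications for illustration; it restates the claim and does not measure Spark's actual behavior.

```python
def time_to_result(batch_interval_s, fetch_s, process_s, fetch_during_window):
    """Seconds from the start of a batch window until its result is ready.

    Hypothetical model: if data is fetched while the window is still open
    (receiver style), the fetch cost is assumed to be fully hidden by the
    window. Otherwise (fetch after the window closes) it is paid on top.
    """
    if fetch_during_window:
        return batch_interval_s + process_s
    return batch_interval_s + fetch_s + process_s


# Example: a 10 s window, 2 s to fetch the window's data, 3 s to process it.
receiver_style = time_to_result(10, 2, 3, fetch_during_window=True)
fetch_after_window = time_to_result(10, 2, 3, fetch_during_window=False)
extra_latency = fetch_after_window - receiver_style
```

Under these assumptions the extra latency equals the fetch time, which would grow with the amount of data accumulated per batch.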
    
    The code is mostly the same as the receiver-based input DStream and DirectKafkaInputDStream. Maybe we can refactor it later to share most of the code.
    
    With multiple receivers, different receivers should handle different topic-partitions.
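
One way to keep receivers on disjoint topic-partitions is a deterministic round-robin assignment. The sketch below is plain Python and is not taken from the patch; `assign_partitions` and the `(topic, partition)` tuple representation are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Tuple

TopicPartition = Tuple[str, int]  # (topic name, partition id)


def assign_partitions(
    partitions: List[TopicPartition], num_receivers: int
) -> Dict[int, List[TopicPartition]]:
    """Assign each topic-partition to exactly one receiver, round-robin."""
    if num_receivers <= 0:
        raise ValueError("num_receivers must be positive")
    assignment: Dict[int, List[TopicPartition]] = {
        r: [] for r in range(num_receivers)
    }
    # Sort first so every caller computes the same assignment.
    for i, tp in enumerate(sorted(partitions)):
        assignment[i % num_receivers].append(tp)
    return assignment


example = assign_partitions([("events", 2), ("events", 0), ("events", 1)], 2)
```

Because every partition lands in exactly one list, no two receivers read the same topic-partition in this scheme.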
    
    You are right. It should not silently catch these exceptions. This should be fixed. The checkpoint description should be removed too.
    
    @koeninger Thanks for the comments. I am not sure whether this is useful to others, but in our case we need a Kafka input DStream that has the exactly-once feature and does not introduce latency as DirectKafkaInputDStream does. In our tests, it does help reduce the extra latency.
    
    As this is a very early attempt, I will close it now and see if we can reopen it later.

