You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by tdas <gi...@git.apache.org> on 2014/04/25 22:04:38 UTC

[GitHub] spark pull request: [SPARK-1022

GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/557

    [SPARK-1022

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark kafka-unit-test

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/557.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #557
    
----
commit 13acf26273ff482db76c27b353c733f9a8ce8a4a
Author: jerryshao <sa...@intel.com>
Date:   2014-01-24T10:51:21Z

    Add real Kafka streaming test
    
    Conflicts:
    	external/kafka/src/test/java/org/apache/spark/streaming/kafka/JavaKafkaStreamSuite.java
    	external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala
    	project/SparkBuild.scala

commit 21769da9984a5fe2591ad68081c2b497b9facd40
Author: jerryshao <sa...@intel.com>
Date:   2014-02-10T02:09:43Z

    Minor style changes, and tests ignored due to flakiness
    
    Conflicts:
    	project/SparkBuild.scala

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/557#issuecomment-49718360
  
    Hi TD, I've fixed this flaky issue in my local test, how should I submit my patch, should I push to your branch or create a new PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/557#issuecomment-48683925
  
    Yes, I did revisit this patch again long ago, but as you mentioned it is so flaky. I guess the main reason is that our Kafka API call is in high level which involves Zookeeper's behavior (unknown latency or others), so our unit test cannot precisely reproduce the result each time. I think if we shift to some low level API, we can have more power to control the behavior,  this problem can be easily fixed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/557#issuecomment-41434156
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/557#issuecomment-50983733
  
    Yeah,  I will submit to new PR about this ASAP :).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/557


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/557#issuecomment-50704077
  
    Yeah, will merge #1508 as soon as the tests pass.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/557#issuecomment-50874683
  
    @jerryshao  Since #1508  has been merged, can you update this and test it to make sure this works?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/557#issuecomment-41636650
  
    Hi TD, the exception (method not found in ZkClient) is introduced by zkclient-0.1.jar. Actually Kafka 0.8 relies on zkclient-0.3.jar which will be automatically imported by Kafka. So by removing the dependency line in build profile: `"com.github.sgroschupf"    % "zkclient"   % "0.1"          excludeAll(excludeNetty),` can remove this exception.
    
    As tests seems to fail unpredictably, it also confuses me a lot, I'm not sure it's the problem of producer do not produce any data, or the problem of receiver do not receive any data. I will trace it to see the details if I can.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/557#issuecomment-50939365
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/557#issuecomment-41434147
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/557#issuecomment-48667093
  
    @jerryshao I know this is old stuff but did you get  a chance to take a look into this? If not, it would be great if you can while working on other kafka + streaming stuff. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/557#issuecomment-41434251
  
    @jerryshao It would be great if you can take a look at this sometime. The tests seems to fail 1 in 3 times as the receiver does not receive any data. Also there are a number of warning and errors in the log when the test are run, some of which are particularly disturbing (method not found in ZkClient!)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/557#issuecomment-50703783
  
    I will create another PR, thanks a lot. I think it would be better to merge (https://github.com/apache/spark/pull/1508) before, so we can also test the Java API correctly. Thanks a lot.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/557#issuecomment-41438790
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14491/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/557#issuecomment-48694168
  
    I'm not super familiar with Kafka's internal, so this unit test is mostly taken Kafka's unit test as references, there might be some miss use without diving into the details deeply. I'll try to fix this, really depends on my knowledge of Kafka :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/557#issuecomment-50703510
  
    I think the main reason is that some times when we feed our test data to Kafka, the KafkaReceiver is still starting, after the receiver started, it will ignore the previous sent data and read from the end of the queue (because of the behavior Kafka 0.8's auto.offset.reset=largest), so previous sent test data will never be consumed, and the test will be failed. So the simple way is to sleep for several seconds after `ssc.start()` and then feed the data.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/557#issuecomment-41438785
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/557#issuecomment-50680831
  
    Dang, I missed this update. Go ahead and make a new PR. I can close this PR. 
    Incidentally, what was the issue? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/557#issuecomment-41489049
  
    Hi TD, thanks for bringing it out, I will checkout your branch to see the details.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/557#issuecomment-48690832
  
    Its weird that zookeeper behavior is so unpredictable that this test would
    be so flaky. If we make the low-level API we should still keep the
    high-level API for backward compatibility for a while, so it would be good
    if a fix for this can be figured out.
    
    TD
    
    
    On Thu, Jul 10, 2014 at 6:07 PM, Saisai Shao <no...@github.com>
    wrote:
    
    > Yes, I did revisit this patch again long ago, but as you mentioned it is
    > so flaky. I guess the main reason is that our Kafka API call is in high
    > level which involves Zookeeper's behavior (unknown latency or others), so
    > our unit test cannot precisely reproduce the result each time. I think if
    > we shift to some low level API, we can have more power to control the
    > behavior, this problem can be easily fixed.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/557#issuecomment-48683925>.
    >


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---