You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by tdas <gi...@git.apache.org> on 2014/04/25 22:04:38 UTC
[GitHub] spark pull request: [SPARK-1022
GitHub user tdas opened a pull request:
https://github.com/apache/spark/pull/557
[SPARK-1022
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tdas/spark kafka-unit-test
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/557.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #557
----
commit 13acf26273ff482db76c27b353c733f9a8ce8a4a
Author: jerryshao <sa...@intel.com>
Date: 2014-01-24T10:51:21Z
Add real Kafka streaming test
Conflicts:
external/kafka/src/test/java/org/apache/spark/streaming/kafka/JavaKafkaStreamSuite.java
external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala
project/SparkBuild.scala
commit 21769da9984a5fe2591ad68081c2b497b9facd40
Author: jerryshao <sa...@intel.com>
Date: 2014-02-10T02:09:43Z
Minor style changes, and tests ignored due to flakiness
Conflicts:
project/SparkBuild.scala
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/557#issuecomment-49718360
Hi TD, I've fixed this flaky issue in my local test, how should I submit my patch, should I push to your branch or create a new PR?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/557#issuecomment-48683925
Yes, I did revisit this patch again long ago, but as you mentioned it is so flaky. I guess the main reason is that our Kafka API call is in high level which involves Zookeeper's behavior (unknown latency or others), so our unit test cannot precisely reproduce the result each time. I think if we shift to some low level API, we can have more power to control the behavior, this problem can be easily fixed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/557#issuecomment-41434156
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/557#issuecomment-50983733
Yeah, I will submit to new PR about this ASAP :).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/557
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:
https://github.com/apache/spark/pull/557#issuecomment-50704077
Yeah, will merge #1508 as soon as the tests pass.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:
https://github.com/apache/spark/pull/557#issuecomment-50874683
@jerryshao Since #1508 has been merged, can you update this and test it to make sure this works?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/557#issuecomment-41636650
Hi TD, the exception (method not found in ZkClient) is introduced by zkclient-0.1.jar. Actually Kafka 0.8 relies on zkclient-0.3.jar which will be automatically imported by Kafka. So by removing the dependency line in build profile: `"com.github.sgroschupf" % "zkclient" % "0.1" excludeAll(excludeNetty),` can remove this exception.
As tests seems to fail unpredictably, it also confuses me a lot, I'm not sure it's the problem of producer do not produce any data, or the problem of receiver do not receive any data. I will trace it to see the details if I can.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:
https://github.com/apache/spark/pull/557#issuecomment-50939365
Jenkins, test this please.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/557#issuecomment-41434147
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:
https://github.com/apache/spark/pull/557#issuecomment-48667093
@jerryshao I know this is old stuff but did you get a chance to take a look into this? If not, it would be great if you can while working on other kafka + streaming stuff. :)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:
https://github.com/apache/spark/pull/557#issuecomment-41434251
@jerryshao It would be great if you can take a look at this sometime. The tests seems to fail 1 in 3 times as the receiver does not receive any data. Also there are a number of warning and errors in the log when the test are run, some of which are particularly disturbing (method not found in ZkClient!)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/557#issuecomment-50703783
I will create another PR, thanks a lot. I think it would be better to merge (https://github.com/apache/spark/pull/1508) before, so we can also test the Java API correctly. Thanks a lot.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/557#issuecomment-41438790
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14491/
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/557#issuecomment-48694168
I'm not super familiar with Kafka's internal, so this unit test is mostly taken Kafka's unit test as references, there might be some miss use without diving into the details deeply. I'll try to fix this, really depends on my knowledge of Kafka :)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/557#issuecomment-50703510
I think the main reason is that some times when we feed our test data to Kafka, the KafkaReceiver is still starting, after the receiver started, it will ignore the previous sent data and read from the end of the queue (because of the behavior Kafka 0.8's auto.offset.reset=largest), so previous sent test data will never be consumed, and the test will be failed. So the simple way is to sleep for several seconds after `ssc.start()` and then feed the data.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/557#issuecomment-41438785
Merged build finished. All automated tests passed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:
https://github.com/apache/spark/pull/557#issuecomment-50680831
Dang, I missed this update. Go ahead and make a new PR. I can close this PR.
Incidentally, what was the issue?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/557#issuecomment-41489049
Hi TD, thanks for bringing it out, I will checkout your branch to see the details.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1022] Kafka unit test that actually sen...
Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:
https://github.com/apache/spark/pull/557#issuecomment-48690832
Its weird that zookeeper behavior is so unpredictable that this test would
be so flaky. If we make the low-level API we should still keep the
high-level API for backward compatibility for a while, so it would be good
if a fix for this can be figured out.
TD
On Thu, Jul 10, 2014 at 6:07 PM, Saisai Shao <no...@github.com>
wrote:
> Yes, I did revisit this patch again long ago, but as you mentioned it is
> so flaky. I guess the main reason is that our Kafka API call is in high
> level which involves Zookeeper's behavior (unknown latency or others), so
> our unit test cannot precisely reproduce the result each time. I think if
> we shift to some low level API, we can have more power to control the
> behavior, this problem can be easily fixed.
>
> —
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/spark/pull/557#issuecomment-48683925>.
>
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---