Posted to reviews@spark.apache.org by uncleGen <gi...@git.apache.org> on 2017/01/18 03:19:08 UTC

[GitHub] spark pull request #16629: [SPARK-19185][DStream] Add more clear hint for 'C...

GitHub user uncleGen opened a pull request:

    https://github.com/apache/spark/pull/16629

    [SPARK-19185][DStream] Add more clear hint for 'ConcurrentModificationExceptions'

    ## What changes were proposed in this pull request?
    
    When the same Kafka partition is consumed from multiple threads, the task fails with a `ConcurrentModificationException`, because `KafkaConsumer` is not safe for multi-threaded access. This change gives users a clearer hint when they hit this problem. It also adds a new config, `spark.streaming.kafka.consumer.cache.enabled`, that lets users choose whether to use the consumer cache.
    
    ## How was this patch tested?
    
    Existing unit tests.
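
As a rough illustration of the failure mode described in this pull request, the Java sketch below mimics a consumer that tolerates only one thread at a time and throws `ConcurrentModificationException` when a second thread enters while the first still holds it. `GuardedConsumer` and `GuardDemo` are hypothetical names used only for illustration. This is a simplified stand-in under those assumptions, not the Kafka client's code and not the code in this patch.

```java
import java.util.ConcurrentModificationException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a consumer that only one thread may use at a time.
class GuardedConsumer {
    private static final long NO_OWNER = -1L;
    private final AtomicLong owner = new AtomicLong(NO_OWNER);
    private final AtomicInteger depth = new AtomicInteger(0);

    // Claim the consumer for the calling thread, or fail if another thread owns it.
    void acquire() {
        long me = Thread.currentThread().getId();
        if (me != owner.get() && !owner.compareAndSet(NO_OWNER, me)) {
            throw new ConcurrentModificationException(
                "consumer is not safe for multi-threaded access");
        }
        depth.incrementAndGet();
    }

    // Give up ownership once the outermost acquire() has been released.
    void release() {
        if (depth.decrementAndGet() == 0) {
            owner.set(NO_OWNER);
        }
    }
}

class GuardDemo {
    // Holds the consumer on the calling thread, lets a second thread try to
    // use it, and returns whatever exception that second thread saw (or null).
    static Throwable secondThreadFailure(GuardedConsumer consumer) {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        consumer.acquire();
        try {
            Thread other = new Thread(() -> {
                try {
                    consumer.acquire();
                    consumer.release();
                } catch (ConcurrentModificationException e) {
                    seen.set(e);
                }
            });
            other.start();
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            consumer.release();
        }
        return seen.get();
    }
}
```

In this sketch the guard only detects concurrent use; it does not make sharing safe, which is why a clearer error message helps but does not remove the underlying restriction.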


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark SPARK-19185

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16629.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16629
    
----
commit 384710db67fa2440f50ece4df64a8c1b996f7167
Author: uncleGen <hu...@gmail.com>
Date:   2017-01-18T03:08:49Z

    Add more clear hint for 'ConcurrentModificationExceptions'

commit b8b44ef0c62267425fb2b8ed3ab16f10d303f7ca
Author: uncleGen <hu...@gmail.com>
Date:   2017-01-18T03:18:34Z

    update

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16629: [SPARK-19185][DStream] Add more clear hint for 'Concurre...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16629
  
    **[Test build #71562 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71562/testReport)** for PR 16629 at commit [`b8b44ef`](https://github.com/apache/spark/commit/b8b44ef0c62267425fb2b8ed3ab16f10d303f7ca).


[GitHub] spark issue #16629: [SPARK-19185][DStream] Add more clear hint for 'Concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16629
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16629: [SPARK-19185][DStream] Add more clear hint for 'Concurre...

Posted by uncleGen <gi...@git.apache.org>.
Github user uncleGen commented on the issue:

    https://github.com/apache/spark/pull/16629
  
    @srowen Yes, this PR does not make `ConsumerCache` usable from multiple threads; it only gives users a clearer hint about the issue. I think a better solution will be more complex to achieve.


[GitHub] spark pull request #16629: [SPARK-19185][DStream] Add more clear hint for 'C...

Posted by uncleGen <gi...@git.apache.org>.
Github user uncleGen closed the pull request at:

    https://github.com/apache/spark/pull/16629


[GitHub] spark issue #16629: [SPARK-19185][DStream] Add more clear hint for 'Concurre...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/16629
  
    I don't think this resolves the problem?


[GitHub] spark issue #16629: [SPARK-19185][DStream] Add more clear hint for 'Concurre...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16629
  
    **[Test build #71562 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71562/testReport)** for PR 16629 at commit [`b8b44ef`](https://github.com/apache/spark/commit/b8b44ef0c62267425fb2b8ed3ab16f10d303f7ca).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16629: [SPARK-19185][DStream] Add more clear hint for 'Concurre...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16629
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71562/
    Test PASSed.


[GitHub] spark issue #16629: [SPARK-19185][DStream] Add more clear hint for 'Concurre...

Posted by koeninger <gi...@git.apache.org>.
Github user koeninger commented on the issue:

    https://github.com/apache/spark/pull/16629
  
    I don't think it's a problem to make disabling the cache configurable, as long as it's on by default. I don't think the additional static constructors in kafka utils are necessary, are they?
    
    I'm not sure it's a good idea to just blindly recommend that people turn it off if they get that exception, though. It's not that simple.
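
To make the "configurable, but on by default" behaviour concrete, here is a small Java sketch. It reads the flag named in the pull request from a plain `Map` (a stand-in for Spark's configuration) and treats a missing value as `true`. `ConsumerLookup` and `FakeConsumer` are hypothetical names for illustration, and the caching shown is an assumption about how such a switch could behave, not the code in this patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup that either reuses one consumer per partition or
// creates a fresh one on every call, depending on a boolean config flag.
class ConsumerLookup {
    // Config key taken from the pull request description.
    static final String CACHE_KEY = "spark.streaming.kafka.consumer.cache.enabled";

    // Hypothetical stand-in for a Kafka consumer bound to one topic partition.
    static final class FakeConsumer {
        final String partition;
        FakeConsumer(String partition) { this.partition = partition; }
    }

    private final boolean cacheEnabled;
    private final Map<String, FakeConsumer> cache = new HashMap<>();
    private int created = 0;

    ConsumerLookup(Map<String, String> conf) {
        // A missing key means the cache stays on, i.e. "on by default".
        this.cacheEnabled = Boolean.parseBoolean(conf.getOrDefault(CACHE_KEY, "true"));
    }

    FakeConsumer get(String partition) {
        if (!cacheEnabled) {
            return create(partition);              // new consumer on every call
        }
        return cache.computeIfAbsent(partition, this::create);
    }

    private FakeConsumer create(String partition) {
        created++;
        return new FakeConsumer(partition);
    }

    int createdCount() { return created; }
}
```

With the flag unset, repeated calls for the same partition return the same object. With it set to `false`, every call pays the cost of building a new consumer, which is one reason switching the cache off is not a free fix.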


[GitHub] spark issue #16629: [SPARK-19185][DStream] Add more clear hint for 'Concurre...

Posted by uncleGen <gi...@git.apache.org>.
Github user uncleGen commented on the issue:

    https://github.com/apache/spark/pull/16629
  
    cc @zsxwing and @koeninger 


[GitHub] spark issue #16629: [SPARK-19185][DStream] Add more clear hint for 'Concurre...

Posted by stp008 <gi...@git.apache.org>.
Github user stp008 commented on the issue:

    https://github.com/apache/spark/pull/16629
  
    Actually, if you turn it off, a new consumer will be created for each batch. To me, one solution would be to introduce a pool of consumers sized to, for example, the total number of concurrent jobs.
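
The pooling idea in this comment could look roughly like the Java sketch below: a fixed number of consumers is created up front, and each caller borrows one exclusively and returns it afterwards, so no consumer is touched by two threads at once. `ConsumerPool` and `withConsumer` are hypothetical names, the generic type `C` stands in for a real consumer, and sizing the pool to the number of concurrent jobs is the commenter's suggestion rather than anything Spark is known to do.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool: each consumer is lent to one caller at a time.
class ConsumerPool<C> {
    private final BlockingQueue<C> idle;

    // size is assumed to be at least 1, e.g. the number of concurrent jobs.
    ConsumerPool(int size, Supplier<C> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Borrow a consumer (blocking until one is free), run the work, return it.
    <R> R withConsumer(Function<C, R> work) {
        C consumer;
        try {
            consumer = idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a consumer", e);
        }
        try {
            return work.apply(consumer);
        } finally {
            idle.add(consumer);   // always hand the consumer back
        }
    }

    int idleCount() { return idle.size(); }
}
```

A pool like this trades a bounded number of extra consumers for avoiding both failure modes discussed in the thread: concurrent access to one cached consumer, and creating a new consumer for every batch.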

