You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by eatoncys <gi...@git.apache.org> on 2017/11/25 09:05:15 UTC

[GitHub] spark pull request #19819: [SPARK-22606][Streaming]Add threadId to the Cache...

GitHub user eatoncys opened a pull request:

    https://github.com/apache/spark/pull/19819

    [SPARK-22606][Streaming]Add threadId to the CachedKafkaConsumer key

    ## What changes were proposed in this pull request?
    If the value of param 'spark.streaming.concurrentJobs' is more than one, and the value of param 'spark.executor.cores' is more than one, there may be two or more tasks in one executor will use the same kafka consumer at the same time, then it will throw an exception: "KafkaConsumer is not safe for multi-threaded access";
    for example:
    spark.streaming.concurrentJobs=2
    spark.executor.cores=2
    spark.cores.max=2
    if there is only one topic with one partition('topic1',0) to consume, there will be two jobs to run at the same time, and they will use the same cacheKey('groupid','topic1',0) to get the CachedKafkaConsumer from the cache list of' private var cache: ju.LinkedHashMap[CacheKey, CachedKafkaConsumer[_, _]]' , then it will get the same CachedKafkaConsumer.
    
    this PR add threadId  to the CachedKafkaConsumer key to prevent two thread using a consumer at the same time.
    
    
    
    ## How was this patch tested?
    existing ut test


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/eatoncys/spark kafka

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19819.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19819
    
----
commit aa02d8904fcbaa91df47ac224d90345bd555a372
Author: 10129659 <ch...@zte.com.cn>
Date:   2017-11-25T08:15:17Z

    Add threadId to CachedKafkaConsumer key

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19819: [SPARK-22606][Streaming]Add threadId to the CachedKafkaC...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19819
  
    Build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19819: [SPARK-22606][Streaming]Add threadId to the CachedKafkaC...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19819
  
    **[Test build #84183 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84183/testReport)** for PR 19819 at commit [`aa02d89`](https://github.com/apache/spark/commit/aa02d8904fcbaa91df47ac224d90345bd555a372).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19819: [SPARK-22606][Streaming]Add threadId to the CachedKafkaC...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19819
  
    **[Test build #84183 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84183/testReport)** for PR 19819 at commit [`aa02d89`](https://github.com/apache/spark/commit/aa02d8904fcbaa91df47ac224d90345bd555a372).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19819: [SPARK-22606][Streaming]Add threadId to the Cache...

Posted by eatoncys <gi...@git.apache.org>.
Github user eatoncys closed the pull request at:

    https://github.com/apache/spark/pull/19819


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19819: [SPARK-22606][Streaming]Add threadId to the CachedKafkaC...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19819
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19819: [SPARK-22606][Streaming]Add threadId to the CachedKafkaC...

Posted by lvdongr <gi...@git.apache.org>.
Github user lvdongr commented on the issue:

    https://github.com/apache/spark/pull/19819
  
    I've seen your PR:  https://github.com/apache/spark/pull/20997, a good solution @gaborgsomogyi 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19819: [SPARK-22606][Streaming]Add threadId to the CachedKafkaC...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19819
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1054/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19819: [SPARK-22606][Streaming]Add threadId to the CachedKafkaC...

Posted by gaborgsomogyi <gi...@git.apache.org>.
Github user gaborgsomogyi commented on the issue:

    https://github.com/apache/spark/pull/19819
  
    I think this can be closed as the problem solved.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19819: [SPARK-22606][Streaming]Add threadId to the CachedKafkaC...

Posted by gaborgsomogyi <gi...@git.apache.org>.
Github user gaborgsomogyi commented on the issue:

    https://github.com/apache/spark/pull/19819
  
    It will create a new consumer for each thread. This could be quite resource consuming when several topics shared with thread pools.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19819: [SPARK-22606][Streaming]Add threadId to the CachedKafkaC...

Posted by lvdongr <gi...@git.apache.org>.
Github user lvdongr commented on the issue:

    https://github.com/apache/spark/pull/19819
  
    Will the cached consumer to the same partition increase , when different tasks  consume the same partition and no place to remove?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19819: [SPARK-22606][Streaming]Add threadId to the CachedKafkaC...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19819
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84183/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org