You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by daroo <gi...@git.apache.org> on 2017/11/20 17:05:45 UTC

[GitHub] spark pull request #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsa...

GitHub user daroo opened a pull request:

    https://github.com/apache/spark/pull/19789

    [SPARK-22562][Streaming] CachedKafkaConsumer unsafe eviction from cache

    ## What changes were proposed in this pull request?
    Fixes a problem when one thread wants to add a new consumer into fully packed cache and another one still uses an instance of cached consumer which is marked for eviction. In such cases underlying KafkaConsumer throws ConcurrentModificationException.
    
    My solution is to always remove the eldest consumer from the cache, but sometimes delay calling close() method (in separate thread) until is no longer used (released) by KafkaRDDIterator
    
    ## How was this patch tested?
    Any ideas how to write good unit test to cover this are more than welcome. In the meantime I'll try to run the code on our DEV env for a longer period of time.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/daroo/spark SPARK-22562

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19789.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19789
    
----
commit 9b16ddd723bc3dc324c33bd39a4aa2b065e926b1
Author: Dariusz Szablinski <da...@ig.com>
Date:   2017-11-20T16:47:29Z

    [SPARK-22562][Streaming] CachedKafkaConsumer unsafe eviction from cache

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90551/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    **[Test build #85575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85575/testReport)** for PR 19789 at commit [`7700bf0`](https://github.com/apache/spark/commit/7700bf02c06d6fe3909053a1e146569a67b21de3).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    **[Test build #85575 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85575/testReport)** for PR 19789 at commit [`7700bf0`](https://github.com/apache/spark/commit/7700bf02c06d6fe3909053a1e146569a67b21de3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84143/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsa...

Posted by koeninger <gi...@git.apache.org>.
Github user koeninger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19789#discussion_r152056775
  
    --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala ---
    @@ -155,11 +178,11 @@ object CachedKafkaConsumer extends Logging {
             logInfo(s"Cache miss for $k")
             logDebug(cache.keySet.toString)
             val c = new CachedKafkaConsumer[K, V](groupId, topic, partition, kafkaParams)
    -        cache.put(k, c)
    +        cache.put(k, c.acquireAndGet())
             c
           } else {
             // any given topicpartition should have a consistent key and value type
    -        v.asInstanceOf[CachedKafkaConsumer[K, V]]
    +        v.acquireAndGet().asInstanceOf[CachedKafkaConsumer[K, V]]
    --- End diff --
    
    Shouldn't this method call be after the cast?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsa...

Posted by daroo <gi...@git.apache.org>.
Github user daroo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19789#discussion_r152060617
  
    --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala ---
    @@ -155,11 +178,11 @@ object CachedKafkaConsumer extends Logging {
             logInfo(s"Cache miss for $k")
             logDebug(cache.keySet.toString)
             val c = new CachedKafkaConsumer[K, V](groupId, topic, partition, kafkaParams)
    -        cache.put(k, c)
    +        cache.put(k, c.acquireAndGet())
             c
           } else {
             // any given topicpartition should have a consistent key and value type
    -        v.asInstanceOf[CachedKafkaConsumer[K, V]]
    +        v.acquireAndGet().asInstanceOf[CachedKafkaConsumer[K, V]]
    --- End diff --
    
    I can be, but actually it doesn't matter as the cache value has CachedKafkaConsumer[_, _] type


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by koeninger <gi...@git.apache.org>.
Github user koeninger commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Seems reasonable.
    
    On Wed, Nov 22, 2017 at 1:52 PM, Daroo <no...@github.com> wrote:
    
    > It fails on the current master branch and doesn't after the patch
    >
    > —
    > You are receiving this because you were mentioned.
    > Reply to this email directly, view it on GitHub
    > <https://github.com/apache/spark/pull/19789#issuecomment-346456558>, or mute
    > the thread
    > <https://github.com/notifications/unsubscribe-auth/AAGAB_y8Z7XRTNv08wpbZh3-UN2HEwA2ks5s5HuFgaJpZM4QklrV>
    > .
    >



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by daroo <gi...@git.apache.org>.
Github user daroo commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    @HyukjinKwon I've looked at the latest changes in the code and I think this problem shouldn't happen (at least in practice) anymore. So the JIRA can be closed.
    One minor issue I've noticed is that the cache size can keep growing beyond configured limit in certain cases.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    **[Test build #84143 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84143/testReport)** for PR 19789 at commit [`c2c3ed9`](https://github.com/apache/spark/commit/c2c3ed9e2e672b3b9205a824b8cdd2f9ca2131eb).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by daroo <gi...@git.apache.org>.
Github user daroo commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    It fails on the current master branch and doesn't after the patch


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsa...

Posted by daroo <gi...@git.apache.org>.
Github user daroo commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19789#discussion_r152951885
  
    --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaRDD.scala ---
    @@ -211,8 +211,8 @@ private[spark] class KafkaRDD[K, V](
         var requestOffset = part.fromOffset
     
         def closeIfNeeded(): Unit = {
    -      if (!useConsumerCache && consumer != null) {
    -        consumer.close
    +      if (consumer != null) {
    +          consumer.close()
    --- End diff --
    
    formatting changed


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by daroo <gi...@git.apache.org>.
Github user daroo commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    sure :-)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by koeninger <gi...@git.apache.org>.
Github user koeninger commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    ok to test
    
    On Wed, Nov 22, 2017 at 2:49 PM, Daroo <no...@github.com> wrote:
    
    > Cool. Could you please authorize it for testing?
    >
    > —
    > You are receiving this because you were mentioned.
    > Reply to this email directly, view it on GitHub
    > <https://github.com/apache/spark/pull/19789#issuecomment-346469044>, or mute
    > the thread
    > <https://github.com/notifications/unsubscribe-auth/AAGAB8S6e62SscDmZ6tM3eTVPlHTj6nQks5s5IjkgaJpZM4QklrV>
    > .
    >



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    cc @marmbrus @zsxwing 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    **[Test build #90551 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90551/testReport)** for PR 19789 at commit [`639f11c`](https://github.com/apache/spark/commit/639f11c9fd856f58d067036f9b9131ea9efdc13d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by daroo <gi...@git.apache.org>.
Github user daroo commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Hey @koeninger ,
    could you please have a look at my PR?
    
    Thanx a lot,
    Darek


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by daroo <gi...@git.apache.org>.
Github user daroo commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Hi @HyukjinKwon
    
    It seems that @zsxwing is not really interested in this PR. Should I cancel it and close https://issues.apache.org/jira/browse/SPARK-22562 ?
    
    BTW. Why AppVeyor CI build takes so looooong time? The build has failed due to reaching max allowed time (1,5h) :-(



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by koeninger <gi...@git.apache.org>.
Github user koeninger commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Seems reasonable to me but you should probably ask zsxwing if it fits in
    with plans for the structured streaming kafka code.
    
    On Thu, Nov 23, 2017 at 10:23 PM, Hyukjin Kwon <no...@github.com>
    wrote:
    
    > *@HyukjinKwon* commented on this pull request.
    >
    > Does this look good to you @koeninger <https://github.com/koeninger>?
    > ------------------------------
    >
    > In external/kafka-0-10/src/main/scala/org/apache/spark/
    > streaming/kafka010/KafkaRDD.scala
    > <https://github.com/apache/spark/pull/19789#discussion_r152895550>:
    >
    > > @@ -211,8 +211,8 @@ private[spark] class KafkaRDD[K, V](
    >      var requestOffset = part.fromOffset
    >
    >      def closeIfNeeded(): Unit = {
    > -      if (!useConsumerCache && consumer != null) {
    > -        consumer.close
    > +      if (consumer != null) {
    > +          consumer.close()
    >
    > I think this should be double spaced
    >
    > —
    > You are receiving this because you were mentioned.
    > Reply to this email directly, view it on GitHub
    > <https://github.com/apache/spark/pull/19789#pullrequestreview-78827551>,
    > or mute the thread
    > <https://github.com/notifications/unsubscribe-auth/AAGAB7EEFdiDoEY3ov2wmiDWrnCZN4gqks5s5kTLgaJpZM4QklrV>
    > .
    >



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    cc @zsxwing 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    **[Test build #84162 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84162/testReport)** for PR 19789 at commit [`7700bf0`](https://github.com/apache/spark/commit/7700bf02c06d6fe3909053a1e146569a67b21de3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by koeninger <gi...@git.apache.org>.
Github user koeninger commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    My main comment is that if you have a situation where there's actually contention on the size of the cache, chances are things are going to be screwed up anyway due to consumers being recreated and losing any benefit of buffering.
    
    Are you running into this issue in situations where the cache is appropriately sized given the number of topicpartitions and executors?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by daroo <gi...@git.apache.org>.
Github user daroo commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    @HyukjinKwon
    @zsxwing 
    
    Any chance to go ahead with this?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsa...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19789#discussion_r152895550
  
    --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaRDD.scala ---
    @@ -211,8 +211,8 @@ private[spark] class KafkaRDD[K, V](
         var requestOffset = part.fromOffset
     
         def closeIfNeeded(): Unit = {
    -      if (!useConsumerCache && consumer != null) {
    -        consumer.close
    +      if (consumer != null) {
    +          consumer.close()
    --- End diff --
    
    I think this should be double spaced


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by daroo <gi...@git.apache.org>.
Github user daroo commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Hey Spark commiters @jiangxb1987 @srowen @cloud-fan @HyukjinKwon
    
    Could you please authorize my PR for testing?
    
    Thanks
    Darek


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by koeninger <gi...@git.apache.org>.
Github user koeninger commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Yeah, subscribepattern could definitely be an issue.
    
    As far as unit testing, have you tried anything along the lines of setting
    the cache size artificially low and then introducing new topicpartitions?
    
    On Mon, Nov 20, 2017 at 11:43 AM, Daroo <no...@github.com> wrote:
    
    > I see you point, but it's kind of difficult to size it properly when you
    > use SubscribePattern (i.e. dynamic number of topics/partitions) consumer
    > strategy
    >
    > —
    > You are receiving this because you were mentioned.
    > Reply to this email directly, view it on GitHub
    > <https://github.com/apache/spark/pull/19789#issuecomment-345772062>, or mute
    > the thread
    > <https://github.com/notifications/unsubscribe-auth/AAGABxwDwzBPfH-VH8BoU8g3BjyMj_JQks5s4bpYgaJpZM4QklrV>
    > .
    >



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    **[Test build #84162 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84162/testReport)** for PR 19789 at commit [`7700bf0`](https://github.com/apache/spark/commit/7700bf02c06d6fe3909053a1e146569a67b21de3).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsa...

Posted by daroo <gi...@git.apache.org>.
Github user daroo closed the pull request at:

    https://github.com/apache/spark/pull/19789


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    I think we should wait for @zsxwing's feedback.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85575/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    **[Test build #90551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90551/testReport)** for PR 19789 at commit [`639f11c`](https://github.com/apache/spark/commit/639f11c9fd856f58d067036f9b9131ea9efdc13d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    @daroo, mind reopening this if you have some time to update?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84162/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by daroo <gi...@git.apache.org>.
Github user daroo commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    I don't know what are the plans for the structured streaming which @zsxwing has in mind, but before I created the PR I actually had seen how this problem is currently solved in kafka-0-10-sql module and I didn't like it. My main concern is that in certain cases the cache size my grow well beyond configured spark.sql.kafkaConsumerCache.capacity. That’s why I’ve chosen to do it in a bit different way.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by daroo <gi...@git.apache.org>.
Github user daroo commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    It seems that your "magic spell" didn't work. No build was triggered 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by koeninger <gi...@git.apache.org>.
Github user koeninger commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    You'll need to get a commiter's attention to merge it anyway
    
    On Nov 23, 2017 01:48, "Daroo" <no...@github.com> wrote:
    
    It seems that your "magic spell" didn't work. No build was triggered
    
    —
    You are receiving this because you were mentioned.
    Reply to this email directly, view it on GitHub
    <https://github.com/apache/spark/pull/19789#issuecomment-346548898>, or mute
    the thread
    <https://github.com/notifications/unsubscribe-auth/AAGAB4Atem0hApHVaY_o1F8ksWNy6u0Lks5s5SNYgaJpZM4QklrV>
    .



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by daroo <gi...@git.apache.org>.
Github user daroo commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Hey @zsxwing
    
    Could you please have a look at my PR? It's been a month almost since I opened it. @koeninger said that it looks good to him, but everyone else is waiting for your feedback.
    
    Thanks a lot,
    Darek


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    @daroo thanks for checking it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by daroo <gi...@git.apache.org>.
Github user daroo commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Cool. Could you please authorize it for testing?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by daroo <gi...@git.apache.org>.
Github user daroo commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    I've added a test. @koeninger is it something you had in mind?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by koeninger <gi...@git.apache.org>.
Github user koeninger commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    What are you actually asserting in that test and/or does it reliably fail
    if run on the version of your code before the patch?
    
    On Wed, Nov 22, 2017 at 1:33 PM, Daroo <no...@github.com> wrote:
    
    > I've added a test. @koeninger <https://github.com/koeninger> is it
    > something you had in mind?
    >
    > —
    > You are receiving this because you were mentioned.
    > Reply to this email directly, view it on GitHub
    > <https://github.com/apache/spark/pull/19789#issuecomment-346452116>, or mute
    > the thread
    > <https://github.com/notifications/unsubscribe-auth/AAGAB_Sz5wdH2nf4eM0z-DMwAInQD0MQks5s5HclgaJpZM4QklrV>
    > .
    >



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    **[Test build #84143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84143/testReport)** for PR 19789 at commit [`c2c3ed9`](https://github.com/apache/spark/commit/c2c3ed9e2e672b3b9205a824b8cdd2f9ca2131eb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class CachedKafkaConsumerSuite extends SparkFunSuite with BeforeAndAfterAll `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

Posted by daroo <gi...@git.apache.org>.
Github user daroo commented on the issue:

    https://github.com/apache/spark/pull/19789
  
    I see you point, but it's kind of difficult to size it properly when you use SubscribePattern (i.e. dynamic number of topics/partitions) consumer strategy


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org