You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by koeninger <gi...@git.apache.org> on 2015/02/10 21:52:49 UTC

[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

GitHub user koeninger opened a pull request:

    https://github.com/apache/spark/pull/4511

    [SPARK-4964] [Streaming] refactor createRDD to take leaders via map instead of array

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/koeninger/spark-1 kafkaRdd-leader-to-broker

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4511.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4511
    
----
commit e9cece40539da353fb76fede46e6cc447f0fa95e
Author: cody koeninger <co...@koeninger.org>
Date:   2015-02-10T19:22:17Z

    [SPARK-4964] pass leaders as a map to ensure 1 leader per TopicPartition

commit 5173f3fd9e3191dc77e0f9adf618d29de6eebed9
Author: cody koeninger <co...@koeninger.org>
Date:   2015-02-10T20:51:06Z

    [SPARK-4964] test the Java variations of createRDD

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4511#discussion_r24454369
  
    --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala ---
    @@ -211,12 +220,17 @@ object KafkaUtils {
           sc: SparkContext,
           kafkaParams: Map[String, String],
           offsetRanges: Array[OffsetRange],
    -      leaders: Array[Leader],
    +      leaders: Map[TopicAndPartition, Broker],
           messageHandler: MessageAndMetadata[K, V] => R
         ): RDD[R] = {
    -    val leaderMap = leaders
    -      .map(l => TopicAndPartition(l.topic, l.partition) -> (l.host, l.port))
    -      .toMap
    +    val leaderMap = if (leaders.isEmpty) {
    +      leadersForRanges(kafkaParams, offsetRanges)
    +    } else {
    +      // This could be avoided by refactoring KafkaRDD.leaders and KafkaCluster to use Broker
    --- End diff --
    
    This can be done in the future. We will need a bit of refactoring. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4511#discussion_r24452877
  
    --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/Broker.scala ---
    @@ -17,41 +17,54 @@
     
     package org.apache.spark.streaming.kafka
     
    -import kafka.common.TopicAndPartition
    -
     import org.apache.spark.annotation.Experimental
     
     /**
      * :: Experimental ::
    - * Represent the host info for the leader of a Kafka partition.
    + * Represent the host and port info for a Kafka broker.
    + * Differs from the Kafka project's internal kafka.cluster.Broker, which contains a server ID
      */
     @Experimental
    -final class Leader private(
    -    /** Kafka topic name */
    -    val topic: String,
    -    /** Kafka partition id */
    -    val partition: Int,
    -    /** Leader's hostname */
    +final class Broker private(
    +    /** Broker's hostname */
         val host: String,
    -    /** Leader's port */
    -    val port: Int) extends Serializable
    +    /** Broker's port */
    +    val port: Int) extends Serializable {
    +
    +  override def equals(obj: Any): Boolean = obj match {
    +    case that: Broker =>
    +      this.host == that.host &&
    +      this.port == that.port
    +    case _ => false
    +  }
    +
    +  override def hashCode: Int = {
    +    41 * (41 + host.hashCode) + port
    +  }
    +
    +  override def toString(): String = {
    +    s"Broker($host, $port)"
    +  }
    +
    --- End diff --
    
    nit: extra space


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/4511#issuecomment-73847054
  
    Thanks @koeninger I have merged this! 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/4511#issuecomment-73825333
  
    test this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/4511#issuecomment-73793015
  
    I was surprised to see that the KafkaRDDSuite was not modified, then i realized that the KafkaRDDSuite doesnot use the public API to create RDDs. Can you please update KafkaRDDSuite to do that, so that the public APIs get tested?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4511#discussion_r24466211
  
    --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala ---
    @@ -40,43 +41,70 @@ class KafkaRDDSuite extends KafkaStreamSuiteBase with BeforeAndAfter {
         tearDownKafka()
       }
    --- End diff --
    
    Can you make both tests reuse the kafka? All you need to do is setup and tear down Kafka in beforeAll and afterAll, respectively. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4511#issuecomment-73831038
  
      [Test build #597 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/597/consoleFull) for   PR 4511 at commit [`6f8680b`](https://github.com/apache/spark/commit/6f8680bc910a197d0cb13d279aa62e5798ed7ac3).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by koeninger <gi...@git.apache.org>.
Github user koeninger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4511#discussion_r24459700
  
    --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala ---
    @@ -211,12 +220,17 @@ object KafkaUtils {
           sc: SparkContext,
           kafkaParams: Map[String, String],
           offsetRanges: Array[OffsetRange],
    -      leaders: Array[Leader],
    +      leaders: Map[TopicAndPartition, Broker],
           messageHandler: MessageAndMetadata[K, V] => R
         ): RDD[R] = {
    -    val leaderMap = leaders
    -      .map(l => TopicAndPartition(l.topic, l.partition) -> (l.host, l.port))
    -      .toMap
    +    val leaderMap = if (leaders.isEmpty) {
    +      leadersForRanges(kafkaParams, offsetRanges)
    +    } else {
    +      // This could be avoided by refactoring KafkaRDD.leaders and KafkaCluster to use Broker
    --- End diff --
    
    Agreed, don't want to disrupt the way things are, and should be able to be done without affecting publicly exposed classes


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4511#discussion_r24454653
  
    --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala ---
    @@ -154,6 +154,19 @@ object KafkaUtils {
           jssc.ssc, kafkaParams.toMap, Map(topics.mapValues(_.intValue()).toSeq: _*), storageLevel)
       }
     
    +  /** get leaders for the given offset ranges, or throw an exception */
    +  private def leadersForRanges(
    +      kafkaParams: Map[String, String],
    +      offsetRanges: Array[OffsetRange]): Map[TopicAndPartition, (String, Int)] = {
    +    val kc = new KafkaCluster(kafkaParams)
    --- End diff --
    
    We keep creating a KafkaCluster object everywhere... what is the cost of creating a KafkaCluster object? Is there any? I guess not because it is just creating a config object and nothing else... isnt it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/4511#issuecomment-73817894
  
    Generally look pretty good. Just one major comment about reuseing the kafka harness across tests. Other than that, minor nits. Will merge once these are done and tests pass.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4511#issuecomment-73829002
  
      [Test build #27269 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27269/consoleFull) for   PR 4511 at commit [`f7151d4`](https://github.com/apache/spark/commit/f7151d4e80c9bfe72d0240902d448df4412dc172).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4511#issuecomment-73826099
  
      [Test build #597 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/597/consoleFull) for   PR 4511 at commit [`6f8680b`](https://github.com/apache/spark/commit/6f8680bc910a197d0cb13d279aa62e5798ed7ac3).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/4511#issuecomment-73817917
  
    ok to test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/4511#issuecomment-73825313
  
    ok to test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/4511


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4511#discussion_r24453244
  
    --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala ---
    @@ -53,8 +53,8 @@ abstract class KafkaStreamSuiteBase extends FunSuite with Eventually with Loggin
       private val zkConnectionTimeout = 6000
       private val zkSessionTimeout = 6000
       private var zookeeper: EmbeddedZookeeper = _
    -  private val brokerHost = "localhost"
    -  private var brokerPort = 9092
    +  protected val brokerHost = "localhost"
    +  protected var brokerPort = 9092
    --- End diff --
    
    Notice that in the previous PR i refactored to put brokerAddress behind the ready flag? I had done this to make sure we dont accidentally use the port value before the broker is ready. This is because the port number can change depending on which port does the broker setup successfully bind to. If you really need the port, then could you put this behind the ready flag as well?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4511#discussion_r24467329
  
    --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala ---
    @@ -92,8 +120,17 @@ class KafkaRDDSuite extends KafkaStreamSuiteBase with BeforeAndAfter {
           )
           until <- kc.getLatestLeaderOffsets(topicPartitions).right.toOption
         } yield {
    -      KafkaRDD[String, String, StringDecoder, StringDecoder, String](
    -        sc, kc.kafkaParams, from, until, mmd => s"${mmd.offset} ${mmd.message}")
    +      val leaders = until.map { case (tp, lo) =>
    --- End diff --
    
    nit: Could you simplify this piece of code. It took me 5 mins to understand whats going on here. Its very Scala-esque and concise and elegant, I completely agree, but its still hard to read :)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4511#issuecomment-73833754
  
      [Test build #27269 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27269/consoleFull) for   PR 4511 at commit [`f7151d4`](https://github.com/apache/spark/commit/f7151d4e80c9bfe72d0240902d448df4412dc172).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by koeninger <gi...@git.apache.org>.
Github user koeninger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4511#discussion_r24459792
  
    --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala ---
    @@ -154,6 +154,19 @@ object KafkaUtils {
           jssc.ssc, kafkaParams.toMap, Map(topics.mapValues(_.intValue()).toSeq: _*), storageLevel)
       }
     
    +  /** get leaders for the given offset ranges, or throw an exception */
    +  private def leadersForRanges(
    +      kafkaParams: Map[String, String],
    +      offsetRanges: Array[OffsetRange]): Map[TopicAndPartition, (String, Int)] = {
    +    val kc = new KafkaCluster(kafkaParams)
    --- End diff --
    
    That's right, it's basically just stateless config.  Methods called on it may be expensive, but i've tried to limit those to places where it's unavoidable (looking up info that wasn't provided)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4511#issuecomment-73819460
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27251/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4511#discussion_r24472396
  
    --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala ---
    @@ -40,43 +41,70 @@ class KafkaRDDSuite extends KafkaStreamSuiteBase with BeforeAndAfter {
         tearDownKafka()
       }
    --- End diff --
    
    True. But, ....
    
    1 is a riskier change, that should be done later (not for this release), dont want to introduce flakiness
    2. is a greater refactoring, that is definitely not worth doing now. I am not sure though nested testsuites are a good idea or not. Stickign all the Kafka tests in a single testsuite is something that we can consider, there are pros and cons. 
    
    For now at least we should reuse the kafka harness within the same testsuite. I did that for the other Kafka testsuites. The bigger question we can address in separate PRs meant for next release cycle.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4511#discussion_r24467148
  
    --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala ---
    @@ -40,43 +41,70 @@ class KafkaRDDSuite extends KafkaStreamSuiteBase with BeforeAndAfter {
         tearDownKafka()
       }
     
    -  test("Kafka RDD") {
    +  test("Kafka RDD basic usage") {
    +    val sparkConf = new SparkConf().setMaster("local[4]").setAppName(this.getClass.getSimpleName)
    +    sc = new SparkContext(sparkConf)
    +    val topic = "topicbasic"
    +    createTopic(topic)
    +    val messages = Set("the", "quick", "brown", "fox")
    +    sendMessages(topic, messages.toArray)
    +
    +
    +    val kafkaParams = Map("metadata.broker.list" -> brokerAddress,
    +      "group.id" -> s"test-consumer-${Random.nextInt(10000)}")
    +
    +    val offsetRanges = Array(OffsetRange(topic, 0, 0, messages.size))
    +
    +    val rdd =  KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
    +      sc, kafkaParams, offsetRanges)
    +
    +    val received = rdd.map(_._2).collect.toSet
    +    assert(received === messages)
    +  }
    +
    +  test("Kafka RDD integration") {
    --- End diff --
    
    nit: integration doesnt quite mean much. Might be worth naming the tests "create by topic" and "create by topic and offset ranges"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by koeninger <gi...@git.apache.org>.
Github user koeninger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4511#discussion_r24472248
  
    --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala ---
    @@ -40,43 +41,70 @@ class KafkaRDDSuite extends KafkaStreamSuiteBase with BeforeAndAfter {
         tearDownKafka()
       }
    --- End diff --
    
    Sure, but I think there are two bigger things that could be done here
    
    1.  Change the thread sleep in setupKafka to polling using eventually
    
    2.  Change this group of tests to use nested suites, so kafka / zk setup is only done once


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4511#issuecomment-73784156
  
    OK to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4511#discussion_r24467088
  
    --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala ---
    @@ -40,43 +41,70 @@ class KafkaRDDSuite extends KafkaStreamSuiteBase with BeforeAndAfter {
         tearDownKafka()
       }
     
    -  test("Kafka RDD") {
    +  test("Kafka RDD basic usage") {
    --- End diff --
    
    nit: this is a KafkaRDDSuite ... so "KafkaRDD" does not need to added to every unit test name.:)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4511#issuecomment-73781990
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4511#issuecomment-73833756
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27269/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org