Posted to reviews@spark.apache.org by ChengXiangLi <gi...@git.apache.org> on 2014/08/29 09:28:23 UTC

[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

GitHub user ChengXiangLi opened a pull request:

    https://github.com/apache/spark/pull/2194

    SPARK-2895: Add mapPartitionsWithContext related support on Spark Java API.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ChengXiangLi/spark spark-2895

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2194.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2194
    
----
commit 8f4aab21c438cdd2fe371d49286559e038c5c061
Author: chengxiang li <ch...@intel.com>
Date:   2014-08-29T07:12:09Z

    add mappartitionWithContext related support on Spark Java API.

commit 37b5b6b8d00c9cbcc3d9c8dbc47577af29c60196
Author: chengxiang li <ch...@intel.com>
Date:   2014-08-29T07:30:21Z

    fix several code style issue.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2194#discussion_r17016536
  
    --- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java ---
    @@ -708,6 +708,104 @@ public void mapPartitions() {
       }
     
       @Test
    +  public void mapPartitionsWithContext() {
    +    JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4), 2);
    +    JavaRDD<String> partitionSumsWithContext = rdd.mapPartitionsWithContext(
    +      new Function2<TaskContext, Iterator<Integer>, Iterator<String>>() {
    +        @Override
    +        public Iterator<String> call(TaskContext context,
    +          Iterator<Integer> iter) throws Exception {
    +
    +          int sum = 0;
    +          while (iter.hasNext()) {
    +            sum += iter.next();
    +          }
    +          return Collections.singletonList(sum + "-partition-" + context.partitionId()).iterator();
    +        }
    +      }, false);
    +    Assert.assertEquals("[3-partition-0, 7-partition-1]",
    +            partitionSumsWithContext.collect().toString());
    +  }
    +
    +  @Test
    +  public void mapPartitionsToPair() {
    +    JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4), 2);
    +    JavaPairRDD<Integer, String> pairRdd = rdd.mapPartitionsToPair(
    +      new PairFlatMapFunction<Iterator<Integer>, Integer, String>() {
    +        @Override
    +        public Iterable<Tuple2<Integer, String>> call(Iterator<Integer> iter) throws Exception {
    +          int sum = 0;
    +          while (iter.hasNext()) {
    +            sum += iter.next();
    +          }
    +          return Collections.singletonList(new Tuple2<Integer, String>(sum, "a"));
    +        }
    +      }
    +    );
    +
    +    Assert.assertEquals("[(3,a), (7,a)]", pairRdd.collect().toString());
    +  }
    +
    +  @Test
    +  public void mapPartitionsToPairWithContext() {
    +    JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4), 2);
    +    JavaPairRDD<Integer, String> pairRdd = rdd.mapPartitionsToPairWithContext(
    +      new PairFlatMapFunction2<TaskContext, Iterator<Integer>, Integer, String>() {
    +        @Override
    +        public Iterable<Tuple2<Integer, String>> call(TaskContext context, Iterator<Integer> iter)
    +          throws Exception {
    +
    +          int sum = 0;
    +          while (iter.hasNext()) {
    +            sum += iter.next();
    +          }
    +          return Collections.singletonList(
    +                   new Tuple2<Integer, String>(sum, "partition-" + context.partitionId()));
    +        }
    +      }, false);
    +
    +    Assert.assertEquals("[(3,partition-0), (7,partition-1)]", pairRdd.collect().toString());
    +  }
    +
    +  @Test
    +  public void mapPartitionsToDouble() {
    --- End diff --
    
    Remove this once you remove the API above.




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-54409536
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19738/consoleFull) for   PR 2194 at commit [`882b82e`](https://github.com/apache/spark/commit/882b82e51a72c9e853d2795e0ee8d46082dc02dd).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-55693096
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20367/consoleFull) for   PR 2194 at commit [`a5b7b41`](https://github.com/apache/spark/commit/a5b7b412dcff6d680e38a8f1f79fea065762e1c3).
     * This patch merges cleanly.




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-54405548
  
    This looks fine to me, especially since it adds Java tests.




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-55696453
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20367/consoleFull) for   PR 2194 at commit [`a5b7b41`](https://github.com/apache/spark/commit/a5b7b412dcff6d680e38a8f1f79fea065762e1c3).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-55692911
  
    Jenkins, test this please.




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-54405450
  
    Jenkins, ok to test.




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by ChengXiangLi <gi...@git.apache.org>.
Github user ChengXiangLi commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-54102109
  
    Hi @pwendell, for several Hive features, such as HIVE-7843 and HIVE-7627, Hive needs access to the task ID; there is currently no other dependency on the task context.
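
To make the requirement concrete, here is a minimal, Spark-free sketch of a per-partition function that receives only a partition index as its "context". Everything in it (`PartitionSketch`, `mapPartitionsWithId`, the use of a plain `Integer` in place of `TaskContext`) is a hypothetical stand-in chosen for illustration, not Spark's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical, Spark-free simulation; all names are illustrative only.
final class PartitionSketch {

  // Applies f once per partition, passing the partition index as the only "context".
  static <T, R> List<R> mapPartitionsWithId(
      List<List<T>> partitions,
      BiFunction<Integer, Iterator<T>, Iterator<R>> f) {
    List<R> out = new ArrayList<>();
    for (int id = 0; id < partitions.size(); id++) {
      Iterator<R> results = f.apply(id, partitions.get(id).iterator());
      while (results.hasNext()) {
        out.add(results.next());
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Two partitions, mirroring parallelize(Arrays.asList(1, 2, 3, 4), 2) in the PR's tests.
    List<List<Integer>> parts = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 4));
    List<String> sums = mapPartitionsWithId(parts, (id, iter) -> {
      int sum = 0;
      while (iter.hasNext()) {
        sum += iter.next();
      }
      return Collections.singletonList(sum + "-partition-" + id).iterator();
    });
    System.out.println(sums); // prints [3-partition-0, 7-partition-1]
  }
}
```

The function body never touches anything beyond the ID, which is the narrow dependency described above.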




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by ChengXiangLi <gi...@git.apache.org>.
Github user ChengXiangLi commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-55686321
  
    We still hit the API incompatibility error, since https://github.com/apache/spark/pull/2285 is not finished yet.




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-57042319
  
    I pushed a commit to close this one in favor of https://github.com/apache/spark/pull/2425




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-55695570
  
    I proposed a slightly different approach to this here:
    https://issues.apache.org/jira/browse/SPARK-3543
    
    This would remove the need for special methods xWithContext.
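
The comment above does not spell out the alternative. One general way to make `xWithContext`-style variants unnecessary, shown here purely as an assumption about the idea and not as the design recorded in SPARK-3543, is a static accessor backed by a thread-local: the task runner publishes the context before calling user code, and user code looks it up instead of receiving it as a parameter. All names below (`TaskInfoSketch`, `runWith`, `get`) are hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical names throughout; this is not Spark's TaskContext.
final class TaskInfoSketch {
  // One context per thread, visible only while a task body is running.
  private static final ThreadLocal<TaskInfoSketch> CURRENT = new ThreadLocal<>();

  private final int partitionId;

  TaskInfoSketch(int partitionId) {
    this.partitionId = partitionId;
  }

  int partitionId() {
    return partitionId;
  }

  // Static accessor: user code calls this instead of taking the context as an argument.
  // Returns null when no task is running on the current thread.
  static TaskInfoSketch get() {
    return CURRENT.get();
  }

  // The (hypothetical) task runner sets the context, runs the body, then clears it.
  static <R> R runWith(TaskInfoSketch ctx, Supplier<R> body) {
    CURRENT.set(ctx);
    try {
      return body.get();
    } finally {
      CURRENT.remove();
    }
  }
}
```

With this shape, an ordinary one-argument partition function can call `TaskInfoSketch.get().partitionId()` internally, so no second family of methods taking a context parameter is needed.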




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-53845634
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2194#discussion_r17016552
  
    --- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java ---
    @@ -708,6 +708,104 @@ public void mapPartitions() {
       }
     
       @Test
    +  public void mapPartitionsWithContext() {
    +    JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4), 2);
    +    JavaRDD<String> partitionSumsWithContext = rdd.mapPartitionsWithContext(
    +      new Function2<TaskContext, Iterator<Integer>, Iterator<String>>() {
    +        @Override
    +        public Iterator<String> call(TaskContext context,
    +          Iterator<Integer> iter) throws Exception {
    +
    +          int sum = 0;
    +          while (iter.hasNext()) {
    +            sum += iter.next();
    +          }
    +          return Collections.singletonList(sum + "-partition-" + context.partitionId()).iterator();
    +        }
    +      }, false);
    +    Assert.assertEquals("[3-partition-0, 7-partition-1]",
    +            partitionSumsWithContext.collect().toString());
    +  }
    +
    +  @Test
    +  public void mapPartitionsToPair() {
    +    JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4), 2);
    +    JavaPairRDD<Integer, String> pairRdd = rdd.mapPartitionsToPair(
    +      new PairFlatMapFunction<Iterator<Integer>, Integer, String>() {
    +        @Override
    +        public Iterable<Tuple2<Integer, String>> call(Iterator<Integer> iter) throws Exception {
    +          int sum = 0;
    +          while (iter.hasNext()) {
    +            sum += iter.next();
    +          }
    +          return Collections.singletonList(new Tuple2<Integer, String>(sum, "a"));
    +        }
    +      }
    +    );
    +
    +    Assert.assertEquals("[(3,a), (7,a)]", pairRdd.collect().toString());
    +  }
    +
    +  @Test
    +  public void mapPartitionsToPairWithContext() {
    +    JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4), 2);
    +    JavaPairRDD<Integer, String> pairRdd = rdd.mapPartitionsToPairWithContext(
    +      new PairFlatMapFunction2<TaskContext, Iterator<Integer>, Integer, String>() {
    +        @Override
    +        public Iterable<Tuple2<Integer, String>> call(TaskContext context, Iterator<Integer> iter)
    +          throws Exception {
    +
    +          int sum = 0;
    +          while (iter.hasNext()) {
    +            sum += iter.next();
    +          }
    +          return Collections.singletonList(
    +                   new Tuple2<Integer, String>(sum, "partition-" + context.partitionId()));
    +        }
    +      }, false);
    +
    +    Assert.assertEquals("[(3,partition-0), (7,partition-1)]", pairRdd.collect().toString());
    +  }
    +
    +  @Test
    +  public void mapPartitionsToDouble() {
    +    JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4), 2);
    +    JavaDoubleRDD pairRdd = rdd.mapPartitionsToDouble(
    +      new DoubleFlatMapFunction<Iterator<Integer>>() {
    +        @Override
    +        public Iterable<Double> call(Iterator<Integer> iter) throws Exception {
    +          int sum = 0;
    +          while (iter.hasNext()) {
    +            sum += iter.next();
    +          }
    +          return Collections.singletonList(Double.valueOf(sum));
    +        }
    +      }
    +    );
    +
    +    Assert.assertEquals("[3.0, 7.0]", pairRdd.collect().toString());
    +  }
    +
    +  @Test
    +  public void mapPartitionsToDoubleWithContext() {
    --- End diff --
    
    Remove this once you remove the API above.




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-54426812
  
    @ScrapCodes does the MiMa check not exclude developer APIs?




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-55206641
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/41/consoleFull) for   PR 2194 at commit [`882b82e`](https://github.com/apache/spark/commit/882b82e51a72c9e853d2795e0ee8d46082dc02dd).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-54465038
  
    I am looking at this. The MiMa check should have excluded those methods.




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2194#discussion_r17016500
  
    --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
    @@ -186,6 +186,62 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
       }
     
       /**
    +   * :: DeveloperApi ::
    +   * Return a new RDD by applying a function to each partition of this RDD. This is a variant of
    +   * mapPartitions that also passes the TaskContext into the closure.
    +   *
    +   * `preservesPartitioning` indicates whether the input function preserves the partitioner, which
    +   * should be `false` unless this is a pair RDD and the input function doesn't modify the keys.
    +   */
    +  @DeveloperApi
    +  def mapPartitionsWithContext[R](
    +      f: JFunction2[TaskContext, java.util.Iterator[T], java.util.Iterator[R]],
    +      preservesPartitioning: Boolean = false): JavaRDD[R] = {
    +
    +    new JavaRDD(rdd.mapPartitionsWithContext(
    +      ((a, b) => f(a, asJavaIterator(b))), preservesPartitioning)(fakeClassTag))(fakeClassTag)
    +  }
    +
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Return a new JavaDoubleRDD by applying a function to each partition of this RDD. This is a
    +   * variant of mapPartitions that also passes the TaskContext into the closure.
    +   *
    +   * `preservesPartitioning` indicates whether the input function preserves the partitioner, which
    +   * should be `false` unless this is a pair RDD and the input function doesn't modify the keys.
    +   */
    +  @DeveloperApi
    +  def mapPartitionsToDoubleWithContext(
    --- End diff --
    
    can we remove this one? I don't think it is needed for Hive, is it?




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2194#discussion_r17399411
  
    --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
    @@ -186,6 +186,39 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
       }
     
       /**
    +   * :: DeveloperApi ::
    +   * Return a new RDD by applying a function to each partition of this RDD. This is a variant of
    +   * mapPartitions that also passes the TaskContext into the closure.
    +   *
    +   * `preservesPartitioning` indicates whether the input function preserves the partitioner, which
    +   * should be `false` unless this is a pair RDD and the input function doesn't modify the keys.
    +   */
    +  @DeveloperApi
    +  def mapPartitionsWithContext[R](
    +      f: JFunction2[TaskContext, java.util.Iterator[T], java.util.Iterator[R]],
    +      preservesPartitioning: Boolean = false): JavaRDD[R] = {
    --- End diff --
    
    You can't have default argument values in Java.
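
A Scala default such as `preservesPartitioning: Boolean = false` is not usable from Java call sites: Java callers have to pass every argument, which is why the tests in this PR call the method with an explicit `false`. The usual Java-facing substitute is a pair of overloads. The sketch below illustrates that pattern with hypothetical names (`OverloadSketch`, `describe`); it is not the signature of `JavaRDDLike`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for an RDD-like wrapper; not Spark's JavaRDDLike.
final class OverloadSketch {
  private final List<Integer> data;

  OverloadSketch(List<Integer> data) {
    this.data = data;
  }

  // Full form: the caller states preservesPartitioning explicitly.
  String describe(Function<Integer, Integer> f, boolean preservesPartitioning) {
    List<Integer> out = new ArrayList<>();
    for (Integer x : data) {
      out.add(f.apply(x));
    }
    return out + " preservesPartitioning=" + preservesPartitioning;
  }

  // Convenience overload: plays the role of the Scala default value `= false`.
  String describe(Function<Integer, Integer> f) {
    return describe(f, false);
  }
}
```

Both `describe(f)` and `describe(f, false)` then resolve to the same behaviour, and a Java caller never depends on a default it cannot see.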




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by ChengXiangLi <gi...@git.apache.org>.
Github user ChengXiangLi commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-54425347
  
    We hit the binary incompatibility error here. I already annotated the newly added methods as DeveloperApi; am I missing something?




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-54043344
  
    It might be good to add a test suite for this in `JavaAPISuite.java`.




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-55179429
  
    Jenkins, retest this please.




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2194




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2194#discussion_r17016670
  
    --- Diff: core/src/main/java/org/apache/spark/api/java/function/DoubleFlatMapFunction2.java ---
    @@ -0,0 +1,28 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.api.java.function;
    +
    +import java.io.Serializable;
    +
    +/**
    + * A function that takes arguments of type T1 and T2, and returns zero or more records of type
    + * Double from each input record.
    + */
    +public interface DoubleFlatMapFunction2<T1, T2> extends Serializable {
    --- End diff --
    
    remove this one




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2194#discussion_r17016634
  
    --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
    @@ -186,6 +186,62 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
       }
     
       /**
    +   * :: DeveloperApi ::
    +   * Return a new RDD by applying a function to each partition of this RDD. This is a variant of
    +   * mapPartitions that also passes the TaskContext into the closure.
    +   *
    +   * `preservesPartitioning` indicates whether the input function preserves the partitioner, which
    +   * should be `false` unless this is a pair RDD and the input function doesn't modify the keys.
    +   */
    +  @DeveloperApi
    +  def mapPartitionsWithContext[R](
    +      f: JFunction2[TaskContext, java.util.Iterator[T], java.util.Iterator[R]],
    +      preservesPartitioning: Boolean = false): JavaRDD[R] = {
    +
    +    new JavaRDD(rdd.mapPartitionsWithContext(
    +      ((a, b) => f(a, asJavaIterator(b))), preservesPartitioning)(fakeClassTag))(fakeClassTag)
    +  }
    +
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Return a new JavaDoubleRDD by applying a function to each partition of this RDD. This is a
    +   * variant of mapPartitions that also passes the TaskContext into the closure.
    +   *
    +   * `preservesPartitioning` indicates whether the input function preserves the partitioner, which
    +   * should be `false` unless this is a pair RDD and the input function doesn't modify the keys.
    +   */
    +  @DeveloperApi
    +  def mapPartitionsToDoubleWithContext(
    +      f: DoubleFlatMapFunction2[TaskContext, java.util.Iterator[T]],
    +      preservesPartitioning: Boolean): JavaDoubleRDD = {
    +
    +    def fn = (context: TaskContext, x: Iterator[T]) =>
    +      asScalaIterator(f.call(context, asJavaIterator(x)).iterator())
    +    new JavaDoubleRDD(
    +      rdd.mapPartitionsWithContext(fn, preservesPartitioning).map(x => x.doubleValue()))
    +  }
    +
    +  /**
    +   * :: DeveloperApi ::
    +   * Return a new JavaPairRDD by applying a function to each partition of this RDD. This is a
    +   * variant of mapPartitions that also passes the TaskContext into the closure.
    +   *
    +   * `preservesPartitioning` indicates whether the input function preserves the partitioner, which
    +   * should be `false` unless this is a pair RDD and the input function doesn't modify the keys.
    +   */
    +  @DeveloperApi
    +  def mapPartitionsToPairWithContext[K2, V2](
    +      f: PairFlatMapFunction2[TaskContext, java.util.Iterator[T], K2, V2],
    +      preservesPartitioning: Boolean): JavaPairRDD[K2, V2] = {
    +
    +    def fn = (context: TaskContext, x: Iterator[T]) =>
    --- End diff --
    
    can this call mapPartitionsWithContext and just wrap it?
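
A minimal sketch of the delegation suggested above, written without Spark so it stands alone. `Ctx` is a hypothetical stand-in for `TaskContext`, and both method names are illustrative only and not part of the Spark API. The point is just that the specialized variant can reuse the generic one and add only the conversion step:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiFunction;

public class WrapSketch {
    // Hypothetical stand-in for Spark's TaskContext; it only carries a partition id.
    public static final class Ctx {
        public final int partitionId;
        public Ctx(int partitionId) { this.partitionId = partitionId; }
    }

    // Generic variant: applies f once to a whole partition, passing the context along.
    public static <T, R> List<R> mapPartitionWithContext(
            Ctx ctx, List<T> partition, BiFunction<Ctx, Iterator<T>, Iterator<R>> f) {
        List<R> out = new ArrayList<>();
        Iterator<R> it = f.apply(ctx, partition.iterator());
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    // Specialized variant: delegates to the generic one, then only unboxes the result.
    public static <T> double[] mapPartitionToDoubleWithContext(
            Ctx ctx, List<T> partition, BiFunction<Ctx, Iterator<T>, Iterator<Double>> f) {
        List<Double> boxed = mapPartitionWithContext(ctx, partition, f);
        double[] out = new double[boxed.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = boxed.get(i);
        }
        return out;
    }
}
```

With this shape the iterator handling lives in one place, so the Double and Pair variants would differ only in how they wrap the result.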




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-54694378
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-54406091
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19738/consoleFull) for PR 2194 at commit [`882b82e`](https://github.com/apache/spark/commit/882b82e51a72c9e853d2795e0ee8d46082dc02dd).
     * This patch merges cleanly.




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-54219469
  
    @JoshRosen can you also take a look at this? It is pretty short, but it is about the Java API.




[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-54598665
  
    @rxin There is an explanation and a workaround-style fix for this in #2285.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-55202162
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/41/consoleFull) for PR 2194 at commit [`882b82e`](https://github.com/apache/spark/commit/882b82e51a72c9e853d2795e0ee8d46082dc02dd).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by ScrapCodes <gi...@git.apache.org>.
Github user ScrapCodes commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2194#discussion_r16946230
  
    --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala ---
    @@ -186,6 +186,56 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
       }
     
       /**
    +   * :: DeveloperApi ::
    +   * Return a new RDD by applying a function to each partition of this RDD. This is a variant of
    +   * mapPartitions that also passes the TaskContext into the closure.
    +   *
    +   * `preservesPartitioning` indicates whether the input function preserves the partitioner, which
    +   * should be `false` unless this is a pair RDD and the input function doesn't modify the keys.
    +   */
    +  @DeveloperApi
    +  def mapPartitionsWithContext[R](
    +                                   f: JFunction2[TaskContext, java.util.Iterator[T], java.util.Iterator[R]],
    --- End diff --
    
    Wrong indentation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2194#issuecomment-54097265
  
    @ChengXiangLi could you describe a bit more what the context is being used for? This is an unstable API so I'm a bit hesitant to expose this in its current form. It would be better to look at exactly what Hive needs from this interface and see if we can come up with a stable interface for it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org