You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spark.apache.org by xoltar <gi...@git.apache.org> on 2014/02/24 02:05:24 UTC

[GitHub] incubator-spark pull request: For outputformats that are Configura...

GitHub user xoltar opened a pull request:

    https://github.com/apache/incubator-spark/pull/638

    For outputformats that are Configurable, call setConf before sending data to them.

    This allows us to use, e.g. HBase's TableOutputFormat with PairRDDFunctions.saveAsNewAPIHadoopFile, which otherwise would throw NullPointerException because the output table name hasn't been configured.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xoltar/incubator-spark SPARK-1108

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-spark/pull/638.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #638
    
----
commit 7cbcaa10bbf01cf04bba7f2883d1fb9564cd3660
Author: Bryn Keller <br...@intel.com>
Date:   2014-02-20T06:00:44Z

    For outputformats that are Configurable, call setConf before sending data to them. This allows us to use, e.g. HBase TableOutputFormat, which otherwise would throw NullPointerException because the output table name hasn't been configured

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: For outputformats that are Configura...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/638#discussion_r9980747
  
    --- Diff: core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala ---
    @@ -330,4 +335,74 @@ class PairRDDFunctionsSuite extends FunSuite with SharedSparkContext {
           (1, ArrayBuffer(1)),
           (2, ArrayBuffer(1))))
       }
    +
    +  test("saveNewAPIHadoopFile should call setConf if format is configurable") {
    +    val pairs = sc.parallelize(Array((new Integer(1), new Integer(1))))
    +    val conf = new Configuration()
    +
    +    //No error, non-configurable formats still work
    --- End diff --
    
    Mind adding spaces after these? `// No error, non-configurable formats`... Also it would be nice (but up to you) to use `/* ... */` for multi-line comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: For outputformats that are Configura...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/638#issuecomment-35850754
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: For outputformats that are Configura...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/incubator-spark/pull/638#issuecomment-35856419
  
    Thanks a lot for tracking this down, fixing it, and adding tests! I added some minor style comments, modulo those comments LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: For outputformats that are Configura...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/638#discussion_r9980719
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
    @@ -617,6 +617,10 @@ class PairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)])
             attemptNumber)
           val hadoopContext = newTaskAttemptContext(wrappedConf.value, attemptId)
           val format = outputFormatClass.newInstance
    +      format match {
    +        case c:Configurable => c.setConf(wrappedConf.value)
    --- End diff --
    
    I don't think this is specific to hbase - I think this is something we should really have been doing always but it only was noticed due to the fact that hbase replies on this configuration.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: For outputformats that are Configura...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/incubator-spark/pull/638#issuecomment-35851045
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: For outputformats that are Configura...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/638#issuecomment-35854652
  
    Merged build finished.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: For outputformats that are Configura...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-spark/pull/638


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-spark pull request: For outputformats that are Configura...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/638#discussion_r9980723
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
    @@ -617,6 +617,10 @@ class PairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)])
             attemptNumber)
           val hadoopContext = newTaskAttemptContext(wrappedConf.value, attemptId)
           val format = outputFormatClass.newInstance
    +      format match {
    +        case c:Configurable => c.setConf(wrappedConf.value)
    --- End diff --
    
    Add a space after the colon: `case c: Configurable => `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: For outputformats that are Configura...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/638#issuecomment-35852729
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: For outputformats that are Configura...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/638#discussion_r9980734
  
    --- Diff: core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala ---
    @@ -26,6 +26,11 @@ import com.google.common.io.Files
     
     import org.apache.spark.SparkContext._
     import org.apache.spark.{Partitioner, SharedSparkContext}
    +import org.apache.hadoop.mapreduce._
    --- End diff --
    
    Mind making your new imports fit the normal style?
    
    https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Imports


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: For outputformats that are Configura...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/incubator-spark/pull/638#issuecomment-35856428
  
    We should put this fix in 0.9 as well once it's ready to merge.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: For outputformats that are Configura...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/638#discussion_r9979453
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
    @@ -617,6 +617,10 @@ class PairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)])
             attemptNumber)
           val hadoopContext = newTaskAttemptContext(wrappedConf.value, attemptId)
           val format = outputFormatClass.newInstance
    +      format match {
    +        case c:Configurable => c.setConf(wrappedConf.value)
    --- End diff --
    
    do we need some comments here to indicate that this line is to support a special case in HBase?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: For outputformats that are Configura...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/638#issuecomment-35854654
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12823/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: For outputformats that are Configura...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/638#discussion_r9980751
  
    --- Diff: core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala ---
    @@ -330,4 +335,74 @@ class PairRDDFunctionsSuite extends FunSuite with SharedSparkContext {
           (1, ArrayBuffer(1)),
           (2, ArrayBuffer(1))))
       }
    +
    +  test("saveNewAPIHadoopFile should call setConf if format is configurable") {
    +    val pairs = sc.parallelize(Array((new Integer(1), new Integer(1))))
    +    val conf = new Configuration()
    +
    +    //No error, non-configurable formats still work
    +    pairs.saveAsNewAPIHadoopFile[FakeFormat]("ignored")
    +
    +    //Configurable intercepts get configured
    +    //ConfigTestFormat throws an exception if we try to write to it
    +    //when setConf hasn't been thrown first.
    +    //Assertion is in ConfigTestFormat.getRecordWriter
    +    pairs.saveAsNewAPIHadoopFile[ConfigTestFormat]("ignored")
    +  }
    +}
    +
    +// These classes are fakes for testing
    +// "saveNewAPIHadoopFile should call setConf if format is configurable".
    +// Unfortunately, they have to be top level classes, and not defined in
    +// the test method, because otherwise Scala won't generate no-args constructors
    +// and the test will therefore throw InstantiationException when saveAsNewAPIHadoopFile
    +// tries to instantiate them with Class.newInstance.
    +class FakeWriter extends RecordWriter[Integer,Integer] {
    --- End diff --
    
    `Integer, Integer`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: For outputformats that are Configura...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/638#issuecomment-35852730
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: For outputformats that are Configura...

Posted by xoltar <gi...@git.apache.org>.
Github user xoltar commented on the pull request:

    https://github.com/apache/incubator-spark/pull/638#issuecomment-35857946
  
    Thanks, last change should address all code review comments. Also cleaned up some imports in PairRDDFunctionsSuite that weren't needed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---