Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2014/10/30 00:51:32 UTC

[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/3005

    [SPARK-2672] support compressed file in wholeTextFile

    wholeTextFiles() cannot read compressed files, but it should be able to, just like textFile().

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark whole

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3005.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3005
    
----
commit 22e8b3e0d17a118590e66d2d2f517ce3ec511c5f
Author: Davies Liu <da...@databricks.com>
Date:   2014-10-29T23:50:34Z

    support compressed file in wholeTextFile

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3005#issuecomment-61031277
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22491/
    Test PASSed.




[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3005#issuecomment-62816723
  
      [Test build #517 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/517/consoleFull) for   PR 3005 at commit [`c83571a`](https://github.com/apache/spark/commit/c83571a7e47bcdf27fe386908708334d4fe9f796).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3005#discussion_r19702340
  
    --- Diff: core/src/main/scala/org/apache/spark/input/WholeTextFileInputFormat.scala ---
    @@ -34,17 +33,24 @@ import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit
      * the value is the entire content of file.
      */
     
    -private[spark] class WholeTextFileInputFormat extends CombineFileInputFormat[String, String] {
    +private[spark] class WholeTextFileInputFormat
    +  extends CombineFileInputFormat[String, String] with Configurable {
    +
       override protected def isSplitable(context: JobContext, file: Path): Boolean = false
     
    +  private var conf: Configuration = _
    +  def setConf(c: Configuration) = {
    --- End diff --
    
    what's the return type?
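For context on the question above: a Scala method written with `=` and no annotation gets its result type inferred, and an explicit annotation is commonly preferred for public methods. The following is a minimal sketch of the explicit form, assuming the method body only stores the configuration (the diff does not show the body). `Configuration` and `Configurable` here are hypothetical stand-ins so the sketch compiles without Hadoop on the classpath, and `ExampleInputFormat` is an illustrative name, not a class from the patch.

```scala
// Hypothetical stand-ins for Hadoop's Configuration and Configurable,
// declared locally so this sketch needs no external dependency.
class Configuration

trait Configurable {
  def setConf(c: Configuration): Unit
  def getConf(): Configuration
}

// Illustrative class, not part of the patch under review.
class ExampleInputFormat extends Configurable {
  private var conf: Configuration = null

  // The result type is spelled out as Unit instead of being inferred.
  override def setConf(c: Configuration): Unit = {
    conf = c
  }

  override def getConf(): Configuration = conf
}
```

With the annotation in place, a reader can see from the signature alone that the method is called only for its side effect.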




[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3005#issuecomment-62806011
  
      [Test build #517 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/517/consoleFull) for   PR 3005 at commit [`c83571a`](https://github.com/apache/spark/commit/c83571a7e47bcdf27fe386908708334d4fe9f796).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/3005#issuecomment-61904592
  
    @rxin I have addressed your comments; could you take another look?




[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3005#discussion_r19702341
  
    --- Diff: core/src/main/scala/org/apache/spark/input/WholeTextFileRecordReader.scala ---
    @@ -34,7 +36,13 @@ private[spark] class WholeTextFileRecordReader(
         split: CombineFileSplit,
         context: TaskAttemptContext,
         index: Integer)
    -  extends RecordReader[String, String] {
    +  extends RecordReader[String, String] with Configurable {
    +
    +  private var conf: Configuration = _
    +  def setConf(c: Configuration) = {
    --- End diff --
    
    similarly what's the return type?




[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3005#issuecomment-62818068
  
    Looks good to me (thanks for fixing the semicolons!).  I'm going to merge this into master and 1.2.




[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3005#issuecomment-62817107
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23282/
    Test PASSed.




[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3005#issuecomment-62785571
  
    This looks good to me!  I'm going to merge this (I'll remove those semicolons on merge).  Thanks!




[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3005#issuecomment-61394315
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22732/
    Test PASSed.




[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3005#issuecomment-61392990
  
      [Test build #22732 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22732/consoleFull) for   PR 3005 at commit [`c83571a`](https://github.com/apache/spark/commit/c83571a7e47bcdf27fe386908708334d4fe9f796).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3005#issuecomment-62785700
  
    (Actually, let me just re-run Jenkins, just to be safe).
    
    Jenkins, retest this please.




[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3005#issuecomment-61394314
  
      [Test build #22732 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22732/consoleFull) for   PR 3005 at commit [`c83571a`](https://github.com/apache/spark/commit/c83571a7e47bcdf27fe386908708334d4fe9f796).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class DecimalType(DataType):`
      * `case class UnscaledValue(child: Expression) extends UnaryExpression `
      * `case class MakeDecimal(child: Expression, precision: Int, scale: Int) extends UnaryExpression `
      * `case class MutableLiteral(var value: Any, dataType: DataType, nullable: Boolean = true)`
      * `case class PrecisionInfo(precision: Int, scale: Int)`
      * `case class DecimalType(precisionInfo: Option[PrecisionInfo]) extends FractionalType `
      * `final class Decimal extends Ordered[Decimal] with Serializable `
      * `  trait DecimalIsConflicted extends Numeric[Decimal] `





[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/3005




[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3005#issuecomment-61031274
  
      [Test build #22491 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22491/consoleFull) for   PR 3005 at commit [`22e8b3e`](https://github.com/apache/spark/commit/22e8b3e0d17a118590e66d2d2f517ce3ec511c5f).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/3005#issuecomment-62447098
  
    @JoshRosen @mateiz Do you have time to review this?




[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3005#issuecomment-62806486
  
      [Test build #23282 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23282/consoleFull) for   PR 3005 at commit [`a43fcfb`](https://github.com/apache/spark/commit/a43fcfb7410bc1ca35597336a03140fe6412494f).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3005#discussion_r20245940
  
    --- Diff: core/src/main/scala/org/apache/spark/input/WholeTextFileRecordReader.scala ---
    @@ -57,8 +65,16 @@ private[spark] class WholeTextFileRecordReader(
     
       override def nextKeyValue(): Boolean = {
         if (!processed) {
    +      val conf = new Configuration
    +      val factory = new CompressionCodecFactory(conf);
    +      val codec = factory.getCodec(path); // infers from file ext.
    --- End diff --
    
    Nit: don't need this semicolon.
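The diff above selects a codec from the file extension through Hadoop's `CompressionCodecFactory`. The same idea can be sketched with only the JDK, assuming gzip is the only compression of interest. This is not the code from the patch: `java.util.zip` stands in for Hadoop's codec classes, and `WholeFileSketch`, `open`, `readWhole`, and `gzip` are hypothetical names chosen for illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}
import java.nio.charset.StandardCharsets
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object WholeFileSketch {
  // Pick a decompressing wrapper from the file extension; files without a
  // recognised extension are read as-is.
  def open(name: String, raw: InputStream): InputStream =
    if (name.endsWith(".gz")) new GZIPInputStream(raw) else raw

  // Read the entire (possibly compressed) stream into one UTF-8 string.
  def readWhole(name: String, raw: InputStream): String = {
    val in = open(name, raw)
    try {
      val out = new ByteArrayOutputStream()
      val buf = new Array[Byte](8192)
      var n = in.read(buf)
      while (n != -1) {
        out.write(buf, 0, n)
        n = in.read(buf)
      }
      new String(out.toByteArray, StandardCharsets.UTF_8)
    } finally in.close()
  }

  // Helper that produces gzip-compressed bytes for trying the sketch out.
  def gzip(text: String): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(bytes)
    gz.write(text.getBytes(StandardCharsets.UTF_8))
    gz.close()
    bytes.toByteArray
  }
}
```

A caller could pass `WholeFileSketch.gzip("some text")` wrapped in a `ByteArrayInputStream` together with a name ending in `.gz` to `readWhole` and get the original text back, while a name without that extension is read unchanged.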




[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3005#discussion_r20246375
  
    --- Diff: core/src/main/scala/org/apache/spark/input/WholeTextFileRecordReader.scala ---
    @@ -57,8 +65,16 @@ private[spark] class WholeTextFileRecordReader(
     
       override def nextKeyValue(): Boolean = {
         if (!processed) {
    +      val conf = new Configuration
    +      val factory = new CompressionCodecFactory(conf);
    --- End diff --
    
    Another semicolon.




[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3005#issuecomment-62817103
  
      [Test build #23282 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23282/consoleFull) for   PR 3005 at commit [`a43fcfb`](https://github.com/apache/spark/commit/a43fcfb7410bc1ca35597336a03140fe6412494f).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-2672] support compressed file in wholeT...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3005#issuecomment-61025571
  
      [Test build #22491 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22491/consoleFull) for   PR 3005 at commit [`22e8b3e`](https://github.com/apache/spark/commit/22e8b3e0d17a118590e66d2d2f517ce3ec511c5f).
     * This patch merges cleanly.

