Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2016/06/17 03:52:40 UTC

[GitHub] spark pull request #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case I...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/13728

    [SPARK-16010] [SQL] Code Refactoring, Test Case Improvement and Description Updates for SQLConf spark.sql.parquet.filterPushdown

    #### What changes were proposed in this pull request?
    Starting with Spark 2.0, vectorized decoding is introduced for Parquet reading. This feature changes the filter pushdown behavior of Parquet reading. This PR therefore updates the outdated descriptions of two external `SQLConf` entries: `spark.sql.parquet.filterPushdown` and `spark.sql.parquet.enableVectorizedReader`.
    
    The PR also slightly simplifies the code for building `parquetReader`. cc @davies @liancheng @marmbrus 
    
    Because the current test cases do not verify the behavior when `spark.sql.parquet.filterPushdown` is set to `false`, this PR adds a test case to improve coverage. It also improves the test case for Parquet file paths that point to either non-existent files or non-existent hosts.
    
    #### How was this patch tested?
    Added the related test cases.
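
For readers who want to try the two settings named in the description, the following is a minimal sketch and not code from this patch. It assumes a local Spark 2.x session with the Parquet data source on the classpath; the object name, application name, temp directory prefix, and the filter `a > 1` are made up for illustration. Only the two configuration keys come from the description above.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

// Hypothetical driver object, for illustration only.
object FilterPushdownConfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("filter-pushdown-conf-sketch") // assumed name
      .getOrCreate()
    import spark.implicits._

    // Write a tiny Parquet table, mirroring the shape used in the new test case.
    val path = Files.createTempDirectory("parquet-sketch").resolve("table1").toString
    (1 to 3).map(i => (i, i.toString)).toDF("a", "b").write.parquet(path)

    Seq("true", "false").foreach { pushDown =>
      // Toggle filter pushdown while keeping the vectorized reader off.
      spark.conf.set("spark.sql.parquet.filterPushdown", pushDown)
      spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

      val rows = spark.read.parquet(path).where("a > 1").collect()
      println(s"filterPushdown=$pushDown returned ${rows.length} rows")
    }

    spark.stop()
  }
}
```

Pushdown is an optimization, so the query result should be the same under either setting; what may differ is how much data the reader has to materialize.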


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark addTestForParquet

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13728.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13728
    
----
commit a7b89bd414874fc62576f7fb54e9f5e2ffe5f397
Author: gatorsmile <ga...@gmail.com>
Date:   2016-06-17T02:59:00Z

    fix

commit a1da7981638723e07753e0aa97686602d3bb38a3
Author: gatorsmile <ga...@gmail.com>
Date:   2016-06-17T03:19:52Z

    update the comment

commit ad1f18cf4ebf189581997876cd13614ec940b961
Author: gatorsmile <ga...@gmail.com>
Date:   2016-06-17T03:33:16Z

    update the document

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case Improvem...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13728
  
    **[Test build #60691 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60691/consoleFull)** for PR 13728 at commit [`9967cc7`](https://github.com/apache/spark/commit/9967cc72545324e7a542fcf1b49372d977c0011b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case I...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13728#discussion_r67466469
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
    @@ -545,4 +545,28 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
           }
         }
       }
    +
    +  test("Verify SQLConf PARQUET_FILTER_PUSHDOWN_ENABLED") {
    +    import testImplicits._
    +
    +    Seq("true", "false").foreach { pushDown =>
    +      // When SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key is set to true and all the data types
    +      // of the table schema are AtomicType, the parquet reader uses vectorizedReader.
    +      // In this mode, filters will not be pushed down, no matter whether
    +      // SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key is true or not.
    +      withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> pushDown,
    +          SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false") {
    +        withTempPath { dir =>
    +          val path = s"${dir.getCanonicalPath}/table1"
    +          (1 to 3).map(i => (i, i.toString)).toDF("a", "b").write.parquet(path)
    +          // When a filter is pushed to Parquet, Parquet can apply it to every row.
    --- End diff --
    
    If the vectorized reader is disabled, it falls back to Parquet's reader, which would filter row by row as well.
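
The distinction debated in this review, pruning whole row groups versus filtering row by row, can be illustrated with a toy model. This is a sketch under stated assumptions, not Parquet's implementation: `RowGroup`, `pruneRowGroups`, and `filterRows` are hypothetical names, each group is assumed to be non-empty, and the only statistic kept per group is its maximum value.

```scala
// Toy model only: these types are hypothetical and are not Parquet classes.
final case class RowGroup(rows: Vector[Int]) {
  // Per-group statistic; assumes the group is non-empty.
  val max: Int = rows.max
}

object RowGroupPruningSketch {
  // Row-group-level pruning for the predicate "value > threshold":
  // a group is skipped only when its max proves that no row can match.
  def pruneRowGroups(groups: Seq[RowGroup], threshold: Int): Seq[RowGroup] =
    groups.filter(_.max > threshold)

  // Row-level filtering: evaluate the predicate on every remaining row.
  def filterRows(groups: Seq[RowGroup], threshold: Int): Seq[Int] =
    groups.flatMap(_.rows).filter(_ > threshold)

  def main(args: Array[String]): Unit = {
    val groups = Seq(
      RowGroup(Vector(1, 2, 3)),
      RowGroup(Vector(4, 5, 6)),
      RowGroup(Vector(2, 9)))

    val survivors = pruneRowGroups(groups, 4)
    // Pruning alone still returns non-matching rows from the surviving groups.
    println(survivors.flatMap(_.rows))
    // A row-level pass removes them.
    println(filterRows(survivors, 4))
  }
}
```

In this model, pruning by statistics skips the first group entirely but still lets the values 4 and 2 through, which is why a reader that only prunes row groups needs a second, row-level filter somewhere to return exact results.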




[GitHub] spark issue #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case Improvem...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13728
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61006/
    Test PASSed.




[GitHub] spark pull request #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case I...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13728#discussion_r67467841
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
    @@ -545,4 +545,28 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
           }
         }
       }
    +
    +  test("Verify SQLConf PARQUET_FILTER_PUSHDOWN_ENABLED") {
    +    import testImplicits._
    +
    +    Seq("true", "false").foreach { pushDown =>
    +      // When SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key is set to true and all the data types
    +      // of the table schema are AtomicType, the parquet reader uses vectorizedReader.
    +      // In this mode, filters will not be pushed down, no matter whether
    +      // SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key is true or not.
    +      withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> pushDown,
    +          SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false") {
    +        withTempPath { dir =>
    +          val path = s"${dir.getCanonicalPath}/table1"
    +          (1 to 3).map(i => (i, i.toString)).toDF("a", "b").write.parquet(path)
    +          // When a filter is pushed to Parquet, Parquet can apply it to every row.
    --- End diff --
    
    Based on the test cases, it sounds like each row group contains only one row... I am not sure how Parquet implements it.




[GitHub] spark issue #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case Improvem...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13728
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case Improvem...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13728
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60691/
    Test PASSed.




[GitHub] spark issue #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case Improvem...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13728
  
    **[Test build #60691 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60691/consoleFull)** for PR 13728 at commit [`9967cc7`](https://github.com/apache/spark/commit/9967cc72545324e7a542fcf1b49372d977c0011b).




[GitHub] spark issue #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case Improvem...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13728
  
    **[Test build #60692 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60692/consoleFull)** for PR 13728 at commit [`d1b2cbb`](https://github.com/apache/spark/commit/d1b2cbbe73e74ee80dd3afa6a9a1fe5214138b22).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case Improvem...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/13728
  
    retest this please




[GitHub] spark issue #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case Improvem...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13728
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60692/
    Test PASSed.




[GitHub] spark issue #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case Improvem...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13728
  
    **[Test build #60692 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60692/consoleFull)** for PR 13728 at commit [`d1b2cbb`](https://github.com/apache/spark/commit/d1b2cbbe73e74ee80dd3afa6a9a1fe5214138b22).




[GitHub] spark issue #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case Improvem...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13728
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case Improvem...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13728
  
    **[Test build #61006 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61006/consoleFull)** for PR 13728 at commit [`d1b2cbb`](https://github.com/apache/spark/commit/d1b2cbbe73e74ee80dd3afa6a9a1fe5214138b22).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case Improvem...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13728
  
    **[Test build #60680 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60680/consoleFull)** for PR 13728 at commit [`ad1f18c`](https://github.com/apache/spark/commit/ad1f18cf4ebf189581997876cd13614ec940b961).




[GitHub] spark pull request #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case I...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13728#discussion_r67466720
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
    @@ -545,4 +545,28 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
           }
         }
       }
    +
    +  test("Verify SQLConf PARQUET_FILTER_PUSHDOWN_ENABLED") {
    +    import testImplicits._
    +
    +    Seq("true", "false").foreach { pushDown =>
    +      // When SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key is set to true and all the data types
    +      // of the table schema are AtomicType, the parquet reader uses vectorizedReader.
    +      // In this mode, filters will not be pushed down, no matter whether
    +      // SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key is true or not.
    +      withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> pushDown,
    +          SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false") {
    +        withTempPath { dir =>
    +          val path = s"${dir.getCanonicalPath}/table1"
    +          (1 to 3).map(i => (i, i.toString)).toDF("a", "b").write.parquet(path)
    +          // When a filter is pushed to Parquet, Parquet can apply it to every row.
    --- End diff --
    
    @HyukjinKwon Davies's comment is just about how Parquet prunes the rows. It happens at the row group level.




[GitHub] spark pull request #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case I...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13728#discussion_r67466643
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
    @@ -545,4 +545,28 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
           }
         }
       }
    +
    +  test("Verify SQLConf PARQUET_FILTER_PUSHDOWN_ENABLED") {
    +    import testImplicits._
    +
    +    Seq("true", "false").foreach { pushDown =>
    +      // When SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key is set to true and all the data types
    +      // of the table schema are AtomicType, the parquet reader uses vectorizedReader.
    +      // In this mode, filters will not be pushed down, no matter whether
    +      // SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key is true or not.
    +      withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> pushDown,
    +          SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false") {
    +        withTempPath { dir =>
    +          val path = s"${dir.getCanonicalPath}/table1"
    +          (1 to 3).map(i => (i, i.toString)).toDF("a", "b").write.parquet(path)
    +          // When a filter is pushed to Parquet, Parquet can apply it to every row.
    --- End diff --
    
    @davies You are right. Sorry, I simply copied this comment from the other test cases. Let me remove all of them. Thanks!




[GitHub] spark issue #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case Improvem...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13728
  
    **[Test build #60680 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60680/consoleFull)** for PR 13728 at commit [`ad1f18c`](https://github.com/apache/spark/commit/ad1f18cf4ebf189581997876cd13614ec940b961).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case Improvem...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13728
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case I...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile closed the pull request at:

    https://github.com/apache/spark/pull/13728




[GitHub] spark issue #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case Improvem...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/13728
  
    @gatorsmile Hm, doesn't Parquet's filter2 filter individual rows as well as prune at the row group level? I think I wrote the test that was copied.




[GitHub] spark issue #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case Improvem...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13728
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60680/
    Test PASSed.




[GitHub] spark pull request #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case I...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13728#discussion_r67464686
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
    @@ -545,4 +545,28 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
           }
         }
       }
    +
    +  test("Verify SQLConf PARQUET_FILTER_PUSHDOWN_ENABLED") {
    +    import testImplicits._
    +
    +    Seq("true", "false").foreach { pushDown =>
    +      // When SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key is set to true and all the data types
    +      // of the table schema are AtomicType, the parquet reader uses vectorizedReader.
    +      // In this mode, filters will not be pushed down, no matter whether
    +      // SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key is true or not.
    +      withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> pushDown,
    +          SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false") {
    +        withTempPath { dir =>
    +          val path = s"${dir.getCanonicalPath}/table1"
    +          (1 to 3).map(i => (i, i.toString)).toDF("a", "b").write.parquet(path)
    +          // When a filter is pushed to Parquet, Parquet can apply it to every row.
    --- End diff --
    
    Is this true? I thought the filter is only applied at the row group level.




[GitHub] spark issue #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case Improvem...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/13728
  
    Uh, I see, @HyukjinKwon. I did not realize this filter was used twice.




[GitHub] spark issue #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case Improvem...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13728
  
    **[Test build #61006 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61006/consoleFull)** for PR 13728 at commit [`d1b2cbb`](https://github.com/apache/spark/commit/d1b2cbbe73e74ee80dd3afa6a9a1fe5214138b22).




[GitHub] spark issue #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case Improvem...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13728
  
    Merged build finished. Test PASSed.

