You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by dbtsai <gi...@git.apache.org> on 2018/09/27 22:16:57 UTC

[GitHub] spark pull request #22574: [SPARK-25556][SQL] Just remove the unsupported pr...

GitHub user dbtsai opened a pull request:

    https://github.com/apache/spark/pull/22574

    [SPARK-25556][SQL] Just remove the unsupported predicates in Parquet

    ## What changes were proposed in this pull request?
    
    Currently, in `ParquetFilters`, if one of the children predicates is not supported by Parquet, the entire predicates will be thrown away. In fact, if the unsupported predicate is in the top level `And` condition or in the child before hitting `Not` or `Or` condition, it can be safely removed.
    
    ## How was this patch tested?
    
    Tests are added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dbtsai/spark removeUnsupportedPredicatesInParquet

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22574.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22574
    
----
commit d49d63bc40a7752990583f9afbd10c68025510b3
Author: DB Tsai <d_...@...>
Date:   2018-09-27T22:12:44Z

    Remove unsupported predicates in parquet

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22574: [SPARK-25559][SQL] Just remove the unsupported pr...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22574#discussion_r221153414
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
    @@ -488,26 +494,25 @@ private[parquet] class ParquetFilters(
               .map(_(nameToParquetField(name).fieldName, value))
     
           case sources.And(lhs, rhs) =>
    -        // At here, it is not safe to just convert one side if we do not understand the
    -        // other side. Here is an example used to explain the reason.
    -        // Let's say we have NOT(a = 2 AND b in ('1')) and we do not understand how to
    -        // convert b in ('1'). If we only convert a = 2, we will end up with a filter
    -        // NOT(a = 2), which will generate wrong results.
    -        // Pushing one side of AND down is only safe to do at the top level.
    -        // You can see ParquetRelation's initializeLocalJobFunc method as an example.
    -        for {
    -          lhsFilter <- createFilter(schema, lhs)
    -          rhsFilter <- createFilter(schema, rhs)
    -        } yield FilterApi.and(lhsFilter, rhsFilter)
    +        // If the unsupported predicate is in the top level `And` condition or in the child
    +        // `And` condition before hitting `Not` or `Or` condition, it can be safely removed.
    +        (createFilterHelper(nameToParquetField, lhs, canRemoveOneSideInAnd = true),
    --- End diff --
    
    Addressed. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    LGTM, pending jenkins


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    **[Test build #96735 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96735/testReport)** for PR 22574 at commit [`d37b2e8`](https://github.com/apache/spark/commit/d37b2e8883425d121d1e8d1533f934747c28fff3).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96731/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3558/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96725/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96749/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3578/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3556/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Remove the unsupported predicates in ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96775/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25556][SQL] Just remove the unsupported predicate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    **[Test build #96725 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96725/testReport)** for PR 22574 at commit [`8c76e31`](https://github.com/apache/spark/commit/8c76e315bc0e89d9f049b59c82e97d06c8627025).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22574: [SPARK-25559][SQL] Just remove the unsupported pr...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22574#discussion_r221127868
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
    @@ -488,26 +494,25 @@ private[parquet] class ParquetFilters(
               .map(_(nameToParquetField(name).fieldName, value))
     
           case sources.And(lhs, rhs) =>
    -        // At here, it is not safe to just convert one side if we do not understand the
    -        // other side. Here is an example used to explain the reason.
    -        // Let's say we have NOT(a = 2 AND b in ('1')) and we do not understand how to
    -        // convert b in ('1'). If we only convert a = 2, we will end up with a filter
    -        // NOT(a = 2), which will generate wrong results.
    -        // Pushing one side of AND down is only safe to do at the top level.
    -        // You can see ParquetRelation's initializeLocalJobFunc method as an example.
    -        for {
    -          lhsFilter <- createFilter(schema, lhs)
    -          rhsFilter <- createFilter(schema, rhs)
    -        } yield FilterApi.and(lhsFilter, rhsFilter)
    +        // If the unsupported predicate is in the top level `And` condition or in the child
    +        // `And` condition before hitting `Not` or `Or` condition, it can be safely removed.
    +        (createFilterHelper(nameToParquetField, lhs, canRemoveOneSideInAnd = true),
    +          createFilterHelper(nameToParquetField, rhs, canRemoveOneSideInAnd = true)) match {
    --- End diff --
    
    Thank you for pinging me, @dbtsai . 
    
    Here, instead of `canRemoveOneSideInAnd = true`, we need to use the received `canRemoveOneSideInAnd`, don't we? Otherwise, this will cause a bug in the nested `And` clauses. So, could you review and merge the test case, https://github.com/dbtsai/spark/pull/5 and fix the PR accordingly?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96735/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    test this again.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22574: [SPARK-25559][SQL] Just remove the unsupported pr...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22574#discussion_r221374514
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
    @@ -488,26 +494,27 @@ private[parquet] class ParquetFilters(
               .map(_(nameToParquetField(name).fieldName, value))
     
           case sources.And(lhs, rhs) =>
    -        // At here, it is not safe to just convert one side if we do not understand the
    -        // other side. Here is an example used to explain the reason.
    -        // Let's say we have NOT(a = 2 AND b in ('1')) and we do not understand how to
    -        // convert b in ('1'). If we only convert a = 2, we will end up with a filter
    -        // NOT(a = 2), which will generate wrong results.
    -        // Pushing one side of AND down is only safe to do at the top level.
    -        // You can see ParquetRelation's initializeLocalJobFunc method as an example.
    --- End diff --
    
    addressed and added more tests.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    **[Test build #96731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96731/testReport)** for PR 22574 at commit [`29a9aa0`](https://github.com/apache/spark/commit/29a9aa0a30bea4d126e8668ade66678bdc2ab5fa).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22574: [SPARK-25559][SQL] Just remove the unsupported pr...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22574#discussion_r221277340
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
    @@ -488,26 +494,27 @@ private[parquet] class ParquetFilters(
               .map(_(nameToParquetField(name).fieldName, value))
     
           case sources.And(lhs, rhs) =>
    -        // At here, it is not safe to just convert one side if we do not understand the
    -        // other side. Here is an example used to explain the reason.
    -        // Let's say we have NOT(a = 2 AND b in ('1')) and we do not understand how to
    -        // convert b in ('1'). If we only convert a = 2, we will end up with a filter
    -        // NOT(a = 2), which will generate wrong results.
    -        // Pushing one side of AND down is only safe to do at the top level.
    -        // You can see ParquetRelation's initializeLocalJobFunc method as an example.
    --- End diff --
    
    +1


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22574: [SPARK-25559][SQL] Just remove the unsupported pr...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22574#discussion_r221304857
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
    @@ -488,26 +494,27 @@ private[parquet] class ParquetFilters(
               .map(_(nameToParquetField(name).fieldName, value))
     
           case sources.And(lhs, rhs) =>
    -        // At here, it is not safe to just convert one side if we do not understand the
    -        // other side. Here is an example used to explain the reason.
    -        // Let's say we have NOT(a = 2 AND b in ('1')) and we do not understand how to
    -        // convert b in ('1'). If we only convert a = 2, we will end up with a filter
    -        // NOT(a = 2), which will generate wrong results.
    -        // Pushing one side of AND down is only safe to do at the top level.
    -        // You can see ParquetRelation's initializeLocalJobFunc method as an example.
    --- End diff --
    
    +1


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    **[Test build #96735 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96735/testReport)** for PR 22574 at commit [`d37b2e8`](https://github.com/apache/spark/commit/d37b2e8883425d121d1e8d1533f934747c28fff3).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25556][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22574: [SPARK-25559][SQL] Just remove the unsupported pr...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22574#discussion_r221159822
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
    @@ -488,26 +494,27 @@ private[parquet] class ParquetFilters(
               .map(_(nameToParquetField(name).fieldName, value))
     
           case sources.And(lhs, rhs) =>
    -        // At here, it is not safe to just convert one side if we do not understand the
    -        // other side. Here is an example used to explain the reason.
    -        // Let's say we have NOT(a = 2 AND b in ('1')) and we do not understand how to
    -        // convert b in ('1'). If we only convert a = 2, we will end up with a filter
    -        // NOT(a = 2), which will generate wrong results.
    -        // Pushing one side of AND down is only safe to do at the top level.
    -        // You can see ParquetRelation's initializeLocalJobFunc method as an example.
    --- End diff --
    
    This example is good to show the cases we can't remove one side. Can we still keep it?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    **[Test build #96724 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96724/testReport)** for PR 22574 at commit [`d49d63b`](https://github.com/apache/spark/commit/d49d63bc40a7752990583f9afbd10c68025510b3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3567/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Can we have a more clear title which is described as in the description? Currently, the claim is too broader than the description.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    **[Test build #96749 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96749/testReport)** for PR 22574 at commit [`d37b2e8`](https://github.com/apache/spark/commit/d37b2e8883425d121d1e8d1533f934747c28fff3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25556][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3553/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25556][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3552/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Remove the unsupported predicates in ...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    I changed the title, and hopefully, it's much more clear now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25556][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    **[Test build #96725 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96725/testReport)** for PR 22574 at commit [`8c76e31`](https://github.com/apache/spark/commit/8c76e315bc0e89d9f049b59c82e97d06c8627025).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    **[Test build #96731 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96731/testReport)** for PR 22574 at commit [`29a9aa0`](https://github.com/apache/spark/commit/29a9aa0a30bea4d126e8668ade66678bdc2ab5fa).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96724/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    **[Test build #96775 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96775/testReport)** for PR 22574 at commit [`9a9e47f`](https://github.com/apache/spark/commit/9a9e47fb242afb94e7df917c852425cc0f5114e0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Remove the unsupported predicates in ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    **[Test build #96749 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96749/testReport)** for PR 22574 at commit [`d37b2e8`](https://github.com/apache/spark/commit/d37b2e8883425d121d1e8d1533f934747c28fff3).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22574: [SPARK-25559][SQL] Just remove the unsupported pr...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22574#discussion_r221152126
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
    @@ -488,26 +494,25 @@ private[parquet] class ParquetFilters(
               .map(_(nameToParquetField(name).fieldName, value))
     
           case sources.And(lhs, rhs) =>
    -        // At here, it is not safe to just convert one side if we do not understand the
    -        // other side. Here is an example used to explain the reason.
    -        // Let's say we have NOT(a = 2 AND b in ('1')) and we do not understand how to
    -        // convert b in ('1'). If we only convert a = 2, we will end up with a filter
    -        // NOT(a = 2), which will generate wrong results.
    -        // Pushing one side of AND down is only safe to do at the top level.
    -        // You can see ParquetRelation's initializeLocalJobFunc method as an example.
    -        for {
    -          lhsFilter <- createFilter(schema, lhs)
    -          rhsFilter <- createFilter(schema, rhs)
    -        } yield FilterApi.and(lhsFilter, rhsFilter)
    +        // If the unsupported predicate is in the top level `And` condition or in the child
    +        // `And` condition before hitting `Not` or `Or` condition, it can be safely removed.
    +        (createFilterHelper(nameToParquetField, lhs, canRemoveOneSideInAnd = true),
    --- End diff --
    
    a bit about style
    ```
    val lhs = createFilterHelper...
    val rhs = createFilterHelper...
    (lhs, rhs) match {
      ...
    }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Also, cc @rdblue 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25556][SQL] Just remove the unsupported predicate...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    cc @gatorsmile @cloud-fan @HyukjinKwon @dongjoon-hyun @viirya 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22574: [SPARK-25559][SQL] Remove the unsupported predica...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22574


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Just remove the unsupported predicate...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22574: [SPARK-25559][SQL] Just remove the unsupported pr...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22574#discussion_r221152340
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
    @@ -488,26 +494,25 @@ private[parquet] class ParquetFilters(
               .map(_(nameToParquetField(name).fieldName, value))
     
           case sources.And(lhs, rhs) =>
    -        // At here, it is not safe to just convert one side if we do not understand the
    -        // other side. Here is an example used to explain the reason.
    -        // Let's say we have NOT(a = 2 AND b in ('1')) and we do not understand how to
    -        // convert b in ('1'). If we only convert a = 2, we will end up with a filter
    -        // NOT(a = 2), which will generate wrong results.
    -        // Pushing one side of AND down is only safe to do at the top level.
    -        // You can see ParquetRelation's initializeLocalJobFunc method as an example.
    -        for {
    -          lhsFilter <- createFilter(schema, lhs)
    -          rhsFilter <- createFilter(schema, rhs)
    -        } yield FilterApi.and(lhsFilter, rhsFilter)
    +        // If the unsupported predicate is in the top level `And` condition or in the child
    +        // `And` condition before hitting `Not` or `Or` condition, it can be safely removed.
    +        (createFilterHelper(nameToParquetField, lhs, canRemoveOneSideInAnd = true),
    +          createFilterHelper(nameToParquetField, rhs, canRemoveOneSideInAnd = true)) match {
    --- End diff --
    
    Thanks for catching this. I just fixed it. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25556][SQL] Just remove the unsupported predicate...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    **[Test build #96724 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96724/testReport)** for PR 22574 at commit [`d49d63b`](https://github.com/apache/spark/commit/d49d63bc40a7752990583f9afbd10c68025510b3).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22574: [SPARK-25559][SQL] Just remove the unsupported pr...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22574#discussion_r221151966
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
    @@ -488,26 +494,25 @@ private[parquet] class ParquetFilters(
               .map(_(nameToParquetField(name).fieldName, value))
     
           case sources.And(lhs, rhs) =>
    -        // At here, it is not safe to just convert one side if we do not understand the
    -        // other side. Here is an example used to explain the reason.
    -        // Let's say we have NOT(a = 2 AND b in ('1')) and we do not understand how to
    -        // convert b in ('1'). If we only convert a = 2, we will end up with a filter
    -        // NOT(a = 2), which will generate wrong results.
    -        // Pushing one side of AND down is only safe to do at the top level.
    -        // You can see ParquetRelation's initializeLocalJobFunc method as an example.
    -        for {
    -          lhsFilter <- createFilter(schema, lhs)
    -          rhsFilter <- createFilter(schema, rhs)
    -        } yield FilterApi.and(lhsFilter, rhsFilter)
    +        // If the unsupported predicate is in the top level `And` condition or in the child
    +        // `And` condition before hitting `Not` or `Or` condition, it can be safely removed.
    +        (createFilterHelper(nameToParquetField, lhs, canRemoveOneSideInAnd = true),
    +          createFilterHelper(nameToParquetField, rhs, canRemoveOneSideInAnd = true)) match {
    --- End diff --
    
    +1


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Remove the unsupported predicates in ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    **[Test build #96775 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96775/testReport)** for PR 22574 at commit [`9a9e47f`](https://github.com/apache/spark/commit/9a9e47fb242afb94e7df917c852425cc0f5114e0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22574: [SPARK-25559][SQL] Remove the unsupported predicates in ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22574
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org