You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by liancheng <gi...@git.apache.org> on 2015/09/17 20:51:25 UTC

[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/8799

    [SPARK-10623] [SQL] Fixes ORC predicate push-down

    When pushing down a leaf predicate, ORC `SearchArgument` builder requires an extra "parent" predicate (any one among AND / OR / NOT) to wrap the leaf predicate. E.g., to push down `a < 1`, we must build `AND(a < 1)` instead. Fortunately, when actually constructing the `SearchArgument`, the builder will eliminate all those unnecessary wrappers.
    
    This PR is based on #8783 authored by @zhzhan. I also took the chance to simply `OrcFilters` a little bit to improve readability.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark spark-10623/fix-orc-ppd

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8799.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #8799
    
----
commit 5febd8546ab4bcbb4ceb5e810b4547b857b39dd7
Author: Cheng Lian <li...@databricks.com>
Date:   2015-09-17T16:19:45Z

    Fixes ORC PPD

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141190801
  
      [Test build #42613 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42613/consoleFull) for   PR 8799 at commit [`5febd85`](https://github.com/apache/spark/commit/5febd8546ab4bcbb4ceb5e810b4547b857b39dd7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141271421
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141583700
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42688/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141583698
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141186193
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8799#discussion_r39893528
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala ---
    @@ -32,10 +32,10 @@ import org.apache.spark.sql.sources._
      */
     private[orc] object OrcFilters extends Logging {
       def createFilter(expr: Array[Filter]): Option[SearchArgument] = {
    -    expr.reduceOption(And).flatMap { conjunction =>
    -      val builder = SearchArgumentFactory.newBuilder()
    -      buildSearchArgument(conjunction, builder).map(_.build())
    -    }
    +    for {
    +      conjunction <- expr.reduceOption(And)
    +      builder <- buildSearchArgument(conjunction, SearchArgumentFactory.newBuilder())
    +    } yield builder.build()
    --- End diff --
    
    It is not very clear how this part works. Can we add some comments?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141186129
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/8799


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141227108
  
      [Test build #42613 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42613/console) for   PR 8799 at commit [`5febd85`](https://github.com/apache/spark/commit/5febd8546ab4bcbb4ceb5e810b4547b857b39dd7).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141271366
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8799#discussion_r39913803
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala ---
    @@ -344,4 +344,34 @@ class OrcQuerySuite extends QueryTest with BeforeAndAfterAll with OrcTest {
           }
         }
       }
    +
    +  test("SPARK-10623 Enable ORC PPD") {
    +    withTempPath { dir =>
    +      withSQLConf(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key -> "true") {
    +        import testImplicits._
    +
    +        val path = dir.getCanonicalPath
    +        sqlContext.range(10).coalesce(1).write.orc(path)
    +        val df = sqlContext.read.orc(path)
    +
    +        def checkPredicate(pred: Column, answer: Seq[Long]): Unit = {
    +          checkAnswer(df.where(pred), answer.map(Row(_)))
    --- End diff --
    
    It just checks answers. It's annoying that I couldn't find a programmatical way to verify whether its pushed down or not. Checked through logs manually though.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8799#discussion_r39908195
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala ---
    @@ -344,4 +344,34 @@ class OrcQuerySuite extends QueryTest with BeforeAndAfterAll with OrcTest {
           }
         }
       }
    +
    +  test("SPARK-10623 Enable ORC PPD") {
    +    withTempPath { dir =>
    +      withSQLConf(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key -> "true") {
    +        import testImplicits._
    +
    +        val path = dir.getCanonicalPath
    +        sqlContext.range(10).coalesce(1).write.orc(path)
    +        val df = sqlContext.read.orc(path)
    +
    +        def checkPredicate(pred: Column, answer: Seq[Long]): Unit = {
    +          checkAnswer(df.where(pred), answer.map(Row(_)))
    --- End diff --
    
    Does this test check if the predicate is really pushded down? Or it just check answers?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141272633
  
      [Test build #42625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42625/consoleFull) for   PR 8799 at commit [`c3f6692`](https://github.com/apache/spark/commit/c3f669228b9ceab6687865b9e4f1be953ecaf8c6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141583625
  
      [Test build #42688 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42688/console) for   PR 8799 at commit [`a694ba2`](https://github.com/apache/spark/commit/a694ba2836cff9ceb16a44e4de60f70f349aa353).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by zhzhan <gi...@git.apache.org>.
Github user zhzhan commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141355072
  
    LGTM Thanks for fixing this. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141562030
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141562005
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141300756
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42625/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141553760
  
    two comments. Otherwise, looks good.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141300755
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8799#discussion_r39895460
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala ---
    @@ -102,46 +102,32 @@ private[orc] object OrcFilters extends Logging {
               negate <- buildSearchArgument(child, builder.startNot())
             } yield negate.end()
     
    -      case EqualTo(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.equals(attribute, _))
    +      case EqualTo(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startAnd().equals(attribute, value).end())
     
    -      case EqualNullSafe(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.nullSafeEquals(attribute, _))
    +      case EqualNullSafe(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startAnd().nullSafeEquals(attribute, value).end())
     
    -      case LessThan(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.lessThan(attribute, _))
    +      case LessThan(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startAnd().lessThan(attribute, value).end())
     
    -      case LessThanOrEqual(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.lessThanEquals(attribute, _))
    +      case LessThanOrEqual(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startAnd().lessThanEquals(attribute, value).end())
     
    -      case GreaterThan(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.startNot().lessThanEquals(attribute, _).end())
    +      case GreaterThan(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startNot().lessThanEquals(attribute, value).end())
     
    -      case GreaterThanOrEqual(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.startNot().lessThan(attribute, _).end())
    +      case GreaterThanOrEqual(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startNot().lessThan(attribute, value).end())
     
           case IsNull(attribute) =>
    -        Some(builder.isNull(attribute))
    +        Some(builder.startAnd().isNull(attribute).end())
     
           case IsNotNull(attribute) =>
             Some(builder.startNot().isNull(attribute).end())
     
    -      case In(attribute, values) =>
    -        Option(values)
    -          .filter(_.forall(isSearchableLiteral))
    -          .map(builder.in(attribute, _))
    +      case In(attribute, values) if values.forall(isSearchableLiteral) =>
    +        Some(builder.startAnd().in(attribute, values.map(_.asInstanceOf[AnyRef]): _*).end())
    --- End diff --
    
    Will this `In` get changed to`InSet` by the optimizer when all values are literals?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141227315
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8799#discussion_r39897629
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala ---
    @@ -102,46 +102,32 @@ private[orc] object OrcFilters extends Logging {
               negate <- buildSearchArgument(child, builder.startNot())
             } yield negate.end()
     
    -      case EqualTo(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.equals(attribute, _))
    +      case EqualTo(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startAnd().equals(attribute, value).end())
     
    -      case EqualNullSafe(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.nullSafeEquals(attribute, _))
    +      case EqualNullSafe(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startAnd().nullSafeEquals(attribute, value).end())
     
    -      case LessThan(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.lessThan(attribute, _))
    +      case LessThan(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startAnd().lessThan(attribute, value).end())
     
    -      case LessThanOrEqual(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.lessThanEquals(attribute, _))
    +      case LessThanOrEqual(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startAnd().lessThanEquals(attribute, value).end())
     
    -      case GreaterThan(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.startNot().lessThanEquals(attribute, _).end())
    +      case GreaterThan(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startNot().lessThanEquals(attribute, value).end())
     
    -      case GreaterThanOrEqual(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.startNot().lessThan(attribute, _).end())
    +      case GreaterThanOrEqual(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startNot().lessThan(attribute, value).end())
     
           case IsNull(attribute) =>
    -        Some(builder.isNull(attribute))
    +        Some(builder.startAnd().isNull(attribute).end())
     
           case IsNotNull(attribute) =>
             Some(builder.startNot().isNull(attribute).end())
     
    -      case In(attribute, values) =>
    -        Option(values)
    -          .filter(_.forall(isSearchableLiteral))
    -          .map(builder.in(attribute, _))
    +      case In(attribute, values) if values.forall(isSearchableLiteral) =>
    +        Some(builder.startAnd().in(attribute, values.map(_.asInstanceOf[AnyRef]): _*).end())
    --- End diff --
    
    ah ok.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141607187
  
    ok. Merging to master and branch 1.5.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8799#discussion_r39791707
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala ---
    @@ -102,46 +102,32 @@ private[orc] object OrcFilters extends Logging {
               negate <- buildSearchArgument(child, builder.startNot())
             } yield negate.end()
     
    -      case EqualTo(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.equals(attribute, _))
    +      case EqualTo(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startAnd().equals(attribute, value).end())
     
    -      case EqualNullSafe(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.nullSafeEquals(attribute, _))
    +      case EqualNullSafe(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startAnd().nullSafeEquals(attribute, value).end())
     
    -      case LessThan(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.lessThan(attribute, _))
    +      case LessThan(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startAnd().lessThan(attribute, value).end())
     
    -      case LessThanOrEqual(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.lessThanEquals(attribute, _))
    +      case LessThanOrEqual(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startAnd().lessThanEquals(attribute, value).end())
     
    -      case GreaterThan(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.startNot().lessThanEquals(attribute, _).end())
    +      case GreaterThan(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startNot().lessThanEquals(attribute, value).end())
     
    -      case GreaterThanOrEqual(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.startNot().lessThan(attribute, _).end())
    +      case GreaterThanOrEqual(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startNot().lessThan(attribute, value).end())
     
           case IsNull(attribute) =>
    -        Some(builder.isNull(attribute))
    +        Some(builder.startAnd().isNull(attribute).end())
     
           case IsNotNull(attribute) =>
             Some(builder.startNot().isNull(attribute).end())
     
    -      case In(attribute, values) =>
    -        Option(values)
    -          .filter(_.forall(isSearchableLiteral))
    -          .map(builder.in(attribute, _))
    --- End diff --
    
    The `in(attribute, _)` part is wrong, because `in` accepts varargs instead of a list.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141300673
  
      [Test build #42625 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42625/console) for   PR 8799 at commit [`c3f6692`](https://github.com/apache/spark/commit/c3f669228b9ceab6687865b9e4f1be953ecaf8c6).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8799#discussion_r39896944
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala ---
    @@ -102,46 +102,32 @@ private[orc] object OrcFilters extends Logging {
               negate <- buildSearchArgument(child, builder.startNot())
             } yield negate.end()
     
    -      case EqualTo(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.equals(attribute, _))
    +      case EqualTo(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startAnd().equals(attribute, value).end())
     
    -      case EqualNullSafe(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.nullSafeEquals(attribute, _))
    +      case EqualNullSafe(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startAnd().nullSafeEquals(attribute, value).end())
     
    -      case LessThan(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.lessThan(attribute, _))
    +      case LessThan(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startAnd().lessThan(attribute, value).end())
     
    -      case LessThanOrEqual(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.lessThanEquals(attribute, _))
    +      case LessThanOrEqual(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startAnd().lessThanEquals(attribute, value).end())
     
    -      case GreaterThan(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.startNot().lessThanEquals(attribute, _).end())
    +      case GreaterThan(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startNot().lessThanEquals(attribute, value).end())
     
    -      case GreaterThanOrEqual(attribute, value) =>
    -        Option(value)
    -          .filter(isSearchableLiteral)
    -          .map(builder.startNot().lessThan(attribute, _).end())
    +      case GreaterThanOrEqual(attribute, value) if isSearchableLiteral(value) =>
    +        Some(builder.startNot().lessThan(attribute, value).end())
     
           case IsNull(attribute) =>
    -        Some(builder.isNull(attribute))
    +        Some(builder.startAnd().isNull(attribute).end())
     
           case IsNotNull(attribute) =>
             Some(builder.startNot().isNull(attribute).end())
     
    -      case In(attribute, values) =>
    -        Option(values)
    -          .filter(_.forall(isSearchableLiteral))
    -          .map(builder.in(attribute, _))
    +      case In(attribute, values) if values.forall(isSearchableLiteral) =>
    +        Some(builder.startAnd().in(attribute, values.map(_.asInstanceOf[AnyRef]): _*).end())
    --- End diff --
    
    This is a `sources.In` rather than `expressions.In`, so it should be fine.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8799#discussion_r39792429
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala ---
    @@ -32,10 +32,10 @@ import org.apache.spark.sql.sources._
      */
     private[orc] object OrcFilters extends Logging {
       def createFilter(expr: Array[Filter]): Option[SearchArgument] = {
    -    expr.reduceOption(And).flatMap { conjunction =>
    -      val builder = SearchArgumentFactory.newBuilder()
    -      buildSearchArgument(conjunction, builder).map(_.build())
    -    }
    +    for {
    +      conjunction <- expr.reduceOption(And)
    +      builder <- buildSearchArgument(conjunction, SearchArgumentFactory.newBuilder())
    +    } yield builder.build()
    --- End diff --
    
    This change is only for better readability


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141562534
  
      [Test build #42688 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42688/consoleFull) for   PR 8799 at commit [`a694ba2`](https://github.com/apache/spark/commit/a694ba2836cff9ceb16a44e4de60f70f349aa353).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/8799#issuecomment-141227319
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42613/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org