You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by jinxing64 <gi...@git.apache.org> on 2017/10/29 06:55:35 UTC

[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

GitHub user jinxing64 opened a pull request:

    https://github.com/apache/spark/pull/19602

    [SPARK-22384][SQL] Refine partition pruning when attribute is wrapped in Cast

    ## What changes were proposed in this pull request?
    
    Sql below will get all partitions from metastore, which put much burden on metastore;
    ```
    CREATE TABLE test (value INT) PARTITIONED BY (dt STRING)
    SELECT * from test where dt=2017
    ```
    The reason is that the the analyzed attribute dt is wrapped in Cast and HiveShim fails to generate a proper partition filter.
    This pr proposes to take `Cast` into consideration when generate partition filter.
    
    ## How was this patch tested?
    Test added.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinxing64/spark SPARK-22384

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19602.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19602
    
----
commit 26c7a7be273265d8b25ed3fdf4d6fd3b55f225d8
Author: jinxing <ji...@126.com>
Date:   2017-10-26T07:38:51Z

    filter partitions

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r191078043
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
    @@ -53,7 +52,7 @@ class HiveClientSuite(version: String)
           for {
             ds <- 20170101 to 20170103
             h <- 0 to 23
    -        chunk <- Seq("aa", "ab", "ba", "bb")
    +        chunk <- Seq("11", "12", "21", "22")
    --- End diff --
    
    Thanks, I should refine the tests.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r192286696
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
    @@ -74,23 +86,23 @@ class HiveClientSuite(version: String)
       test(s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false") {
         val client = init(false)
         val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
    -      Seq(parseExpression("ds=20170101")))
    +      Seq(EqualTo(attr("ds"), Literal(20170101, IntegerType))))
    --- End diff --
    
    we can import `import org.apache.spark.sql.catalyst.dsl.expressions._` to simplify expression creation


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    The direction is right, but we need to restrict it to Cast (string_typed_attr as integral types) and the operator types must be either equal or not equal


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91227/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3751/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r190312153
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
    @@ -657,18 +656,39 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
     
         val useAdvanced = SQLConf.get.advancedPartitionPredicatePushdownEnabled
     
    +    object ExtractAttribute {
    +      def unapply(expr: Expression): Option[Attribute] = {
    +        expr match {
    +          case attr: Attribute => Some(attr)
    +          case cast @ Cast(child, dt: IntegralType, _) if child.dataType == StringType =>
    --- End diff --
    
    can we generalize it? I think the hive analyzer can also do type coercion so any safe cast should be allowed, like int -> long, int -> string, etc.
    
    We also need a config to safe guard it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r192576715
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
    @@ -59,38 +62,62 @@ class HiveClientSuite(version: String)
             "h" -> h.toString,
             "chunk" -> chunk
           ), storageFormat)
    -    assert(partitions.size == testPartitionCount)
    +    assert(partitions0.size == testPartitionCount0)
    +
    +    client.createPartitions(
    +      "default", "test0", partitions0, ignoreIfExists = false)
    +
    +    val partitions1 =
    +      for {
    +        pt <- 0 until 10
    +        chunk <- Seq("aa", "ab", "ba", "bb")
    +      } yield CatalogTablePartition(Map(
    +        "pt" -> pt.toString,
    +        "chunk" -> chunk
    +      ), storageFormat)
    +    assert(partitions1.size == testPartitionCount1)
     
         client.createPartitions(
    -      "default", "test", partitions, ignoreIfExists = false)
    +      "default", "test1", partitions1, ignoreIfExists = false)
    +
         client
       }
     
    +  private def pAttr(table: String, name: String): Attribute = {
    +    val partTypes = client.getTable("default", table).partitionSchema.fields
    +        .map(field => (field.name, field.dataType)).toMap
    +    partTypes.get(name) match {
    --- End diff --
    
    building a map and loop up only once is a waste. We should move the map outside, or look up linearly.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91378/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    True, I restrict it to be Cast (string_typed_attr as integral types) and `EqualTo`. `Not(EqualTo)` is not included, since the extra burden put to metastore is minor.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #83181 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83181/testReport)** for PR 19602 at commit [`26c7a7b`](https://github.com/apache/spark/commit/26c7a7be273265d8b25ed3fdf4d6fd3b55f225d8).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83182/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83490/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    @gatorsmile 
    Thanks a lot for your help :)
    
    >Can we just evaluate the right side CAST(2017 as STRING), since it is foldable?
    
    Do you mean to add a new rule ? -- cast the type of literal when partition key and the literal are different types.
    
    In current code, `StringType` are promoted(e.g. casted to `IntegerType`) in `PromoteStrings` when it is BinaryComparison (https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala#L366)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r192865145
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
    @@ -46,10 +48,11 @@ class HiveClientSuite(version: String)
         val hadoopConf = new Configuration()
         hadoopConf.setBoolean(tryDirectSqlKey, tryDirectSql)
         val client = buildClient(hadoopConf)
    -    client
    -      .runSqlHive("CREATE TABLE test (value INT) PARTITIONED BY (ds INT, h INT, chunk STRING)")
    +    client.runSqlHive("CREATE TABLE test0 (value INT) PARTITIONED BY (ds INT, h INT, chunk STRING)")
    --- End diff --
    
    can we reuse the existing data set to test? e.g. `'ds.cast(LongType) == 8L`. Then we can reduce the code diff here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91376/testReport)** for PR 19602 at commit [`1504c66`](https://github.com/apache/spark/commit/1504c663d14635003007863d3b5d387c0100f70d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r190579088
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
    @@ -53,7 +52,7 @@ class HiveClientSuite(version: String)
           for {
             ds <- 20170101 to 20170103
             h <- 0 to 23
    -        chunk <- Seq("aa", "ab", "ba", "bb")
    +        chunk <- Seq("11", "12", "21", "22")
    --- End diff --
    
    The first point looks fine, for the second one, can we generate new data for your new test case?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91227/testReport)** for PR 19602 at commit [`3f226df`](https://github.com/apache/spark/commit/3f226df949f6c2791aa3dc8572e660ba9dd1d234).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3641/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #90323 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90323/testReport)** for PR 19602 at commit [`57d0c3b`](https://github.com/apache/spark/commit/57d0c3b3e9ece180429172bc925b337d88864b13).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19602


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    In general cast is hard to be pushed into data source, e.g. `cast(a as string) = string` if a is int, how should data source handle it?
    
    In the meanwhile, I think we can omit most of the cast in the format of `attribute = literal`. e.g. `cast(byteCol as int) = 0`, we know `0` is within byte range, we can convert it to `byteCol = (byte) 0`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91377/testReport)** for PR 19602 at commit [`8d5057a`](https://github.com/apache/spark/commit/8d5057a51f71e959c8151133f6fc0c301cf3d13e).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r190579351
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
    @@ -657,18 +656,46 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
     
         val useAdvanced = SQLConf.get.advancedPartitionPredicatePushdownEnabled
     
    +    object ExtractAttribute {
    +      def unapply(expr: Expression): Option[Attribute] = {
    +        expr match {
    +          case attr: Attribute => Some(attr)
    +          case cast @ Cast(child, dt: StringType, _) if child.dataType.isInstanceOf[NumericType] =>
    +            unapply(child)
    +          case cast @ Cast(child, dt: NumericType, _) if child.dataType == StringType =>
    --- End diff --
    
    I don't think this is safe. It assumes spark and hive has the same behavior when converting invalid string to numbers.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    > select count(1) from test where dt=20180101 
    
    This should be forbidden, users should rewrite it into `dt="20180101"`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91377 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91377/testReport)** for PR 19602 at commit [`8d5057a`](https://github.com/apache/spark/commit/8d5057a51f71e959c8151133f6fc0c301cf3d13e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91426 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91426/testReport)** for PR 19602 at commit [`e4c6e1f`](https://github.com/apache/spark/commit/e4c6e1ff713a7033b0a60dabaca5071b480d7600).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91225 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91225/testReport)** for PR 19602 at commit [`82c4de7`](https://github.com/apache/spark/commit/82c4de7a83d06da9c7529bdb8af2e1949b98f3b0).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    @cloud-fan 
    Sorry for late reply, so busy these days.
    In current change:
    1. I follow `Cast.mayTruncate` strictly when extract partition Attribute;
    2. I created new test data in `HiveClientSuite` and `testMetastorePartitionFiltering` can be used for validation for tables with different partitions schema.
    
    If needed, I can create more tests -- different binary comparisons and different datatypes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3752/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91413 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91413/testReport)** for PR 19602 at commit [`e4c6e1f`](https://github.com/apache/spark/commit/e4c6e1ff713a7033b0a60dabaca5071b480d7600).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    And also I think we have same problem for datasource table.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91377/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91383/testReport)** for PR 19602 at commit [`98c2512`](https://github.com/apache/spark/commit/98c251235a1d0924a9606be82abf1005dca03e1a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    @gatorsmile 
    Thanks again for review this pr.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r192286502
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
    @@ -66,6 +65,19 @@ class HiveClientSuite(version: String)
         client
       }
     
    +  private def attr(name: String): Attribute = {
    +    name match {
    +      case "ds" =>
    +        AttributeReference("ds", IntegerType)()
    --- End diff --
    
    This is error-prone, we should look up the attribute by name from `spark.table("test")`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91225/testReport)** for PR 19602 at commit [`82c4de7`](https://github.com/apache/spark/commit/82c4de7a83d06da9c7529bdb8af2e1949b98f3b0).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    thanks, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3754/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91413 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91413/testReport)** for PR 19602 at commit [`e4c6e1f`](https://github.com/apache/spark/commit/e4c6e1ff713a7033b0a60dabaca5071b480d7600).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91448/testReport)** for PR 19602 at commit [`14e99b1`](https://github.com/apache/spark/commit/14e99b18379dadb167a7775f9f2f29209228b373).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91489/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r191104615
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
    @@ -657,18 +656,46 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
     
         val useAdvanced = SQLConf.get.advancedPartitionPredicatePushdownEnabled
     
    +    object ExtractAttribute {
    +      def unapply(expr: Expression): Option[Attribute] = {
    +        expr match {
    +          case attr: Attribute => Some(attr)
    +          case cast @ Cast(child, dt: StringType, _) if child.dataType.isInstanceOf[NumericType] =>
    +            unapply(child)
    +          case cast @ Cast(child, dt: NumericType, _) if child.dataType == StringType =>
    --- End diff --
    
    I think we should only support safe upcast, `cast("1.234" as int)` should be excluded. Can we follow `Cast.mayTruncate` strictly?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91103/testReport)** for PR 19602 at commit [`76676c1`](https://github.com/apache/spark/commit/76676c1982adc9a73c3c5c41c6ddaf50332d4240).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3783/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91383/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3003/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90323/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91378 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91378/testReport)** for PR 19602 at commit [`8febfa9`](https://github.com/apache/spark/commit/8febfa91d868f2b71e6840ca427cde0a07424028).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r191206457
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
    @@ -642,11 +641,11 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
               col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
             .map(col => col.getName).toSet
     
    -      def unapply(attr: Attribute): Option[String] = {
    +      def unapply(attr: Attribute): Option[Attribute] = {
    --- End diff --
    
    Because I need a Attribute to be passed into `decoratedValue`, which will process the Literal -- add or remove quotes if needed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r191078137
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
    @@ -657,18 +656,46 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
     
         val useAdvanced = SQLConf.get.advancedPartitionPredicatePushdownEnabled
     
    +    object ExtractAttribute {
    +      def unapply(expr: Expression): Option[Attribute] = {
    +        expr match {
    +          case attr: Attribute => Some(attr)
    +          case cast @ Cast(child, dt: StringType, _) if child.dataType.isInstanceOf[NumericType] =>
    +            unapply(child)
    +          case cast @ Cast(child, dt: NumericType, _) if child.dataType == StringType =>
    --- End diff --
    
    And can we resolve this issue by adding a config to indicate whether to enable this change?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91448/testReport)** for PR 19602 at commit [`14e99b1`](https://github.com/apache/spark/commit/14e99b18379dadb167a7775f9f2f29209228b373).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    For such cases, 
    ```Scala
    CREATE TABLE test (value INT) PARTITIONED BY (dt STRING)
    SELECT * from test where dt=2017
    ```
    
    Can we just evaluate the right side `CAST(2017 as STRING)`, since it is foldable?



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r147572381
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
    @@ -53,7 +52,7 @@ class HiveClientSuite(version: String)
           for {
             ds <- 20170101 to 20170103
             h <- 0 to 23
    -        chunk <- Seq("aa", "ab", "ba", "bb")
    +        chunk <- Seq("11", "12", "21", "22")
    --- End diff --
    
    What is the reason we need to change so many lines in the test case? Could you a new test case, instead of changing the existing one?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #83490 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83490/testReport)** for PR 19602 at commit [`2334543`](https://github.com/apache/spark/commit/233454371aa23a0919d229f7679f18900ab94f1c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91448/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #90323 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90323/testReport)** for PR 19602 at commit [`57d0c3b`](https://github.com/apache/spark/commit/57d0c3b3e9ece180429172bc925b337d88864b13).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    I rebased this pr and resolved conflicts. 
    cc @cloud-fan @jiangxb1987  Not sure if you have interest on this.
    Take a look if have time. Thanks !


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91383/testReport)** for PR 19602 at commit [`98c2512`](https://github.com/apache/spark/commit/98c251235a1d0924a9606be82abf1005dca03e1a).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91413/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r192312477
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
    @@ -207,65 +271,68 @@ class HiveClientSuite(version: String)
       }
     
       private def testMetastorePartitionFiltering(
    -      filterString: String,
    +      table: String,
    +      filterExpr: Expression,
           expectedDs: Seq[Int],
           expectedH: Seq[Int],
           expectedChunks: Seq[String]): Unit = {
         testMetastorePartitionFiltering(
    -      filterString,
    -      (expectedDs, expectedH, expectedChunks) :: Nil,
    +      table,
    +      filterExpr,
    +      Map("ds" -> expectedDs, "h" -> expectedH, "chunk" -> expectedChunks) :: Nil,
           identity)
       }
     
       private def testMetastorePartitionFiltering(
    -      filterString: String,
    +      table: String,
    +      filterExpr: Expression,
           expectedDs: Seq[Int],
           expectedH: Seq[Int],
           expectedChunks: Seq[String],
           transform: Expression => Expression): Unit = {
         testMetastorePartitionFiltering(
    -      filterString,
    -      (expectedDs, expectedH, expectedChunks) :: Nil,
    +      table,
    +      filterExpr,
    +      Map("ds" -> expectedDs, "h" -> expectedH, "chunk" -> expectedChunks) :: Nil,
           identity)
       }
     
       private def testMetastorePartitionFiltering(
    -      filterString: String,
    -      expectedPartitionCubes: Seq[(Seq[Int], Seq[Int], Seq[String])]): Unit = {
    -    testMetastorePartitionFiltering(filterString, expectedPartitionCubes, identity)
    +      table: String,
    +      filterExpr: Expression,
    +      expectedPartitionCubes: Seq[Map[String, Seq[Any]]]): Unit = {
    +    testMetastorePartitionFiltering(table, filterExpr, expectedPartitionCubes, identity)
       }
     
       private def testMetastorePartitionFiltering(
    -      filterString: String,
    -      expectedPartitionCubes: Seq[(Seq[Int], Seq[Int], Seq[String])],
    +      table: String,
    +      filterExpr: Expression,
    +      expectedPartitionCubes: Seq[Map[String, Seq[Any]]],
    --- End diff --
    
    With this change, number of partition names in `expectedPartitionCubes`  is not necessarily to be 3. And schema of `expectedPartitionCubes` is like Seq[Map[partition name, partition values]]


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r192550486
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
    @@ -59,38 +61,62 @@ class HiveClientSuite(version: String)
             "h" -> h.toString,
             "chunk" -> chunk
           ), storageFormat)
    -    assert(partitions.size == testPartitionCount)
    +    assert(partitions0.size == testPartitionCount0)
     
         client.createPartitions(
    -      "default", "test", partitions, ignoreIfExists = false)
    +      "default", "test0", partitions0, ignoreIfExists = false)
    +
    +    val partitions1 =
    +      for {
    +        pt <- 0 until 10
    +        chunk <- Seq("aa", "ab", "ba", "bb")
    +      } yield CatalogTablePartition(Map(
    +        "pt" -> pt.toString,
    +        "chunk" -> chunk
    +      ), storageFormat)
    +    assert(partitions1.size == testPartitionCount1)
    +
    +    client.createPartitions(
    +      "default", "test1", partitions1, ignoreIfExists = false)
    +
         client
       }
     
    +  private def pAttr(table: String, name: String): Attribute = {
    +    val partTypes = client.getTable("default", table).partitionSchema.fields
    +        .map(field => (field.name, field.dataType)).toMap
    +    partTypes.get(name) match {
    +      case Some(dt) => AttributeReference(name, dt)()
    +      case None =>
    +        fail(s"Illegal name of partition attribute: $name")
    +    }
    +  }
    +
       override def beforeAll() {
         super.beforeAll()
         client = init(true)
       }
     
       test(s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false") {
         val client = init(false)
    -    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
    -      Seq(parseExpression("ds=20170101")))
    +    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test0"),
    +      Seq(EqualTo(pAttr("test0", "ds"), Literal(20170101, IntegerType))))
    --- End diff --
    
    Thanks, with `org.apache.spark.sql.catalyst.dsl.expressions._`, code can be much cleaner.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r191202828
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
    @@ -642,11 +641,11 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
               col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
             .map(col => col.getName).toSet
     
    -      def unapply(attr: Attribute): Option[String] = {
    +      def unapply(attr: Attribute): Option[Attribute] = {
    --- End diff --
    
    why this change?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r192319393
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
    @@ -59,38 +61,62 @@ class HiveClientSuite(version: String)
             "h" -> h.toString,
             "chunk" -> chunk
           ), storageFormat)
    -    assert(partitions.size == testPartitionCount)
    +    assert(partitions0.size == testPartitionCount0)
     
         client.createPartitions(
    -      "default", "test", partitions, ignoreIfExists = false)
    +      "default", "test0", partitions0, ignoreIfExists = false)
    +
    +    val partitions1 =
    +      for {
    +        pt <- 0 until 10
    +        chunk <- Seq("aa", "ab", "ba", "bb")
    +      } yield CatalogTablePartition(Map(
    +        "pt" -> pt.toString,
    +        "chunk" -> chunk
    +      ), storageFormat)
    +    assert(partitions1.size == testPartitionCount1)
    +
    +    client.createPartitions(
    +      "default", "test1", partitions1, ignoreIfExists = false)
    +
         client
       }
     
    +  private def pAttr(table: String, name: String): Attribute = {
    +    val partTypes = client.getTable("default", table).partitionSchema.fields
    +        .map(field => (field.name, field.dataType)).toMap
    +    partTypes.get(name) match {
    +      case Some(dt) => AttributeReference(name, dt)()
    +      case None =>
    +        fail(s"Illegal name of partition attribute: $name")
    +    }
    +  }
    +
       override def beforeAll() {
         super.beforeAll()
         client = init(true)
       }
     
       test(s"getPartitionsByFilter returns all partitions when $tryDirectSqlKey=false") {
         val client = init(false)
    -    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
    -      Seq(parseExpression("ds=20170101")))
    +    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test0"),
    +      Seq(EqualTo(pAttr("test0", "ds"), Literal(20170101, IntegerType))))
    --- End diff --
    
    we can import `org.apache.spark.sql.catalyst.dsl.expressions._` to simplify expression creation


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r190312626
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
    @@ -657,18 +656,39 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
     
         val useAdvanced = SQLConf.get.advancedPartitionPredicatePushdownEnabled
     
    +    object ExtractAttribute {
    +      def unapply(expr: Expression): Option[Attribute] = {
    +        expr match {
    +          case attr: Attribute => Some(attr)
    +          case cast @ Cast(child, dt: IntegralType, _) if child.dataType == StringType =>
    --- End diff --
    
    seems we can use `Cast.mayTruncate`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Could we fix this? Sql like above is common in my warehouse.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r192550679
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
    @@ -207,65 +271,68 @@ class HiveClientSuite(version: String)
       }
     
       private def testMetastorePartitionFiltering(
    -      filterString: String,
    +      table: String,
    +      filterExpr: Expression,
           expectedDs: Seq[Int],
           expectedH: Seq[Int],
           expectedChunks: Seq[String]): Unit = {
         testMetastorePartitionFiltering(
    -      filterString,
    -      (expectedDs, expectedH, expectedChunks) :: Nil,
    +      table,
    +      filterExpr,
    +      Map("ds" -> expectedDs, "h" -> expectedH, "chunk" -> expectedChunks) :: Nil,
           identity)
       }
     
       private def testMetastorePartitionFiltering(
    -      filterString: String,
    +      table: String,
    +      filterExpr: Expression,
           expectedDs: Seq[Int],
           expectedH: Seq[Int],
           expectedChunks: Seq[String],
           transform: Expression => Expression): Unit = {
         testMetastorePartitionFiltering(
    -      filterString,
    -      (expectedDs, expectedH, expectedChunks) :: Nil,
    +      table,
    +      filterExpr,
    +      Map("ds" -> expectedDs, "h" -> expectedH, "chunk" -> expectedChunks) :: Nil,
           identity)
       }
     
       private def testMetastorePartitionFiltering(
    -      filterString: String,
    -      expectedPartitionCubes: Seq[(Seq[Int], Seq[Int], Seq[String])]): Unit = {
    -    testMetastorePartitionFiltering(filterString, expectedPartitionCubes, identity)
    +      table: String,
    +      filterExpr: Expression,
    +      expectedPartitionCubes: Seq[Map[String, Seq[Any]]]): Unit = {
    +    testMetastorePartitionFiltering(table, filterExpr, expectedPartitionCubes, identity)
       }
     
       private def testMetastorePartitionFiltering(
    -      filterString: String,
    -      expectedPartitionCubes: Seq[(Seq[Int], Seq[Int], Seq[String])],
    +      table: String,
    +      filterExpr: Expression,
    +      expectedPartitionCubes: Seq[Map[String, Seq[Any]]],
           transform: Expression => Expression): Unit = {
    -    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
    +    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", table),
           Seq(
    -        transform(parseExpression(filterString))
    +        transform(filterExpr)
           ))
     
    -    val expectedPartitionCount = expectedPartitionCubes.map {
    -      case (expectedDs, expectedH, expectedChunks) =>
    -        expectedDs.size * expectedH.size * expectedChunks.size
    -    }.sum
    -
    -    val expectedPartitions = expectedPartitionCubes.map {
    -      case (expectedDs, expectedH, expectedChunks) =>
    -        for {
    -          ds <- expectedDs
    -          h <- expectedH
    -          chunk <- expectedChunks
    -        } yield Set(
    -          "ds" -> ds.toString,
    -          "h" -> h.toString,
    -          "chunk" -> chunk
    -        )
    -    }.reduce(_ ++ _)
    +    val expectedPartitionCount = expectedPartitionCubes.map(_.map(_._2.size).product).sum
    +
    +    val expectedPartitions = expectedPartitionCubes.map(getPartitionsFromCube(_)).reduce(_ ++ _)
     
         val actualFilteredPartitionCount = filteredPartitions.size
     
         assert(actualFilteredPartitionCount == expectedPartitionCount,
           s"Expected $expectedPartitionCount partitions but got $actualFilteredPartitionCount")
    -    assert(filteredPartitions.map(_.spec.toSet).toSet == expectedPartitions.toSet)
    +    assert(filteredPartitions.map(_.spec).toSet == expectedPartitions.toSet)
    +  }
    +
    +  private def getPartitionsFromCube(cube: Map[String, Seq[Any]]): Seq[Map[String, String]] = {
    +    cube.map {
    +      case (k: String, pts: Seq[Any]) => pts.map(pt => (k, pt.toString))
    +    }.foldLeft(Seq(Seq[(String, String)]()))((seq0, seq1) => {
    --- End diff --
    
    Hmm, it's  a recursion problem. I tried to use loop&mutable states directly, but seems not become easier readable;
    
    In current change, I extract a `PartitionSpec` type and add comment. I think it's better now?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r191078009
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
    @@ -657,18 +656,46 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
     
         val useAdvanced = SQLConf.get.advancedPartitionPredicatePushdownEnabled
     
    +    object ExtractAttribute {
    +      def unapply(expr: Expression): Option[Attribute] = {
    +        expr match {
    +          case attr: Attribute => Some(attr)
    +          case cast @ Cast(child, dt: StringType, _) if child.dataType.isInstanceOf[NumericType] =>
    +            unapply(child)
    +          case cast @ Cast(child, dt: NumericType, _) if child.dataType == StringType =>
    --- End diff --
    
    Are you worrying that Spark supports converting some invalid string to number but Hive doesn't? Yes, in that case, this change is dangerous. I didn't  think out an example, or there maybe such case in the future? Could you give an example?
    
    On the other hand, it comes to me that this change is incorrect when there's precision truncation, e.g. cast("1.234" as int). But on a second thought, all the `ExtractAttribute` thing here happens within the scope of `getPartitionsByFilter` and such truncation would not happen?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #83182 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83182/testReport)** for PR 19602 at commit [`a3a978e`](https://github.com/apache/spark/commit/a3a978eb612cdc5944c05951e5a6eb9457ae6962).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r192311969
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
    @@ -207,65 +271,68 @@ class HiveClientSuite(version: String)
       }
     
       private def testMetastorePartitionFiltering(
    -      filterString: String,
    +      table: String,
    +      filterExpr: Expression,
           expectedDs: Seq[Int],
           expectedH: Seq[Int],
           expectedChunks: Seq[String]): Unit = {
         testMetastorePartitionFiltering(
    -      filterString,
    -      (expectedDs, expectedH, expectedChunks) :: Nil,
    +      table,
    +      filterExpr,
    +      Map("ds" -> expectedDs, "h" -> expectedH, "chunk" -> expectedChunks) :: Nil,
           identity)
       }
     
       private def testMetastorePartitionFiltering(
    -      filterString: String,
    +      table: String,
    +      filterExpr: Expression,
           expectedDs: Seq[Int],
           expectedH: Seq[Int],
           expectedChunks: Seq[String],
           transform: Expression => Expression): Unit = {
         testMetastorePartitionFiltering(
    -      filterString,
    -      (expectedDs, expectedH, expectedChunks) :: Nil,
    +      table,
    +      filterExpr,
    +      Map("ds" -> expectedDs, "h" -> expectedH, "chunk" -> expectedChunks) :: Nil,
           identity)
       }
     
       private def testMetastorePartitionFiltering(
    -      filterString: String,
    -      expectedPartitionCubes: Seq[(Seq[Int], Seq[Int], Seq[String])]): Unit = {
    -    testMetastorePartitionFiltering(filterString, expectedPartitionCubes, identity)
    +      table: String,
    +      filterExpr: Expression,
    +      expectedPartitionCubes: Seq[Map[String, Seq[Any]]]): Unit = {
    +    testMetastorePartitionFiltering(table, filterExpr, expectedPartitionCubes, identity)
       }
     
       private def testMetastorePartitionFiltering(
    -      filterString: String,
    -      expectedPartitionCubes: Seq[(Seq[Int], Seq[Int], Seq[String])],
    +      table: String,
    +      filterExpr: Expression,
    +      expectedPartitionCubes: Seq[Map[String, Seq[Any]]],
           transform: Expression => Expression): Unit = {
    -    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
    +    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", table),
           Seq(
    -        transform(parseExpression(filterString))
    +        transform(filterExpr)
           ))
     
    -    val expectedPartitionCount = expectedPartitionCubes.map {
    -      case (expectedDs, expectedH, expectedChunks) =>
    -        expectedDs.size * expectedH.size * expectedChunks.size
    -    }.sum
    -
    -    val expectedPartitions = expectedPartitionCubes.map {
    -      case (expectedDs, expectedH, expectedChunks) =>
    -        for {
    -          ds <- expectedDs
    -          h <- expectedH
    -          chunk <- expectedChunks
    -        } yield Set(
    -          "ds" -> ds.toString,
    -          "h" -> h.toString,
    -          "chunk" -> chunk
    -        )
    -    }.reduce(_ ++ _)
    +    val expectedPartitionCount = expectedPartitionCubes.map(_.map(_._2.size).product).sum
    +
    +    val expectedPartitions = expectedPartitionCubes.map(getPartitionsFromCube(_)).reduce(_ ++ _)
     
         val actualFilteredPartitionCount = filteredPartitions.size
     
         assert(actualFilteredPartitionCount == expectedPartitionCount,
           s"Expected $expectedPartitionCount partitions but got $actualFilteredPartitionCount")
    -    assert(filteredPartitions.map(_.spec.toSet).toSet == expectedPartitions.toSet)
    +    assert(filteredPartitions.map(_.spec).toSet == expectedPartitions.toSet)
    +  }
    +
    --- End diff --
    
    Parse the cube to be a list of partitions, the result is like `Seq[Map[partition name, partition value]]`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Thanks for merging !


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r192576685
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
    @@ -657,17 +656,29 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
     
         val useAdvanced = SQLConf.get.advancedPartitionPredicatePushdownEnabled
     
    +    object ExtractAttribute {
    +      def unapply(expr: Expression): Option[Attribute] = {
    +        expr match {
    +          case attr: Attribute => Some(attr)
    +          case Cast(child, dt, _) if !Cast.mayTruncate(child.dataType, dt) => unapply(child)
    +          case _ => None
    +        }
    +      }
    +    }
    +
         def convert(expr: Expression): Option[String] = expr match {
           case In(NonVarcharAttribute(name), ExtractableLiterals(values)) if useAdvanced =>
    --- End diff --
    
    shall we apply the cast handling here?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91489 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91489/testReport)** for PR 19602 at commit [`8d709fd`](https://github.com/apache/spark/commit/8d709fdf63d68bf5bd133212f081619bd40ba733).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #83490 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83490/testReport)** for PR 19602 at commit [`2334543`](https://github.com/apache/spark/commit/233454371aa23a0919d229f7679f18900ab94f1c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3643/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    @cloud-fan
    Thanks a lot for looking into this.
    I updated the change and generalized `ExtractAttribute`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3815/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r191204885
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
    @@ -657,18 +656,41 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
     
         val useAdvanced = SQLConf.get.advancedPartitionPredicatePushdownEnabled
     
    +    object ExtractAttribute {
    +      def unapply(expr: Expression): Option[Attribute] = {
    +        expr match {
    +          case attr: Attribute => Some(attr)
    +          case cast @ Cast(child, dt, _) if !Cast.mayTruncate(child.dataType, dt) => unapply(child)
    +          case _ => None
    +        }
    +      }
    +    }
    +
    +    def decoratedValue(attr: Attribute, value: String): String = {
    --- End diff --
    
    e.g. 
    dt="20180101"  if dt is of NumericType, we remove the quote and it changes to be  dt=20180101
    dt=201880101 if dt is of StringType, we add the quote and it changes to be dt="20180101"



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3550/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91225/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91103/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91378 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91378/testReport)** for PR 19602 at commit [`8febfa9`](https://github.com/apache/spark/commit/8febfa91d868f2b71e6840ca427cde0a07424028).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    @gatorsmile
    Thanks a lot for review this pr :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    @cloud-fan 
    I removed some cases in `ExtractAttribute`. But to make it clear, do you mean that in below scenario, we don't prune partitions?
    ```
    create table test(id int, string) partitioned by (dt string)
    select count(1) from test where dt=20180101  -- HiveShim will get EqualTo(Cast(dt as int), 20180101) and Cast.mayTruncate will return true for this.
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91426 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91426/testReport)** for PR 19602 at commit [`e4c6e1f`](https://github.com/apache/spark/commit/e4c6e1ff713a7033b0a60dabaca5071b480d7600).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3750/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91426/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #83182 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83182/testReport)** for PR 19602 at commit [`a3a978e`](https://github.com/apache/spark/commit/a3a978eb612cdc5944c05951e5a6eb9457ae6962).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r192855659
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientVersions.scala ---
    @@ -23,5 +23,5 @@ import org.apache.spark.SparkFunSuite
     
     private[client] trait HiveClientVersions {
       protected val versions =
    -    IndexedSeq("0.12", "0.13", "0.14", "1.0", "1.1", "1.2", "2.0", "2.1", "2.2", "2.3")
    +    IndexedSeq("1.2")
    --- End diff --
    
    why change this?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83181/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91103 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91103/testReport)** for PR 19602 at commit [`76676c1`](https://github.com/apache/spark/commit/76676c1982adc9a73c3c5c41c6ddaf50332d4240).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91376/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3775/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r192286303
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
    @@ -657,18 +656,30 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
     
         val useAdvanced = SQLConf.get.advancedPartitionPredicatePushdownEnabled
     
    +    object ExtractAttribute {
    +      def unapply(expr: Expression): Option[Attribute] = {
    +        expr match {
    +          case attr: Attribute => Some(attr)
    +          case cast @ Cast(child, dt, _) if !Cast.mayTruncate(child.dataType, dt) => unapply(child)
    --- End diff --
    
    nit: `cast @ ` is not needed here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r192319924
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
    @@ -207,65 +271,68 @@ class HiveClientSuite(version: String)
       }
     
       private def testMetastorePartitionFiltering(
    -      filterString: String,
    +      table: String,
    +      filterExpr: Expression,
           expectedDs: Seq[Int],
           expectedH: Seq[Int],
           expectedChunks: Seq[String]): Unit = {
         testMetastorePartitionFiltering(
    -      filterString,
    -      (expectedDs, expectedH, expectedChunks) :: Nil,
    +      table,
    +      filterExpr,
    +      Map("ds" -> expectedDs, "h" -> expectedH, "chunk" -> expectedChunks) :: Nil,
           identity)
       }
     
       private def testMetastorePartitionFiltering(
    -      filterString: String,
    +      table: String,
    +      filterExpr: Expression,
           expectedDs: Seq[Int],
           expectedH: Seq[Int],
           expectedChunks: Seq[String],
           transform: Expression => Expression): Unit = {
         testMetastorePartitionFiltering(
    -      filterString,
    -      (expectedDs, expectedH, expectedChunks) :: Nil,
    +      table,
    +      filterExpr,
    +      Map("ds" -> expectedDs, "h" -> expectedH, "chunk" -> expectedChunks) :: Nil,
           identity)
       }
     
       private def testMetastorePartitionFiltering(
    -      filterString: String,
    -      expectedPartitionCubes: Seq[(Seq[Int], Seq[Int], Seq[String])]): Unit = {
    -    testMetastorePartitionFiltering(filterString, expectedPartitionCubes, identity)
    +      table: String,
    +      filterExpr: Expression,
    +      expectedPartitionCubes: Seq[Map[String, Seq[Any]]]): Unit = {
    +    testMetastorePartitionFiltering(table, filterExpr, expectedPartitionCubes, identity)
       }
     
       private def testMetastorePartitionFiltering(
    -      filterString: String,
    -      expectedPartitionCubes: Seq[(Seq[Int], Seq[Int], Seq[String])],
    +      table: String,
    +      filterExpr: Expression,
    +      expectedPartitionCubes: Seq[Map[String, Seq[Any]]],
           transform: Expression => Expression): Unit = {
    -    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", "test"),
    +    val filteredPartitions = client.getPartitionsByFilter(client.getTable("default", table),
           Seq(
    -        transform(parseExpression(filterString))
    +        transform(filterExpr)
           ))
     
    -    val expectedPartitionCount = expectedPartitionCubes.map {
    -      case (expectedDs, expectedH, expectedChunks) =>
    -        expectedDs.size * expectedH.size * expectedChunks.size
    -    }.sum
    -
    -    val expectedPartitions = expectedPartitionCubes.map {
    -      case (expectedDs, expectedH, expectedChunks) =>
    -        for {
    -          ds <- expectedDs
    -          h <- expectedH
    -          chunk <- expectedChunks
    -        } yield Set(
    -          "ds" -> ds.toString,
    -          "h" -> h.toString,
    -          "chunk" -> chunk
    -        )
    -    }.reduce(_ ++ _)
    +    val expectedPartitionCount = expectedPartitionCubes.map(_.map(_._2.size).product).sum
    +
    +    val expectedPartitions = expectedPartitionCubes.map(getPartitionsFromCube(_)).reduce(_ ++ _)
     
         val actualFilteredPartitionCount = filteredPartitions.size
     
         assert(actualFilteredPartitionCount == expectedPartitionCount,
           s"Expected $expectedPartitionCount partitions but got $actualFilteredPartitionCount")
    -    assert(filteredPartitions.map(_.spec.toSet).toSet == expectedPartitions.toSet)
    +    assert(filteredPartitions.map(_.spec).toSet == expectedPartitions.toSet)
    +  }
    +
    +  private def getPartitionsFromCube(cube: Map[String, Seq[Any]]): Seq[Map[String, String]] = {
    +    cube.map {
    +      case (k: String, pts: Seq[Any]) => pts.map(pt => (k, pt.toString))
    +    }.foldLeft(Seq(Seq[(String, String)]()))((seq0, seq1) => {
    --- End diff --
    
    this is hard to read, please use loop and mutable states directly.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #83181 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83181/testReport)** for PR 19602 at commit [`26c7a7b`](https://github.com/apache/spark/commit/26c7a7be273265d8b25ed3fdf4d6fd3b55f225d8).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r191202379
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
    @@ -657,18 +656,41 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
     
         val useAdvanced = SQLConf.get.advancedPartitionPredicatePushdownEnabled
     
    +    object ExtractAttribute {
    +      def unapply(expr: Expression): Option[Attribute] = {
    +        expr match {
    +          case attr: Attribute => Some(attr)
    +          case cast @ Cast(child, dt, _) if !Cast.mayTruncate(child.dataType, dt) => unapply(child)
    +          case _ => None
    +        }
    +      }
    +    }
    +
    +    def decoratedValue(attr: Attribute, value: String): String = {
    --- End diff --
    
    what does it do? can you add some comments?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3794/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91376/testReport)** for PR 19602 at commit [`1504c66`](https://github.com/apache/spark/commit/1504c663d14635003007863d3b5d387c0100f70d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91227/testReport)** for PR 19602 at commit [`3f226df`](https://github.com/apache/spark/commit/3f226df949f6c2791aa3dc8572e660ba9dd1d234).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19602: [SPARK-22384][SQL] Refine partition pruning when attribu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19602
  
    **[Test build #91489 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91489/testReport)** for PR 19602 at commit [`8d709fd`](https://github.com/apache/spark/commit/8d709fdf63d68bf5bd133212f081619bd40ba733).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19602: [SPARK-22384][SQL] Refine partition pruning when ...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19602#discussion_r147583510
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala ---
    @@ -53,7 +52,7 @@ class HiveClientSuite(version: String)
           for {
             ds <- 20170101 to 20170103
             h <- 0 to 23
    -        chunk <- Seq("aa", "ab", "ba", "bb")
    +        chunk <- Seq("11", "12", "21", "22")
    --- End diff --
    
    Thanks a lot for review :)
    I did two things:
    1. Pass resolved `Expression` to generate filter -- because in current change, I need data type of the attribute;
    2. "aa" -> "11", "ab" -> "12", "ba" -> "21", "bb" -> "22" -- because I need to pass an integer to set up the filter(chunk=11) and do the cast then.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org