Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/11/18 08:57:37 UTC

[GitHub] [spark] wangyum opened a new pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

wangyum opened a new pull request #30408:
URL: https://github.com/apache/spark/pull/30408


   ### What changes were proposed in this pull request?
   
   Hive Metastore supports strings and integral types in filters. It could also support dates. Please see [HIVE-5679](https://github.com/apache/hive/commit/5106bf1c8671740099fca8e1a7d4b37afe97137f) for more details.
   
   This PR adds that support.
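
As a rough illustration of the idea (not the actual `HiveShim` code), the sketch below models the type check that decides whether a predicate can be sent to the metastore, with dates added next to integral and string columns. `ColumnType`, `PushdownSketch`, and `equalsFilter` are hypothetical names introduced only for this example.

```scala
// Hypothetical, simplified stand-ins for Spark's data types.
sealed trait ColumnType
case object IntegralCol extends ColumnType
case object StringCol extends ColumnType
case object DateCol extends ColumnType
case object DoubleCol extends ColumnType

object PushdownSketch {
  // Column types whose predicates are assumed to be pushable to the metastore.
  def supported(t: ColumnType): Boolean = t match {
    case IntegralCol | StringCol | DateCol => true
    case _ => false
  }

  // Returns Some(filter) when the column type is supported, None otherwise.
  // `renderedValue` is the already-formatted literal, e.g. "2019-01-01".
  def equalsFilter(column: String, t: ColumnType, renderedValue: String): Option[String] =
    if (supported(t)) Some(s"$column = $renderedValue") else None
}
```

With this sketch, `PushdownSketch.equalsFilter("datecol", DateCol, "2019-01-01")` yields a filter string, while a double column yields `None` and would have to be filtered on the Spark side.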
   
   ### Why are the changes needed?
   
   Pushing date predicates down to the Hive Metastore lets it prune partitions, which improves query performance.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   
   ### How was this patch tested?
   
   Unit test.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-729607360


   **[Test build #131274 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131274/testReport)** for PR 30408 at commit [`ba2f553`](https://github.com/apache/spark/commit/ba2f5531601ff169ff0f9fab9bb60627f65082bf).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-732702609


   **[Test build #131624 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131624/testReport)** for PR 30408 at commit [`752eb8d`](https://github.com/apache/spark/commit/752eb8dab286aa78925851bb2ebd0b0ce816def6).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-729641265


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35878/
   




[GitHub] [spark] gatorsmile commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
gatorsmile commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-753561128


   @wangyum Could you add more test cases to check the NULL handling cases? For example, 
   - Include NULL values in the data set
   - Include NULL values in the predicates
   - Include null-safe equals 
   
   Please check https://spark.apache.org/docs/3.0.1/sql-ref-null-semantics.html#comp-operators 
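
The three cases listed above differ in how SQL comparison treats NULL. A minimal sketch of those semantics, modelling NULL as `None`, might look like the following; `NullSemanticsSketch` and its methods are hypothetical names for illustration and are not part of Spark.

```scala
object NullSemanticsSketch {
  // Regular `=`: the result is unknown (None) if either side is NULL.
  def sqlEquals[A](left: Option[A], right: Option[A]): Option[Boolean] =
    for (l <- left; r <- right) yield l == r

  // Null-safe `<=>`: always true or false; NULL <=> NULL is true.
  def nullSafeEquals[A](left: Option[A], right: Option[A]): Boolean =
    left == right

  // A filter keeps a row only when the predicate is definitely true.
  def keepsRow(predicate: Option[Boolean]): Boolean =
    predicate.contains(true)
}
```

Under these assumptions, `datecol = NULL` never keeps a row, whereas `datecol <=> NULL` keeps exactly the rows whose `datecol` is NULL, which is why the two cases need separate tests.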




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30408:
URL: https://github.com/apache/spark/pull/30408#discussion_r530104000



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
##########
@@ -711,7 +735,8 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         val resolver = SQLConf.get.resolver
         if (varcharKeys.exists(c => resolver(c, attr.name))) {
           None
-        } else if (attr.dataType.isInstanceOf[IntegralType] || attr.dataType == StringType) {
+        } else if (attr.dataType.isInstanceOf[IntegralType] || attr.dataType == StringType ||
+          attr.dataType == DateType) {

Review comment:
       indentation?

##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
##########
@@ -748,6 +773,10 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         convert(And(GreaterThanOrEqual(child, Literal(sortedValues.head, dataType)),
           LessThanOrEqual(child, Literal(sortedValues.last, dataType))))
 
+      case InSet(child @ ExtractAttribute(SupportedAttribute(name)), ExtractableDateValues(values))
+        if useAdvanced && child.dataType == DateType =>

Review comment:
       indentation?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733414030








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-729641286








[GitHub] [spark] cloud-fan commented on a change in pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30408:
URL: https://github.com/apache/spark/pull/30408#discussion_r528772962



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/FiltersSuite.scala
##########
@@ -63,6 +64,24 @@ class FiltersSuite extends SparkFunSuite with Logging with PlanTest {
     (Literal(1) === a("intcol", IntegerType)) :: (Literal("a") === a("strcol", IntegerType)) :: Nil,
     "1 = intcol and \"a\" = strcol")
 
+  filterTest("date filter",
+    (a("datecol", DateType) === Literal(Date.valueOf("2019-01-01"))) :: Nil,
+    "datecol = 2019-01-01")
+
+  filterTest("date filter with IN predicate",
+    (a("datecol", DateType) in
+      (Literal(Date.valueOf("2019-01-01")), Literal(Date.valueOf("2019-01-07")))) :: Nil,
+    "(datecol = 2019-01-01 or datecol = 2019-01-07)")
+
+  filterTest("date and string filter",
+    (Literal(Date.valueOf("2019-01-01")) === a("datecol", DateType)) ::
+      (Literal("a") === a("strcol", IntegerType)) :: Nil,
+    "2019-01-01 = datecol and \"a\" = strcol")
+
+  filterTest("date filter with null",
+    (a("datecol", DateType) ===  Literal(null)) :: Nil,

Review comment:
       Not related to this PR, but we could push down a `col is null` predicate to Hive for this case.






[GitHub] [spark] cloud-fan commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-732705896


   retest this please




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-730520044








[GitHub] [spark] SparkQA commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733506446


   **[Test build #131740 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131740/testReport)** for PR 30408 at commit [`29c489a`](https://github.com/apache/spark/commit/29c489ad5f753aaa3551489655073c9f6fc7b0c6).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] maropu commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-732788459


   @wangyum How about asking about it in the spark-dev thread so that Shane notices it quickly? http://apache-spark-developers-list.1001551.n3.nabble.com/jenkins-downtime-tomorrow-evening-weekend-tt30405.html 




[GitHub] [spark] AmplabJenkins commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733230958








[GitHub] [spark] SparkQA commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-729627586


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35878/
   




[GitHub] [spark] SparkQA commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733148701


   **[Test build #131689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131689/testReport)** for PR 30408 at commit [`752eb8d`](https://github.com/apache/spark/commit/752eb8dab286aa78925851bb2ebd0b0ce816def6).




[GitHub] [spark] AmplabJenkins commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-732760626








[GitHub] [spark] SparkQA commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-730519143


   **[Test build #131357 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131357/testReport)** for PR 30408 at commit [`ce5f0d1`](https://github.com/apache/spark/commit/ce5f0d147cf09b8fcfbf4e170c0a10072be54941).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class ExecutorSource(`
     * `  implicit class MetadataColumnsHelper(metadata: Array[MetadataColumn]) `




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733510488








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-732928683








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30408:
URL: https://github.com/apache/spark/pull/30408#discussion_r530106219



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/FiltersSuite.scala
##########
@@ -139,6 +163,11 @@ class FiltersSuite extends SparkFunSuite with Logging with PlanTest {
         InSet(a("doublecol", DoubleType),
           Range(1, 20).map(s => Literal(s.toDouble).eval(EmptyRow)).toSet),
         "")
+
+      checkConverted(
+        InSet(a("datecol", DateType),
+          Range(1, 20).map(d => Literal(d, DateType).eval(EmptyRow)).toSet),
+        "(datecol >= 1970-01-02 and datecol <= 1970-01-20)")

Review comment:
       Oh, is this not `datecol >= 1970-01-01 and ...`?






[GitHub] [spark] cloud-fan commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-730365065


   @wangyum can you resolve the conflicts? thanks!




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733666327








[GitHub] [spark] wangyum commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-753561772


   > @wangyum Could you add more test cases to check the NULL handling cases? For example,
   > 
   > * Include NULL values in the data set
   > * Include NULL values in the predicates
   > * Include null-safe equals
   > 
   > Please check https://spark.apache.org/docs/3.0.1/sql-ref-null-semantics.html#comp-operators
   
   OK




[GitHub] [spark] wangyum commented on a change in pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #30408:
URL: https://github.com/apache/spark/pull/30408#discussion_r526091954



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
##########
@@ -701,6 +710,21 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
       }
     }
 
+    object ExtractableDateValues {

Review comment:
       This is because, for the `InSet` predicate, the values in `hset` are of int type, so we need to use `dateFormatter` to format them. This is the test:
   https://github.com/apache/spark/blob/ba2f5531601ff169ff0f9fab9bb60627f65082bf/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HivePartitionFilteringSuite.scala#L339-L342
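
To make the int-to-string step concrete, here is a small sketch that turns a set of day counts (Spark stores `DateType` values internally as days since 1970-01-01) into an `or` filter of the shape expected by the `FiltersSuite` tests. `InSetDateSketch` and `inSetFilter` are hypothetical names, and `DateTimeFormatter.ISO_LOCAL_DATE` is only a stand-in for the Spark-internal `dateFormatter` mentioned above.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object InSetDateSketch {
  // Stand-in for Spark's internal date formatter (an assumption for this sketch).
  private val formatter: DateTimeFormatter = DateTimeFormatter.ISO_LOCAL_DATE

  // Formats each epoch-day int as yyyy-MM-dd and joins the equalities with "or".
  // An empty set is not handled specially here.
  def inSetFilter(column: String, epochDays: Set[Int]): String =
    epochDays.toSeq.sorted
      .map(d => s"$column = ${LocalDate.ofEpochDay(d.toLong).format(formatter)}")
      .mkString("(", " or ", ")")
}
```

Without the formatting step, the filter would contain raw integers such as `17897` instead of `2019-01-01`, which the metastore could not compare against date partition values.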






[GitHub] [spark] SparkQA removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733148701


   **[Test build #131689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131689/testReport)** for PR 30408 at commit [`752eb8d`](https://github.com/apache/spark/commit/752eb8dab286aa78925851bb2ebd0b0ce816def6).




[GitHub] [spark] wangyum commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733515064


   retest this please.




[GitHub] [spark] wangyum commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-730701382


   It's a date comparison: https://github.com/apache/hive/blob/rel/release-2.3.7/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java#L1141-L1148




[GitHub] [spark] AmplabJenkins commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733414030








[GitHub] [spark] SparkQA commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-730492116


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35961/
   




[GitHub] [spark] SparkQA commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-730433287


   **[Test build #131357 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131357/testReport)** for PR 30408 at commit [`ce5f0d1`](https://github.com/apache/spark/commit/ce5f0d147cf09b8fcfbf4e170c0a10072be54941).




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30408:
URL: https://github.com/apache/spark/pull/30408#discussion_r530110592



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/FiltersSuite.scala
##########
@@ -139,6 +163,11 @@ class FiltersSuite extends SparkFunSuite with Logging with PlanTest {
         InSet(a("doublecol", DoubleType),
           Range(1, 20).map(s => Literal(s.toDouble).eval(EmptyRow)).toSet),
         "")
+
+      checkConverted(
+        InSet(a("datecol", DateType),
+          Range(1, 20).map(d => Literal(d, DateType).eval(EmptyRow)).toSet),
+        "(datecol >= 1970-01-02 and datecol <= 1970-01-20)")

Review comment:
       Never mind. I found the reason.






[GitHub] [spark] SparkQA commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733516303


   **[Test build #131751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131751/testReport)** for PR 30408 at commit [`29c489a`](https://github.com/apache/spark/commit/29c489ad5f753aaa3551489655073c9f6fc7b0c6).




[GitHub] [spark] SparkQA commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733216545


   **[Test build #131689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131689/testReport)** for PR 30408 at commit [`752eb8d`](https://github.com/apache/spark/commit/752eb8dab286aa78925851bb2ebd0b0ce816def6).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] cloud-fan commented on a change in pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30408:
URL: https://github.com/apache/spark/pull/30408#discussion_r526064515



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
##########
@@ -701,6 +710,21 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
       }
     }
 
+    object ExtractableDateValues {

Review comment:
       Why not update `ExtractableValues` to support dates?






[GitHub] [spark] cloud-fan commented on a change in pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30408:
URL: https://github.com/apache/spark/pull/30408#discussion_r526102198



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/FiltersSuite.scala
##########
@@ -63,6 +64,24 @@ class FiltersSuite extends SparkFunSuite with Logging with PlanTest {
     (Literal(1) === a("intcol", IntegerType)) :: (Literal("a") === a("strcol", IntegerType)) :: Nil,
     "1 = intcol and \"a\" = strcol")
 
+  filterTest("date filter",

Review comment:
       Do we run these tests with different Hive versions?






[GitHub] [spark] AmplabJenkins commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733579150








[GitHub] [spark] SparkQA removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733516303


   **[Test build #131751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131751/testReport)** for PR 30408 at commit [`29c489a`](https://github.com/apache/spark/commit/29c489ad5f753aaa3551489655073c9f6fc7b0c6).




[GitHub] [spark] wangyum commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733146199


   retest this please




[GitHub] [spark] wangyum commented on a change in pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #30408:
URL: https://github.com/apache/spark/pull/30408#discussion_r526092082



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -1265,11 +1265,13 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       defaultTimeZoneId: String): Seq[CatalogTablePartition] = withClient {
     val rawTable = getRawTable(db, table)
     val catalogTable = restoreTableMetadata(rawTable)
+    val timeZoneId = CaseInsensitiveMap(catalogTable.storage.properties).getOrElse(

Review comment:
       `prunePartitionsByFilter` also uses it:
   https://github.com/apache/spark/blob/0032d85153e34b9ac69598b7dff530094ed0f640/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala#L157
   
   https://github.com/apache/spark/blob/dfa6fb46f4238792bff6a0201da201be1b42620e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala#L152-L163






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-732760626








[GitHub] [spark] SparkQA removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-732725416


   **[Test build #131631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131631/testReport)** for PR 30408 at commit [`752eb8d`](https://github.com/apache/spark/commit/752eb8dab286aa78925851bb2ebd0b0ce816def6).




[GitHub] [spark] SparkQA commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-732725416


   **[Test build #131631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131631/testReport)** for PR 30408 at commit [`752eb8d`](https://github.com/apache/spark/commit/752eb8dab286aa78925851bb2ebd0b0ce816def6).




[GitHub] [spark] AmplabJenkins commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-729608201








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-732702917








[GitHub] [spark] AmplabJenkins commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-729641286








[GitHub] [spark] maropu commented on a change in pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30408:
URL: https://github.com/apache/spark/pull/30408#discussion_r529140302



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
##########
@@ -700,6 +709,21 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
       }
     }
 
+    object ExtractableDateValues {
+      private lazy val valueToLiteralString: PartialFunction[Any, String] = {
+        case value: Int => dateFormatter.format(value)
+      }
+
+      def unapply(values: Set[Any]): Option[Seq[String]] = {
+        val extractables = values.toSeq.map(valueToLiteralString.lift)
+        if (extractables.nonEmpty && extractables.forall(_.isDefined)) {

Review comment:
       Why do we need `forall` here? Can `InSet` hold mixed values, i.e. ints together with other types?
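
For illustration only, here is a minimal standalone sketch of the extractor pattern shown in the diff above. It is not the PR's code: `ExtractableDateValuesSketch` is a hypothetical name, and `java.time.LocalDate` stands in for Spark's `dateFormatter`. It assumes date values arrive as `Int` days since the epoch. Under that assumption, `forall` makes the extractor refuse the whole set as soon as one element (for example a `null`) cannot be converted, instead of silently dropping it.

```scala
import java.time.LocalDate

// Hypothetical stand-in for the extractor in the diff; not the PR's implementation.
object ExtractableDateValuesSketch {
  // Assumption: a date value is an Int holding days since 1970-01-01.
  // Anything else (null, String, Long, ...) is left undefined.
  private val valueToLiteralString: PartialFunction[Any, String] = {
    case days: Int => LocalDate.ofEpochDay(days.toLong).toString
  }

  def unapply(values: Set[Any]): Option[Seq[String]] = {
    // lift turns "undefined for this input" into None for each element.
    val extractables = values.toSeq.map(valueToLiteralString.lift)
    // Only succeed when the set is non-empty and every element converted.
    if (extractables.nonEmpty && extractables.forall(_.isDefined)) {
      Some(extractables.map(_.get))
    } else {
      None
    }
  }
}
```

With this sketch, `Set[Any](1, 2)` matches and yields the two formatted dates, while `Set[Any](1, null)` and the empty set do not match at all.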






[GitHub] [spark] cloud-fan commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-730439664


   Last question about correctness: does Hive execute the partition predicate as a date comparison or as a string comparison? The latter can be problematic.
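
To make the concern concrete, here is a small sketch of how string ordering and date ordering can disagree. It says nothing about what Hive actually does. The unpadded strings are hypothetical inputs chosen to show the failure mode; zero-padded `yyyy-MM-dd` strings with four-digit years order the same way as the dates they represent.

```scala
import java.time.LocalDate

object DateVsStringComparison {
  val earlier: LocalDate = LocalDate.of(2019, 1, 9)
  val later: LocalDate = LocalDate.of(2019, 1, 10)

  // Date comparison: 2019-01-09 is before 2019-01-10.
  val chronological: Boolean = earlier.isBefore(later)

  // Zero-padded ISO strings ("2019-01-09", "2019-01-10") sort the same way.
  val paddedStringsAgree: Boolean = earlier.toString < later.toString

  // Hypothetical unpadded strings: '9' sorts after '1', so the string
  // comparison puts "2019-1-9" after "2019-1-10" and disagrees with the dates.
  val unpaddedStringsAgree: Boolean = "2019-1-9" < "2019-1-10"
}
```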




[GitHub] [spark] AmplabJenkins commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-730520044








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-732991026








[GitHub] [spark] AmplabJenkins commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-732702917








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30408:
URL: https://github.com/apache/spark/pull/30408#discussion_r530106219



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/FiltersSuite.scala
##########
@@ -139,6 +163,11 @@ class FiltersSuite extends SparkFunSuite with Logging with PlanTest {
         InSet(a("doublecol", DoubleType),
           Range(1, 20).map(s => Literal(s.toDouble).eval(EmptyRow)).toSet),
         "")
+
+      checkConverted(
+        InSet(a("datecol", DateType),
+          Range(1, 20).map(d => Literal(d, DateType).eval(EmptyRow)).toSet),
+        "(datecol >= 1970-01-02 and datecol <= 1970-01-20)")

Review comment:
       Oh, is this not `datecol >= 1970-01-20 and ...`?






[GitHub] [spark] SparkQA commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-732674946


   **[Test build #131624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131624/testReport)** for PR 30408 at commit [`752eb8d`](https://github.com/apache/spark/commit/752eb8dab286aa78925851bb2ebd0b0ce816def6).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-729608201








[GitHub] [spark] wangyum commented on a change in pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #30408:
URL: https://github.com/apache/spark/pull/30408#discussion_r526133218



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/FiltersSuite.scala
##########
@@ -63,6 +64,24 @@ class FiltersSuite extends SparkFunSuite with Logging with PlanTest {
     (Literal(1) === a("intcol", IntegerType)) :: (Literal("a") === a("strcol", IntegerType)) :: Nil,
     "1 = intcol and \"a\" = strcol")
 
+  filterTest("date filter",

Review comment:
       Different Hive versions are tested by `HivePartitionFilteringSuite`:
   https://github.com/apache/spark/blob/ba2f5531601ff169ff0f9fab9bb60627f65082bf/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HivePartitionFilteringSuite.scala#L290-L343






[GitHub] [spark] maropu commented on a change in pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30408:
URL: https://github.com/apache/spark/pull/30408#discussion_r529216190



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
##########
@@ -700,6 +709,21 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
       }
     }
 
+    object ExtractableDateValues {
+      private lazy val valueToLiteralString: PartialFunction[Any, String] = {
+        case value: Int => dateFormatter.format(value)
+      }
+
+      def unapply(values: Set[Any]): Option[Seq[String]] = {
+        val extractables = values.toSeq.map(valueToLiteralString.lift)
+        if (extractables.nonEmpty && extractables.forall(_.isDefined)) {

Review comment:
       Ah, ok. Thanks.






[GitHub] [spark] AmplabJenkins commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733637306








[GitHub] [spark] AmplabJenkins commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-732928683








[GitHub] [spark] wangyum commented on a change in pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #30408:
URL: https://github.com/apache/spark/pull/30408#discussion_r530112342



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/FiltersSuite.scala
##########
@@ -139,6 +163,11 @@ class FiltersSuite extends SparkFunSuite with Logging with PlanTest {
         InSet(a("doublecol", DoubleType),
           Range(1, 20).map(s => Literal(s.toDouble).eval(EmptyRow)).toSet),
         "")
+
+      checkConverted(
+        InSet(a("datecol", DateType),
+          Range(1, 20).map(d => Literal(d, DateType).eval(EmptyRow)).toSet),
+        "(datecol >= 1970-01-02 and datecol <= 1970-01-20)")

Review comment:
       ```scala
   scala> import org.apache.spark.sql.types._
   import org.apache.spark.sql.types._
   
   scala> import org.apache.spark.sql.catalyst.expressions._
   import org.apache.spark.sql.catalyst.expressions._
   
   scala> Range(1, 20).map(d => Cast(Literal(d, DateType), StringType, Some("America/Los_Angeles"))).map(_.eval(EmptyRow))
   res13: scala.collection.immutable.IndexedSeq[Any] = Vector(1970-01-02, 1970-01-03, 1970-01-04, 1970-01-05, 1970-01-06, 1970-01-07, 1970-01-08, 1970-01-09, 1970-01-10, 1970-01-11, 1970-01-12, 1970-01-13, 1970-01-14, 1970-01-15, 1970-01-16, 1970-01-17, 1970-01-18, 1970-01-19, 1970-01-20)
   ```
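
As a rough companion to the REPL output above, the sketch below shows how a set of date values could collapse into the range predicate the test expects. It is an assumption-laden illustration, not Spark's conversion code: `InSetToRangeSketch` and `toRangeFilter` are hypothetical names, and `java.time.LocalDate` is used for formatting instead of Spark's own formatter. Since `Range(1, 20)` covers days 1 through 19, the lower bound is day 1 (1970-01-02) and the upper bound is day 19 (1970-01-20).

```scala
import java.time.LocalDate

// Hypothetical helper; the names here are not part of Spark.
object InSetToRangeSketch {
  // Assumption: each value is an Int holding days since 1970-01-01.
  def toRangeFilter(column: String, epochDays: Set[Int]): String = {
    require(epochDays.nonEmpty, "cannot build a range from an empty set")
    val min = LocalDate.ofEpochDay(epochDays.min.toLong)
    val max = LocalDate.ofEpochDay(epochDays.max.toLong)
    // LocalDate.toString yields yyyy-MM-dd for four-digit years.
    s"($column >= $min and $column <= $max)"
  }
}
```

Calling `InSetToRangeSketch.toRangeFilter("datecol", Range(1, 20).toSet)` produces the same string as the expectation in the test diff.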






[GitHub] [spark] SparkQA commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733471851


   **[Test build #131740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131740/testReport)** for PR 30408 at commit [`29c489a`](https://github.com/apache/spark/commit/29c489ad5f753aaa3551489655073c9f6fc7b0c6).




[GitHub] [spark] cloud-fan commented on a change in pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30408:
URL: https://github.com/apache/spark/pull/30408#discussion_r528775772



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HivePartitionFilteringSuite.scala
##########
@@ -297,6 +294,61 @@ class HivePartitionFilteringSuite(version: String)
       day :: Nil)
   }
 
+  test("getPartitionsByFilter: date type pruning by metastore") {
+    val table = CatalogTable(
+      identifier = TableIdentifier("test_date", Some("default")),
+      tableType = CatalogTableType.MANAGED,
+      schema = new StructType().add("value", "int").add("part", "date"),
+      partitionColumnNames = Seq("part"),
+      storage = storageFormat)
+    client.createTable(table, ignoreIfExists = false)
+
+    val partitions =
+      for {
+        date <- Seq("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04")
+      } yield CatalogTablePartition(Map(
+        "part" -> date
+      ), storageFormat)
+    assert(partitions.size == 4)
+
+    client.createPartitions("default", "test_date", partitions, ignoreIfExists = false)
+
+    def testDataTypeFiltering(
+        filterExprs: Seq[Expression],
+        expectedPartitionCubes: Seq[Seq[Date]]): Unit = {
+      val filteredPartitions = client.getPartitionsByFilter(
+        client.getTable("default", "test_date"),
+        filterExprs,
+        SQLConf.get.sessionLocalTimeZone)
+
+      val expectedPartitions = expectedPartitionCubes.map {
+        expectedDt =>
+          for {
+            dt <- expectedDt
+          } yield Set(
+            "part" -> dt.toString
+          )
+      }.reduce(_ ++ _)
+
+      assert(filteredPartitions.map(_.spec.toSet).toSet == expectedPartitions.toSet)
+    }
+
+    testDataTypeFiltering(
+      Seq(AttributeReference("part", DateType)() === Date.valueOf("2019-01-01")),

Review comment:
       Can we create an `attr` method to get the `AttributeReference` from the table, to follow the other tests?






[GitHub] [spark] HyukjinKwon commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733525530


   Merged to master.




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #30408:
URL: https://github.com/apache/spark/pull/30408#discussion_r530105650



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/FiltersSuite.scala
##########
@@ -63,6 +65,28 @@ class FiltersSuite extends SparkFunSuite with Logging with PlanTest {
     (Literal(1) === a("intcol", IntegerType)) :: (Literal("a") === a("strcol", IntegerType)) :: Nil,
     "1 = intcol and \"a\" = strcol")
 
+  filterTest("date filter",
+    (a("datecol", DateType) === Literal(Date.valueOf("2019-01-01"))) :: Nil,
+    "datecol = 2019-01-01")
+
+  filterTest("date filter with IN predicate",
+    (a("datecol", DateType) in
+      (Literal(Date.valueOf("2019-01-01")), Literal(Date.valueOf("2019-01-07")))) :: Nil,
+    "(datecol = 2019-01-01 or datecol = 2019-01-07)")
+
+  filterTest("date and string filter",
+    (Literal(Date.valueOf("2019-01-01")) === a("datecol", DateType)) ::
+      (Literal("a") === a("strcol", IntegerType)) :: Nil,
+    "2019-01-01 = datecol and \"a\" = strcol")
+
+  filterTest("date filter with null",
+    (a("datecol", DateType) ===  Literal(null)) :: Nil,
+    "")
+
+  filterTest("string filter with InSet predicate",
+    InSet(a("strcol", StringType), Set("1", "2").map(s => UTF8String.fromString(s))) :: Nil,
+    "(strcol = \"1\" or strcol = \"2\")")

Review comment:
       nit. This seems to be irrelevant to this PR, but it's good to have this.
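
Judging only from the expected strings in the tests above, an `IN` or `InSet` predicate appears to be rendered as a parenthesized disjunction, with date literals left unquoted and string literals double-quoted. A small sketch of that rendering (the object and helper names are hypothetical, not taken from `HiveShim`):

```scala
object InPredicateSketch {
  // Wrap a string literal in double quotes, as in the expected filter strings.
  def quote(s: String): String = "\"" + s + "\""

  // Hypothetical helper: render `col IN (v1, v2, ...)` as an OR of equalities.
  def toOrFilter(col: String, literals: Seq[String]): String =
    literals.map(v => s"$col = $v").mkString("(", " or ", ")")
}
```

Calling `toOrFilter("datecol", Seq("2019-01-01", "2019-01-07"))` reproduces the expected string of the "date filter with IN predicate" test, and passing quoted literals reproduces the string case.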






[GitHub] [spark] AmplabJenkins commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-732991026








[GitHub] [spark] SparkQA removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-730433287


   **[Test build #131357 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131357/testReport)** for PR 30408 at commit [`ce5f0d1`](https://github.com/apache/spark/commit/ce5f0d147cf09b8fcfbf4e170c0a10072be54941).




[GitHub] [spark] SparkQA commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-732760393


   **[Test build #131631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131631/testReport)** for PR 30408 at commit [`752eb8d`](https://github.com/apache/spark/commit/752eb8dab286aa78925851bb2ebd0b0ce816def6).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] wangyum commented on a change in pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #30408:
URL: https://github.com/apache/spark/pull/30408#discussion_r529214802



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
##########
@@ -700,6 +709,21 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
       }
     }
 
+    object ExtractableDateValues {
+      private lazy val valueToLiteralString: PartialFunction[Any, String] = {
+        case value: Int => dateFormatter.format(value)
+      }
+
+      def unapply(values: Set[Any]): Option[Seq[String]] = {
+        val extractables = values.toSeq.map(valueToLiteralString.lift)
+        if (extractables.nonEmpty && extractables.forall(_.isDefined)) {

Review comment:
       Otherwise this test will fail:
   ```scala
     filterTest("string filter with InSet predicate",
       (InSet(a("stringcol", StringType),
         Range(1, 3).map(d => UTF8String.fromString(d.toString)).toSet)) :: Nil,
       "(stringcol = \"1\" or stringcol = \"2\")")
   ```
   
   ```
   None.get
   java.util.NoSuchElementException: None.get
   	at scala.None$.get(Option.scala:529)
   	at scala.None$.get(Option.scala:527)
   	at org.apache.spark.sql.hive.client.Shim_v0_13$ExtractableDateValues$1$.$anonfun$unapply$7(HiveShim.scala:720)
   	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
   	at scala.collection.Iterator.foreach(Iterator.scala:941)
   	at scala.collection.Iterator.foreach$(Iterator.scala:941)
   ```
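
To illustrate why the `forall(_.isDefined)` guard matters: a lifted partial function returns `None` for any value it does not handle, so calling `.get` on every element throws as soon as the set contains a non-date value such as a string. A self-contained sketch of the guarded extractor, assuming `java.time.LocalDate` as a stand-in for the `dateFormatter` used in the PR (the object name is hypothetical):

```scala
import java.time.LocalDate

object ExtractableDateValuesSketch {
  // Stand-in for dateFormatter.format: days since epoch to yyyy-MM-dd.
  private val valueToLiteralString: PartialFunction[Any, String] = {
    case value: Int => LocalDate.ofEpochDay(value.toLong).toString
  }

  def unapply(values: Set[Any]): Option[Seq[String]] = {
    // lift turns unmatched values into None instead of throwing a MatchError.
    val extractables = values.toSeq.map(valueToLiteralString.lift)
    // Only unwrap when every value was convertible; otherwise do not match.
    if (extractables.nonEmpty && extractables.forall(_.isDefined)) {
      Some(extractables.map(_.get))
    } else {
      None
    }
  }
}
```

With the guard, a set of strings simply fails to match this extractor, so a later case can handle it; without it, the `.get` call fails with `None.get` as in the stack trace above.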






[GitHub] [spark] wangyum commented on a change in pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #30408:
URL: https://github.com/apache/spark/pull/30408#discussion_r529214802



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
##########
@@ -700,6 +709,21 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
       }
     }
 
+    object ExtractableDateValues {
+      private lazy val valueToLiteralString: PartialFunction[Any, String] = {
+        case value: Int => dateFormatter.format(value)
+      }
+
+      def unapply(values: Set[Any]): Option[Seq[String]] = {
+        val extractables = values.toSeq.map(valueToLiteralString.lift)
+        if (extractables.nonEmpty && extractables.forall(_.isDefined)) {

Review comment:
       Otherwise this test will fail:
   ```scala
     filterTest("string filter with InSet predicate",
       (InSet(a("stringcol", StringType),
         Range(1, 10).map(d => UTF8String.fromString(d.toString)).toSet)) :: Nil,
       "")
   ```
   
   ```
   None.get
   java.util.NoSuchElementException: None.get
   	at scala.None$.get(Option.scala:529)
   	at scala.None$.get(Option.scala:527)
   	at org.apache.spark.sql.hive.client.Shim_v0_13$ExtractableDateValues$1$.$anonfun$unapply$7(HiveShim.scala:720)
   	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
   	at scala.collection.Iterator.foreach(Iterator.scala:941)
   	at scala.collection.Iterator.foreach$(Iterator.scala:941)
   ```






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733230958








[GitHub] [spark] SparkQA removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-729536316


   **[Test build #131274 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131274/testReport)** for PR 30408 at commit [`ba2f553`](https://github.com/apache/spark/commit/ba2f5531601ff169ff0f9fab9bb60627f65082bf).




[GitHub] [spark] AmplabJenkins commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733666327








[GitHub] [spark] SparkQA commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733578116


   **[Test build #131751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131751/testReport)** for PR 30408 at commit [`29c489a`](https://github.com/apache/spark/commit/29c489ad5f753aaa3551489655073c9f6fc7b0c6).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-732674946


   **[Test build #131624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131624/testReport)** for PR 30408 at commit [`752eb8d`](https://github.com/apache/spark/commit/752eb8dab286aa78925851bb2ebd0b0ce816def6).




[GitHub] [spark] SparkQA removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733471851


   **[Test build #131740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131740/testReport)** for PR 30408 at commit [`29c489a`](https://github.com/apache/spark/commit/29c489ad5f753aaa3551489655073c9f6fc7b0c6).




[GitHub] [spark] AmplabJenkins commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-733510488








[GitHub] [spark] wangyum commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-732769319


   @shaneknapp Did you set `export LANG=en_US.UTF-8`?
   ```
   org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: /home/jenkins/workspace/SparkPullRequestBuilder@3/sql/hive/target/tmp/hive_execution_test_group/warehouse-1355e680-268f-4224-b549-eaddcadcf136/DaTaBaSe_I.db/tab_ı);
   ```
   
   This issue should be fixed if we set `export LANG=en_US.UTF-8`. More details: https://issues.apache.org/jira/browse/SPARK-27177




[GitHub] [spark] cloud-fan commented on a change in pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30408:
URL: https://github.com/apache/spark/pull/30408#discussion_r526063207



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
##########
@@ -1265,11 +1265,13 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       defaultTimeZoneId: String): Seq[CatalogTablePartition] = withClient {
     val rawTable = getRawTable(db, table)
     val catalogTable = restoreTableMetadata(rawTable)
+    val timeZoneId = CaseInsensitiveMap(catalogTable.storage.properties).getOrElse(

Review comment:
       Why does `prunePartitionsByFilter` not use it?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-730492136








[GitHub] [spark] AmplabJenkins commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-730492136








[GitHub] [spark] SparkQA commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-729536316


   **[Test build #131274 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131274/testReport)** for PR 30408 at commit [`ba2f553`](https://github.com/apache/spark/commit/ba2f5531601ff169ff0f9fab9bb60627f65082bf).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30408: [SPARK-33477][SQL] Hive Metastore should support filter by date type

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30408:
URL: https://github.com/apache/spark/pull/30408#issuecomment-730476123


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35961/
   




[GitHub] [spark] gatorsmile commented on a change in pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
gatorsmile commented on a change in pull request #30408:
URL: https://github.com/apache/spark/pull/30408#discussion_r550948478



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HivePartitionFilteringSuite.scala
##########
@@ -297,6 +294,63 @@ class HivePartitionFilteringSuite(version: String)
       day :: Nil)
   }
 
+  test("getPartitionsByFilter: date type pruning by metastore") {
+    val table = CatalogTable(
+      identifier = TableIdentifier("test_date", Some("default")),
+      tableType = CatalogTableType.MANAGED,
+      schema = new StructType().add("value", "int").add("part", "date"),
+      partitionColumnNames = Seq("part"),
+      storage = storageFormat)
+    client.createTable(table, ignoreIfExists = false)
+
+    val partitions =
+      for {
+        date <- Seq("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04")

Review comment:
       How about NULL?






[GitHub] [spark] HyukjinKwon closed pull request #30408: [SPARK-33477][SQL] Hive Metastore support filter by date type

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #30408:
URL: https://github.com/apache/spark/pull/30408


   

