Posted to github@arrow.apache.org by "liukun4515 (via GitHub)" <gi...@apache.org> on 2023/07/11 05:51:14 UTC
[GitHub] [arrow-datafusion] liukun4515 commented on issue #6876: support: Date +/plus Int or date_add function
liukun4515 commented on issue #6876:
URL: https://github.com/apache/arrow-datafusion/issues/6876#issuecomment-1630182777
> 👍 -- I believe @tustvold is cleaning up the arithmetic logic in arrow-rs / datafusion now
OK, I will take a look at this work and track its progress.
> What types can the `value_expr` be in `spark`?
In Spark:
```
spark-sql> select version();
3.2.0 5d45a415f3a29898d92380380cfd82bfc7f579ea
Time taken: 0.084 seconds, Fetched 1 row(s)
spark-sql> desc test;
a date
b int
```
`date` + integer constant:
```
spark-sql> explain extended select a+10 from test;
== Parsed Logical Plan ==
'Project [unresolvedalias(('a + 10), None)]
+- 'UnresolvedRelation [test], [], false
== Analyzed Logical Plan ==
date_add(a, 10): date
Project [date_add(a#49, 10) AS date_add(a, 10)#51]
+- SubqueryAlias spark_catalog.default.test
   +- HiveTableRelation [`default`.`test`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#49, b#50], Partition Cols: []]
== Optimized Logical Plan ==
Project [date_add(a#49, 10) AS date_add(a, 10)#51]
+- HiveTableRelation [`default`.`test`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#49, b#50], Partition Cols: []]
== Physical Plan ==
*(1) Project [date_add(a#49, 10) AS date_add(a, 10)#51]
+- Scan hive default.test [a#49], HiveTableRelation [`default`.`test`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#49, b#50], Partition Cols: []]
Time taken: 0.04 seconds, Fetched 1 row(s)
```
Spark has a specific analyzer rule for arithmetic on date/time values: by the analyzed logical plan, `a + 10` has already been rewritten to `date_add(a, 10)`.
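The plan above shows the rewrite happening between the parsed plan (`'a + 10`) and the analyzed plan (`date_add(a#49, 10)`). A minimal sketch of that kind of analyzer rewrite in Rust might look like the following; the `Expr` and `DataType` enums and both functions are hypothetical toy types, not Catalyst's or DataFusion's real definitions.

```rust
// Hypothetical toy types for illustration only; these are NOT the real
// Spark Catalyst or DataFusion `Expr`/`DataType` definitions.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Date,
    Int,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String, DataType),
    IntLiteral(i32),
    Plus(Box<Expr>, Box<Expr>),
    /// Stand-in for a `date_add(date, days)` function call.
    DateAdd(Box<Expr>, Box<Expr>),
}

/// Infers the result type of an expression (None = unsupported combination).
fn data_type(e: &Expr) -> Option<DataType> {
    match e {
        Expr::Column(_, t) => Some(t.clone()),
        Expr::IntLiteral(_) => Some(DataType::Int),
        Expr::DateAdd(_, _) => Some(DataType::Date),
        Expr::Plus(l, r) => match (data_type(l)?, data_type(r)?) {
            (DataType::Int, DataType::Int) => Some(DataType::Int),
            (DataType::Date, DataType::Int) => Some(DataType::Date),
            _ => None,
        },
    }
}

/// Bottom-up rewrite: every `date + int` becomes `date_add(date, int)`;
/// all other expressions are rebuilt unchanged.
fn rewrite_date_arithmetic(e: Expr) -> Expr {
    match e {
        Expr::Plus(l, r) => {
            // Rewrite children first so nested `(a + 1) + 2` is also handled.
            let l = rewrite_date_arithmetic(*l);
            let r = rewrite_date_arithmetic(*r);
            match (data_type(&l), data_type(&r)) {
                (Some(DataType::Date), Some(DataType::Int)) => {
                    Expr::DateAdd(Box::new(l), Box::new(r))
                }
                _ => Expr::Plus(Box::new(l), Box::new(r)),
            }
        }
        Expr::DateAdd(l, r) => Expr::DateAdd(
            Box::new(rewrite_date_arithmetic(*l)),
            Box::new(rewrite_date_arithmetic(*r)),
        ),
        leaf => leaf,
    }
}
```

In this sketch, `a + 10` with `a: date` becomes `DateAdd(a, 10)`, while `1 + 2` stays a `Plus`. Only `date + int` is handled; `int + date` and `date - int` are left out for brevity.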
`date` + integer column/expression (also rewritten to `date_add`):
```
spark-sql> explain extended select a+b from test;
== Parsed Logical Plan ==
'Project [unresolvedalias(('a + 'b), None)]
+- 'UnresolvedRelation [test], [], false
== Analyzed Logical Plan ==
date_add(a, b): date
Project [date_add(a#88, b#89) AS date_add(a, b)#90]
+- SubqueryAlias spark_catalog.default.test
   +- HiveTableRelation [`default`.`test`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#88, b#89], Partition Cols: []]
== Optimized Logical Plan ==
Project [date_add(a#88, b#89) AS date_add(a, b)#90]
+- HiveTableRelation [`default`.`test`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#88, b#89], Partition Cols: []]
== Physical Plan ==
*(1) Project [date_add(a#88, b#89) AS date_add(a, b)#90]
+- Scan hive default.test [a#88, b#89], HiveTableRelation [`default`.`test`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#88, b#89], Partition Cols: []]
Time taken: 0.035 seconds, Fetched 1 row(s)
```
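The column case differs from the constant case only in that both inputs vary per row. As a rough sketch of an element-wise kernel for it, assuming Arrow's `Date32` representation (days since 1970-01-01 stored as `i32`) and modeling nullable columns as plain `Option<i32>` slices instead of real Arrow arrays (the function name `date_add_columns` is hypothetical):

```rust
/// Element-wise `date_add(date, days)` over two nullable columns.
/// Dates follow the Date32 convention: days since 1970-01-01 as i32.
/// A null in either input gives a null output row; i32 overflow is
/// reported as an error (a choice of this sketch, not documented
/// Spark or DataFusion behavior).
fn date_add_columns(
    dates: &[Option<i32>],
    days: &[Option<i32>],
) -> Result<Vec<Option<i32>>, String> {
    if dates.len() != days.len() {
        return Err(format!("length mismatch: {} vs {}", dates.len(), days.len()));
    }
    dates
        .iter()
        .zip(days)
        .map(|(d, n)| match (d, n) {
            // Both values present: add the day count, checking for overflow.
            (Some(d), Some(n)) => d
                .checked_add(*n)
                .map(Some)
                .ok_or_else(|| format!("date overflow: {} + {}", d, n)),
            // Any null input propagates to a null output.
            _ => Ok(None),
        })
        .collect()
}
```

A negative day count covers `date - integer` with the same kernel.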
PostgreSQL supports `date +/- integer` as well, as described in its documentation: https://www.postgresql.org/docs/current/functions-datetime.html
For example:
```
date + integer → date
Add a number of days to a date
date '2001-09-28' + 7 → 2001-10-05
```
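Under Arrow's `Date32` representation (days since the Unix epoch), `date + integer` reduces to integer addition on the day count; the calendar only matters when converting to and from year/month/day. The sketch below checks the PostgreSQL example using the well-known days-from-civil conversion (Howard Hinnant's algorithm for the proleptic Gregorian calendar). The function names are hypothetical, not arrow-rs or DataFusion APIs.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian (year, month, day).
fn days_from_civil(y: i64, m: u32, d: u32) -> i64 {
    // Treat Jan/Feb as months 10/11 of the previous year so leap days fall at year end.
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400; // year of era, [0, 399]
    let mp = (m as i64 + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + d as i64 - 1; // day of year, [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era, [0, 146096]
    era * 146_097 + doe - 719_468
}

/// Inverse of `days_from_civil`: days since 1970-01-01 to (year, month, day).
fn civil_from_days(z: i64) -> (i64, u32, u32) {
    let z = z + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097; // [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // [0, 365]
    let mp = (5 * doy + 2) / 153; // [0, 11]
    let d = (doy - (153 * mp + 2) / 5 + 1) as u32; // [1, 31]
    let m = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32; // [1, 12]
    let y = yoe + era * 400;
    (if m <= 2 { y + 1 } else { y }, m, d)
}

/// `date + integer -> date` on Date32 values; None on i32 overflow.
fn date_add_days(date: i32, days: i32) -> Option<i32> {
    date.checked_add(days)
}
```

With these helpers, `date '2001-09-28'` is day 11593, `date_add_days(11593, 7)` returns `Some(11600)`, and converting 11600 back gives 2001-10-05, matching the PostgreSQL example. `date - integer` is the same call with a negative day count.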
So I would like to support more arithmetic operations on date/time/timestamp/interval types in DataFusion (maybe we could implement them in arrow-rs).
Date arithmetic like this is a requirement of SQL systems and query engines, so is implementing the above operations in the arrow-rs kernels appropriate?