Posted to github@arrow.apache.org by "liukun4515 (via GitHub)" <gi...@apache.org> on 2023/07/11 05:51:14 UTC
[GitHub] [arrow-datafusion] liukun4515 commented on issue #6876: support: Date +/plus Int or date_add function

liukun4515 commented on issue #6876:
URL: https://github.com/apache/arrow-datafusion/issues/6876#issuecomment-1630182777

   
   > 👍 -- I believe @tustvold is cleaning up the arithmetic logic in arrow-rs / datafusion now
   
   OK, I will take a look at that work and track its progress.
   
   > What types can the `value_expr` be in `spark`?
   
   In Spark:
   ```
   spark-sql> select version();
   3.2.0 5d45a415f3a29898d92380380cfd82bfc7f579ea
   Time taken: 0.084 seconds, Fetched 1 row(s)
   
   spark-sql> desc test;
   a                   	date
   b                   	int
   ```
   
   `date` + integer constant
   
   ```
   spark-sql> explain extended select a+10 from test;
   == Parsed Logical Plan ==
   'Project [unresolvedalias(('a + 10), None)]
   +- 'UnresolvedRelation [test], [], false
   
   == Analyzed Logical Plan ==
   date_add(a, 10): date
   Project [date_add(a#49, 10) AS date_add(a, 10)#51]
   +- SubqueryAlias spark_catalog.default.test
      +- HiveTableRelation [`default`.`test`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#49, b#50], Partition Cols: []]
   
   == Optimized Logical Plan ==
   Project [date_add(a#49, 10) AS date_add(a, 10)#51]
   +- HiveTableRelation [`default`.`test`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#49, b#50], Partition Cols: []]
   
   == Physical Plan ==
   *(1) Project [date_add(a#49, 10) AS date_add(a, 10)#51]
   +- Scan hive default.test [a#49], HiveTableRelation [`default`.`test`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#49, b#50], Partition Cols: []]
   
   Time taken: 0.04 seconds, Fetched 1 row(s)
   ```
   
   Spark has a specific analyzer rule that handles arithmetic on date/time types.
   
   
   `date` + integer column/expression (equivalently, `date_add`)
   ```
   spark-sql> explain extended select a+b from test;
   == Parsed Logical Plan ==
   'Project [unresolvedalias(('a + 'b), None)]
   +- 'UnresolvedRelation [test], [], false
   
   == Analyzed Logical Plan ==
   date_add(a, b): date
   Project [date_add(a#88, b#89) AS date_add(a, b)#90]
   +- SubqueryAlias spark_catalog.default.test
      +- HiveTableRelation [`default`.`test`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#88, b#89], Partition Cols: []]
   
   == Optimized Logical Plan ==
   Project [date_add(a#88, b#89) AS date_add(a, b)#90]
   +- HiveTableRelation [`default`.`test`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#88, b#89], Partition Cols: []]
   
   == Physical Plan ==
   *(1) Project [date_add(a#88, b#89) AS date_add(a, b)#90]
   +- Scan hive default.test [a#88, b#89], HiveTableRelation [`default`.`test`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [a#88, b#89], Partition Cols: []]
   
   Time taken: 0.035 seconds, Fetched 1 row(s)
   ```
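
To make the rewrite in the plans above concrete, here is a minimal sketch of an analyzer-style rule that turns `date + int` into `date_add`. It is only an illustration: the `Column`, `Literal`, `Add` and `DateAdd` classes and the `rewrite_date_arithmetic` function are hypothetical names invented for this example, not Spark's or DataFusion's actual code.

```python
from dataclasses import dataclass


# Hypothetical toy expression nodes; only "date" and "int" types are modeled.
@dataclass(frozen=True)
class Column:
    name: str
    dtype: str


@dataclass(frozen=True)
class Literal:
    value: int
    dtype: str = "int"


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class DateAdd:
    date: object
    days: object


def dtype_of(expr):
    """Return the result type of an already-rewritten expression."""
    if isinstance(expr, (Column, Literal)):
        return expr.dtype
    if isinstance(expr, DateAdd):
        return "date"
    if isinstance(expr, Add):
        if dtype_of(expr.left) == dtype_of(expr.right) == "int":
            return "int"
    raise TypeError(f"unsupported expression: {expr!r}")


def rewrite_date_arithmetic(expr):
    """Rewrite Add(date, int) or Add(int, date) into DateAdd(date, int)."""
    if not isinstance(expr, Add):
        return expr
    # Rewrite children first so nested additions are resolved bottom-up.
    left = rewrite_date_arithmetic(expr.left)
    right = rewrite_date_arithmetic(expr.right)
    left_type, right_type = dtype_of(left), dtype_of(right)
    if left_type == "date" and right_type == "int":
        return DateAdd(left, right)
    if left_type == "int" and right_type == "date":
        return DateAdd(right, left)
    return Add(left, right)
```

With `a` as a date column and `b` as an int column, both `a + 10` and `a + b` come out as `DateAdd` nodes, while `b + 1` stays a plain `Add`.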
   
   PostgreSQL also supports `date +/- integer`, as described in the docs: https://www.postgresql.org/docs/current/functions-datetime.html
   For example:
   ```
   date + integer → date
   
   Add a number of days to a date
   
   date '2001-09-28' + 7 → 2001-10-05
   ```
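
As a rough sketch of the semantics in the PostgreSQL example, the snippet below treats a date as a count of days since 1970-01-01 (the same representation Arrow's `Date32` uses) and adds the integer to that count. The helper names `to_days`, `from_days` and `date_add` are made up for this example; this is not an arrow-rs or DataFusion API, and null handling and overflow are left out.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def to_days(d: date) -> int:
    """Number of days between the Unix epoch and `d`."""
    return (d - EPOCH).days


def from_days(n: int) -> date:
    """Inverse of `to_days`."""
    return EPOCH + timedelta(days=n)


def date_add(d: date, days: int) -> date:
    """date + integer -> date, done as integer addition on the day count."""
    return from_days(to_days(d) + days)


print(date_add(date(2001, 9, 28), 7))  # 2001-10-05
```

Subtraction is the same operation with a negative day count, so `date - integer` needs no separate code path in this model.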
   
   So I would like to support more arithmetic operations on date/time/timestamp/interval in DataFusion (maybe we can implement them in arrow-rs).
   
   
   Date arithmetic like this is required by SQL systems and query engines, so I am not sure whether the arrow-rs kernels are the right place to implement the operations above.
   
   
   
   
   
   
   
   

