You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/07 18:58:11 UTC

[GitHub] [arrow] alamb opened a new pull request #9936: ARROW-12278: [Rust][DataFusion] Use Timestamp(Nanosecond, None) for SQL TIMESTAMP Type

alamb opened a new pull request #9936:
URL: https://github.com/apache/arrow/pull/9936


   
   # Rationale
   Running the query `CREATE EXTERNAL TABLE .. (c TIMESTAMP)` today in DataFusion will result in a data type pf "Date64" which means that anything more specific than the date will be ignored.
   
   This leads to strange behavior such as
   
   ```shell
   echo "Jorge,2018-12-13T12:12:10.011" >> /tmp/foo.csv
   echo "Andrew,2018-11-13T17:11:10.011" > /tmp/foo.csv
   
   cargo run -p datafusion --bin datafusion-cli
       Finished dev [unoptimized + debuginfo] target(s) in 0.23s
        Running `target/debug/datafusion-cli`
   > CREATE EXTERNAL TABLE t(a varchar, b TIMESTAMP)
   STORED AS CSV
   LOCATION '/tmp/foo.csv';
   
   0 rows in set. Query took 0 seconds.
   > select * from t;
   +--------+------------+
   | a      | b          |
   +--------+------------+
   | Andrew | 2018-11-13 |
   | Jorge  | 2018-12-13 |
   +--------+------------+
   ```
   
   (note that the Time part is chopped off)
   
   # Changes
   This PR changes the default mapping from SQL type `TIMESTAMP`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9936: ARROW-12278: [Rust][DataFusion] Use Timestamp(Nanosecond, None) for SQL TIMESTAMP Type

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9936:
URL: https://github.com/apache/arrow/pull/9936#issuecomment-815272147


   https://issues.apache.org/jira/browse/ARROW-12278


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb closed pull request #9936: ARROW-12278: [Rust][DataFusion] Use Timestamp(Nanosecond, None) for SQL TIMESTAMP Type

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #9936:
URL: https://github.com/apache/arrow/pull/9936


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9936: ARROW-12278: [Rust][DataFusion] Use Timestamp(Nanosecond, None) for SQL TIMESTAMP Type

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9936:
URL: https://github.com/apache/arrow/pull/9936#discussion_r608928102



##########
File path: rust/datafusion/src/execution/context.rs
##########
@@ -2863,10 +2913,7 @@ mod tests {
         ctx: &mut ExecutionContext,
         sql: &str,
     ) -> Result<Vec<RecordBatch>> {
-        let logical_plan = ctx.create_logical_plan(sql)?;

Review comment:
       This is needed so that CREATE EXTERNAL TABLE is handled correctly

##########
File path: rust/datafusion/src/sql/planner.rs
##########
@@ -298,7 +298,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
             SQLDataType::Boolean => Ok(DataType::Boolean),
             SQLDataType::Date => Ok(DataType::Date32),
             SQLDataType::Time => Ok(DataType::Time64(TimeUnit::Millisecond)),
-            SQLDataType::Timestamp => Ok(DataType::Date64),
+            SQLDataType::Timestamp => Ok(DataType::Timestamp(TimeUnit::Nanosecond, None)),

Review comment:
       This is the actual code change in this PR




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9936: ARROW-12278: [Rust][DataFusion] Use Timestamp(Nanosecond, None) for SQL TIMESTAMP Type

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9936:
URL: https://github.com/apache/arrow/pull/9936#issuecomment-816613802


   > Random though: I think that we would benefit from not writing a test on Context to test the SQL planner. It could be easier to identify which tests are testing what by writing unit tests more specialized to the component that they affect, as they usually declare the API behavior and are easier to find.
   
   @jorgecarleitao  I think the core of the problem is that the tests in `context.rs` are largely end to end tests of DataFusion rather than tests for the `ExecutionContext` itself. I think they ended up in context.rs because that is typically the main entry point to run queries.
   
   What would you think about refactoring the non `ExecutionContext` specific tests to some other file (perhaps the `tests/sql.rs` end to end test?)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9936: ARROW-12278: [Rust][DataFusion] Use Timestamp(Nanosecond, None) for SQL TIMESTAMP Type

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9936:
URL: https://github.com/apache/arrow/pull/9936#issuecomment-816259545


   FYI @Dandandan  / @seddonm1  / @ovr 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9936: ARROW-12278: [Rust][DataFusion] Use Timestamp(Nanosecond, None) for SQL TIMESTAMP Type

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9936:
URL: https://github.com/apache/arrow/pull/9936#discussion_r610519325



##########
File path: rust/datafusion/src/execution/context.rs
##########
@@ -2878,6 +2878,54 @@ mod tests {
         Ok(())
     }
 
+    #[tokio::test]
+    async fn create_external_table_with_timestamps() {
+        let mut ctx = ExecutionContext::new();
+
+        let data = "Jorge,2018-12-13T12:12:10.011\n\
+                    Andrew,2018-11-13T17:11:10.011";
+
+        let tmp_dir = TempDir::new().unwrap();
+        let file_path = tmp_dir.path().join("timestamps.csv");
+
+        // scope to ensure the file is closed and written
+        {
+            File::create(&file_path)
+                .expect("creating temp file")
+                .write_all(data.as_bytes())
+                .expect("writing data");
+        }
+
+        let sql = format!(
+            "CREATE EXTERNAL TABLE csv_with_timestamps (
+                  name VARCHAR,
+                  ts TIMESTAMP
+              )
+              STORED AS CSV
+              LOCATION '{}'
+              ",
+            file_path.to_str().expect("path is utf8")
+        );
+
+        println!("{}", sql);

Review comment:
       Oh no! I will remove




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9936: ARROW-12278: [Rust][DataFusion] Use Timestamp(Nanosecond, None) for SQL TIMESTAMP Type

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9936:
URL: https://github.com/apache/arrow/pull/9936#discussion_r610328244



##########
File path: rust/datafusion/src/execution/context.rs
##########
@@ -2878,6 +2878,54 @@ mod tests {
         Ok(())
     }
 
+    #[tokio::test]
+    async fn create_external_table_with_timestamps() {
+        let mut ctx = ExecutionContext::new();
+
+        let data = "Jorge,2018-12-13T12:12:10.011\n\
+                    Andrew,2018-11-13T17:11:10.011";
+
+        let tmp_dir = TempDir::new().unwrap();
+        let file_path = tmp_dir.path().join("timestamps.csv");
+
+        // scope to ensure the file is closed and written
+        {
+            File::create(&file_path)
+                .expect("creating temp file")
+                .write_all(data.as_bytes())
+                .expect("writing data");
+        }
+
+        let sql = format!(
+            "CREATE EXTERNAL TABLE csv_with_timestamps (
+                  name VARCHAR,
+                  ts TIMESTAMP
+              )
+              STORED AS CSV
+              LOCATION '{}'
+              ",
+            file_path.to_str().expect("path is utf8")
+        );
+
+        println!("{}", sql);

Review comment:
       a wild `println` has appeared :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org