You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/05 16:09:35 UTC

[GitHub] [arrow] alamb opened a new pull request #7906: ARROW-9652: Error message rather than panic csv with no column defs

alamb opened a new pull request #7906:
URL: https://github.com/apache/arrow/pull/7906


   This PR builds on https://github.com/apache/arrow/pull/7905 to do two things:
   1. Make better error messages for CREATE EXTERNAL TABLE commands that are not semantically valid and prevents a subsequent panic in the planner
   2. Move planning logic for CREATE EXTERNAL TABLE out of context.rs and into planer.rs
   
   The behavior before this PR:
   
   ```
   CREATE EXTERNAL TABLE repro
   STORED AS CSV
   LOCATION 'repro.csv';
   
   > select * from repro;
   thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0', datafusion/src/optimizer/projection_push_down.rs:238:31
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   ```
   
   The behavior after this PR:
   
   ```
   > CREATE EXTERNAL TABLE repro
   STORED AS CSV
   LOCATION 'repro.csv';
   
   General("Column definitions required for CSV files. None found")
   ```
   
   There are more detail on https://issues.apache.org/jira/browse/ARROW-9652
   
   h2. Planning Logic Consolidation into planner.rs
   
   This change could be much smaller if I had placed the checks directly in parser.rs. However, I instead propose the larger (but I think more coherent) strategy of moving the planing logic for CREATE EXTERNAL into the planner. This means that now the planner handles all the DataFusion SQL Dialects internally rather than only the SQL ones.
   
   One reason I didn't just want to put the checks into parser.rs itself was that this check seems like it is actually a semantic check rather than a syntactic one -- aka there is nothing wrong with the SQL itself, but there is something logically wrong with the statement.
   
   The PR is broken up into two commits -- one to move the CREATE EXTERNAL TABLE code into the planner and the other to add new semantic checks. I am happy to break the change into two PRs if you prefer.
   
   I am also happy to move the check to parser.rs, if you prefer.
   
   
   
   h2. CSV Schema Inference Logic
   There appears to be some CSV schema inference logic in DataFusion (e.g. [CsvEec::try_infer et al](https://github.com/apache/arrow/blob/master/rust/datafusion/src/execution/physical_plan/csv.rs#L123) , but it does not appear to be run prior to the planner panicing.
   
   I may have missed a better intended behavior that the CSV schema inference logic gets run at CREATE EXTERNAL TABLE planning time.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7906: ARROW-9652: [Rust][DataFusion] Error message rather than panic for external csv tables with no column defs

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7906:
URL: https://github.com/apache/arrow/pull/7906#issuecomment-669289126


   https://issues.apache.org/jira/browse/ARROW-9652


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #7906: ARROW-9652: [Rust][DataFusion] Error message rather than panic for external csv tables with no column defs

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #7906:
URL: https://github.com/apache/arrow/pull/7906#discussion_r465840928



##########
File path: rust/datafusion/src/execution/context.rs
##########
@@ -195,43 +169,6 @@ impl ExecutionContext {
         &self.scalar_functions
     }
 
-    fn build_schema(&self, columns: &Vec<SQLColumnDef>) -> Result<Schema> {

Review comment:
       This code is just moved to https://github.com/apache/arrow/pull/7906/files#diff-21b952259e12c1adf2098d9a797caef1R134 in https://github.com/apache/arrow/pull/7906/commits/aaaa9b7da9d9566ca6f3fc84d7b272951ff51f40




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove closed pull request #7906: ARROW-9652: [Rust][DataFusion] Error message rather than panic for external csv tables with no column defs

Posted by GitBox <gi...@apache.org>.
andygrove closed pull request #7906:
URL: https://github.com/apache/arrow/pull/7906


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #7906: ARROW-9652: [Rust][DataFusion] Error message rather than panic for external csv tables with no column defs

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #7906:
URL: https://github.com/apache/arrow/pull/7906#issuecomment-670012972


   I have rebased against master


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org