You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/25 16:32:36 UTC

[GitHub] [arrow] jorgecarleitao opened a new pull request #7833: ARROW-7903: [Rust] [DataFusion] Migrated to sqlparser 0.6.1

jorgecarleitao opened a new pull request #7833:
URL: https://github.com/apache/arrow/pull/7833


   AFAIK this keeps all functionality that we have available and open the doors to all the goodies of the newer version. There are too many to enumerate here, but the most predominant for me are multi statements and sub-queries.
   
   As a side note, sqlparser is very well designed and easy-to-use, and I am convinced that we should continue to take this dependency. We currently need some custom parsing and it was very easy to understand what I needed to do based on the documentation and their code.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on a change in pull request #7833: ARROW-7903: [Rust] [DataFusion] Migrated to sqlparser 0.6.1

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #7833:
URL: https://github.com/apache/arrow/pull/7833#discussion_r460523985



##########
File path: rust/datafusion/src/execution/context.rs
##########
@@ -140,10 +142,10 @@ impl ExecutionContext {
 
     /// Creates a logical plan
     pub fn create_logical_plan(&mut self, sql: &str) -> Result<LogicalPlan> {
-        let ast = DFParser::parse_sql(sql)?;
+        let statements = DFParser::parse_sql(sql)?;
 
-        match ast {
-            DFASTNode::ANSI(ansi) => {
+        match &statements[0] {

Review comment:
       I think we should return an `Err` result here if `statements.len() != 1`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on pull request #7833: ARROW-7903: [Rust] [DataFusion] Migrated to sqlparser 0.6.1

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #7833:
URL: https://github.com/apache/arrow/pull/7833#issuecomment-663984744


   @houqp fyi


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on pull request #7833: ARROW-7903: [Rust] [DataFusion] Migrated to sqlparser 0.6.1

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #7833:
URL: https://github.com/apache/arrow/pull/7833#issuecomment-663934226


   Thanks @jorgecarleitao this is fantastic. I imagined this would be a much larger change. I took a quick look through and LGTM but I need to find time for a more thorough review.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #7833: ARROW-7903: [Rust] [DataFusion] Migrated to sqlparser 0.6.1

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #7833:
URL: https://github.com/apache/arrow/pull/7833#discussion_r460527664



##########
File path: rust/datafusion/src/sql/parser.rs
##########
@@ -72,153 +74,185 @@ pub struct DFParser {
 impl DFParser {
     /// Parse the specified tokens
     pub fn new(sql: &str) -> Result<Self, ParserError> {
-        let dialect = GenericSqlDialect {};
+        let dialect = GenericDialect {};
         let mut tokenizer = Tokenizer::new(&dialect, sql);
         let tokens = tokenizer.tokenize()?;
         Ok(DFParser {
             parser: Parser::new(tokens),
         })
     }
 
-    /// Parse a SQL statement and produce an Abstract Syntax Tree (AST)
-    pub fn parse_sql(sql: &str) -> Result<DFASTNode, ParserError> {
+    /// Parse a SQL statement and produce a set of statements
+    pub fn parse_sql(sql: &str) -> Result<Vec<Statement>, ParserError> {
+        let mut tokenizer = Tokenizer::new(&GenericDialect {}, &sql);
+        tokenizer.tokenize()?;
         let mut parser = DFParser::new(sql)?;
-        parser.parse()
+        let mut stmts = Vec::new();
+        let mut expecting_statement_delimiter = false;
+        loop {
+            // ignore empty statements (between successive statement delimiters)
+            while parser.parser.consume_token(&Token::SemiColon) {
+                expecting_statement_delimiter = false;
+            }
+
+            if parser.parser.peek_token() == Token::EOF {
+                break;
+            }
+            if expecting_statement_delimiter {
+                return parser.expected("end of statement", parser.parser.peek_token());
+            }
+
+            let statement = parser.parse_statement()?;
+            stmts.push(statement);
+            expecting_statement_delimiter = true;
+        }
+        Ok(stmts)
+    }
+
+    /// Report unexpected token
+    fn expected<T>(&self, expected: &str, found: Token) -> Result<T, ParserError> {
+        parser_err!(format!("Expected {}, found: {}", expected, found))
     }
 
     /// Parse a new expression
-    pub fn parse(&mut self) -> Result<DFASTNode, ParserError> {
-        self.parse_expr(0)
+    pub fn parse_statement(&mut self) -> Result<Statement, ParserError> {
+        match self.parser.next_token() {

Review comment:
       done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs closed pull request #7833: ARROW-7903: [Rust] [DataFusion] Migrated to sqlparser 0.6.1

Posted by GitBox <gi...@apache.org>.
kszucs closed pull request #7833:
URL: https://github.com/apache/arrow/pull/7833


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7833: ARROW-7903: [Rust] [DataFusion] Migrated to sqlparser 0.6.1

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7833:
URL: https://github.com/apache/arrow/pull/7833#issuecomment-663876040


   https://issues.apache.org/jira/browse/ARROW-7903


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on a change in pull request #7833: ARROW-7903: [Rust] [DataFusion] Migrated to sqlparser 0.6.1

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #7833:
URL: https://github.com/apache/arrow/pull/7833#discussion_r460524177



##########
File path: rust/datafusion/src/sql/parser.rs
##########
@@ -72,153 +74,185 @@ pub struct DFParser {
 impl DFParser {
     /// Parse the specified tokens
     pub fn new(sql: &str) -> Result<Self, ParserError> {
-        let dialect = GenericSqlDialect {};
+        let dialect = GenericDialect {};
         let mut tokenizer = Tokenizer::new(&dialect, sql);
         let tokens = tokenizer.tokenize()?;
         Ok(DFParser {
             parser: Parser::new(tokens),
         })
     }
 
-    /// Parse a SQL statement and produce an Abstract Syntax Tree (AST)
-    pub fn parse_sql(sql: &str) -> Result<DFASTNode, ParserError> {
+    /// Parse a SQL statement and produce a set of statements
+    pub fn parse_sql(sql: &str) -> Result<Vec<Statement>, ParserError> {
+        let mut tokenizer = Tokenizer::new(&GenericDialect {}, &sql);

Review comment:
       Just a side note but I think eventually we should use the Hive dialect.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #7833: ARROW-7903: [Rust] [DataFusion] Migrated to sqlparser 0.6.1

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #7833:
URL: https://github.com/apache/arrow/pull/7833#discussion_r460527604



##########
File path: rust/datafusion/src/execution/context.rs
##########
@@ -140,10 +142,10 @@ impl ExecutionContext {
 
     /// Creates a logical plan
     pub fn create_logical_plan(&mut self, sql: &str) -> Result<LogicalPlan> {
-        let ast = DFParser::parse_sql(sql)?;
+        let statements = DFParser::parse_sql(sql)?;
 
-        match ast {
-            DFASTNode::ANSI(ansi) => {
+        match &statements[0] {

Review comment:
       Well spotted. Fixed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on a change in pull request #7833: ARROW-7903: [Rust] [DataFusion] Migrated to sqlparser 0.6.1

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #7833:
URL: https://github.com/apache/arrow/pull/7833#discussion_r460524377



##########
File path: rust/datafusion/src/sql/parser.rs
##########
@@ -72,153 +74,185 @@ pub struct DFParser {
 impl DFParser {
     /// Parse the specified tokens
     pub fn new(sql: &str) -> Result<Self, ParserError> {
-        let dialect = GenericSqlDialect {};
+        let dialect = GenericDialect {};
         let mut tokenizer = Tokenizer::new(&dialect, sql);
         let tokens = tokenizer.tokenize()?;
         Ok(DFParser {
             parser: Parser::new(tokens),
         })
     }
 
-    /// Parse a SQL statement and produce an Abstract Syntax Tree (AST)
-    pub fn parse_sql(sql: &str) -> Result<DFASTNode, ParserError> {
+    /// Parse a SQL statement and produce a set of statements
+    pub fn parse_sql(sql: &str) -> Result<Vec<Statement>, ParserError> {
+        let mut tokenizer = Tokenizer::new(&GenericDialect {}, &sql);
+        tokenizer.tokenize()?;
         let mut parser = DFParser::new(sql)?;
-        parser.parse()
+        let mut stmts = Vec::new();
+        let mut expecting_statement_delimiter = false;
+        loop {
+            // ignore empty statements (between successive statement delimiters)
+            while parser.parser.consume_token(&Token::SemiColon) {
+                expecting_statement_delimiter = false;
+            }
+
+            if parser.parser.peek_token() == Token::EOF {
+                break;
+            }
+            if expecting_statement_delimiter {
+                return parser.expected("end of statement", parser.parser.peek_token());
+            }
+
+            let statement = parser.parse_statement()?;
+            stmts.push(statement);
+            expecting_statement_delimiter = true;
+        }
+        Ok(stmts)
+    }
+
+    /// Report unexpected token
+    fn expected<T>(&self, expected: &str, found: Token) -> Result<T, ParserError> {
+        parser_err!(format!("Expected {}, found: {}", expected, found))
     }
 
     /// Parse a new expression
-    pub fn parse(&mut self) -> Result<DFASTNode, ParserError> {
-        self.parse_expr(0)
+    pub fn parse_statement(&mut self) -> Result<Statement, ParserError> {
+        match self.parser.next_token() {

Review comment:
       Perhaps it would be better to use `peek_token` here to avoid the need to roll back?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org