Posted to github@arrow.apache.org by "mingnuj (via GitHub)" <gi...@apache.org> on 2023/04/18 04:36:29 UTC

[GitHub] [arrow-datafusion] mingnuj opened a new issue, #6040: Stack overflow while generating logical plan from statement

mingnuj opened a new issue, #6040:
URL: https://github.com/apache/arrow-datafusion/issues/6040

   ### Describe the bug
   
   A stack overflow occurs when a query uses many conditions.
   
   This is more limiting when used with tokio, because the [default thread stack size is 2MiB](https://asomers.github.io/tokio-file/tokio/runtime/struct.Builder.html#method.thread_stack_size).
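
As a possible workaround while the recursion itself is unresolved, stack-hungry work can be run on a thread with an explicitly larger stack. The following is a minimal sketch using only the standard library; `nested_depth` and `run_with_stack` are hypothetical names, `nested_depth` is only a stand-in for the planner call, and the 16 MiB value is an arbitrary assumption.

```rust
use std::thread;

// Hypothetical stand-in for stack-hungry work, such as planning a deeply
// nested expression: the recursion depth equals `levels`.
fn nested_depth(levels: u32) -> u32 {
    if levels == 0 {
        0
    } else {
        1 + nested_depth(levels - 1)
    }
}

// Runs `f` on a dedicated thread whose stack size is `stack_bytes`
// and returns its result.
fn run_with_stack<T, F>(stack_bytes: usize, f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}

fn main() {
    // 16 MiB is an illustrative value, not a recommendation.
    let depth = run_with_stack(16 * 1024 * 1024, || nested_depth(10_000));
    println!("reached depth {depth}");
}
```

For tokio worker threads, the `thread_stack_size` builder method linked above plays the same role. Either way, this only raises the limit; it does not remove the dependence on query size.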
   
   ### To Reproduce
   
   The reproduction code is adapted from the code @mcassels provided in issue #1434. The query has the form:
   `SELECT * FROM table WHERE <condition0> OR <condition1> OR ...` 
   ``` rust
   use datafusion::{
       arrow::datatypes::{DataType, Field, Schema},
       common::Result,
       config::ConfigOptions,
       error::DataFusionError,
       logical_expr::{
           logical_plan::builder::LogicalTableSource, AggregateUDF, ScalarUDF, TableSource,
       },
       sql::{
           planner::{ContextProvider, SqlToRel},
           sqlparser::{dialect::GenericDialect, parser::Parser},
           TableReference,
       },
   };
   use std::{collections::HashMap, sync::Arc};
   
   #[global_allocator]
   static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
   
   #[tokio::main]
   async fn main() -> Result<()> {
       let num_conditions = 255;
       let where_clause = (0..num_conditions)
           .map(|i| format!("column1 = 'value{:?}'", i))
           .collect::<Vec<String>>()
           .join(" OR ");
       let sql = format!("SELECT * from table1 where {};", where_clause);
       get_optimized_plan(sql).await?;
       println!("query succeeded with {:?} conditions", num_conditions);
   
       let num_conditions = 256;
       let where_clause = (0..num_conditions)
           .map(|i| format!("column1 = 'value{:?}'", i))
           .collect::<Vec<String>>()
           .join(" OR ");
       let sql = format!("SELECT * from table1 where {};", where_clause);
       get_optimized_plan(sql).await?;
       println!("query succeeded with {:?} conditions", num_conditions);
   
       Ok(())
   }
   
   async fn get_optimized_plan(sql: String) -> Result<()> {
       let schema_provider = TestSchemaProvider::new();
   
       let dialect = GenericDialect {};
       let ast = Parser::parse_sql(&dialect, &sql).unwrap();
       let statement = &ast[0];
       let sql_to_rel = SqlToRel::new(&schema_provider);
       sql_to_rel.sql_statement_to_plan(statement.clone()).unwrap();
   
       Ok(())
   }
   
   struct TestSchemaProvider {
       options: ConfigOptions,
       tables: HashMap<String, Arc<dyn TableSource>>,
   }
   
   impl TestSchemaProvider {
       pub fn new() -> Self {
           let mut tables = HashMap::new();
           tables.insert(
               "table1".to_string(),
               create_table_source(vec![Field::new(
                   // Named to match the `column1` referenced in the WHERE clause.
                   "column1".to_string(),
                   DataType::Utf8,
                   false,
               )]),
           );
   
           Self {
               options: Default::default(),
               tables,
           }
       }
   }
   
   fn create_table_source(fields: Vec<Field>) -> Arc<dyn TableSource> {
       Arc::new(LogicalTableSource::new(Arc::new(
           Schema::new_with_metadata(fields, HashMap::new()),
       )))
   }
   
   impl ContextProvider for TestSchemaProvider {
       fn get_table_provider(&self, name: TableReference) -> Result<Arc<dyn TableSource>> {
           match self.tables.get(name.table()) {
               Some(table) => Ok(table.clone()),
               _ => Err(DataFusionError::Plan(format!(
                   "Table not found: {}",
                   name.table()
               ))),
           }
       }
   
       fn get_function_meta(&self, _name: &str) -> Option<Arc<ScalarUDF>> {
           None
       }
   
       fn get_aggregate_meta(&self, _name: &str) -> Option<Arc<AggregateUDF>> {
           None
       }
   
       fn get_variable_type(&self, _variable_names: &[String]) -> Option<DataType> {
           None
       }
   
       fn options(&self) -> &ConfigOptions {
           &self.options
       }
   }
   ```
   Output
   ``` bash
   query succeeded with 255 conditions
   
   thread 'main' has overflowed its stack
   fatal runtime error: stack overflow
   ```
   
   With 256 or more conditions, a stack overflow occurs. This happens only in debug mode, as discussed in https://github.com/apache/arrow-datafusion/issues/1434#issuecomment-992758421.
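
To illustrate why the number of conditions matters, here is a minimal sketch with a hypothetical, simplified `Expr` type (not DataFusion's or sqlparser's actual AST). It assumes the `OR` chain is parsed left-associatively, so each extra condition adds one nesting level. A recursive walk then uses one stack frame per level, while a walk driven by a heap-allocated work list keeps the call stack flat.

```rust
// Hypothetical, simplified expression type for illustration only.
#[allow(dead_code)]
enum Expr {
    Cond(usize),
    Or(Box<Expr>, Box<Expr>),
}

// Builds ((c0 OR c1) OR c2) OR ... as a left-associative parser would.
fn left_deep_or(n: usize) -> Expr {
    let mut expr = Expr::Cond(0);
    for i in 1..n {
        expr = Expr::Or(Box::new(expr), Box::new(Expr::Cond(i)));
    }
    expr
}

// Recursive walk: one call-stack frame per nesting level.
fn depth_recursive(expr: &Expr) -> usize {
    match expr {
        Expr::Cond(_) => 1,
        Expr::Or(l, r) => 1 + depth_recursive(l).max(depth_recursive(r)),
    }
}

// Same result using an explicit work list on the heap,
// so the call stack does not grow with the nesting depth.
fn depth_iterative(expr: &Expr) -> usize {
    let mut max_depth = 0;
    let mut pending = vec![(expr, 1usize)];
    while let Some((node, depth)) = pending.pop() {
        max_depth = max_depth.max(depth);
        if let Expr::Or(l, r) = node {
            // `&**l` turns `&Box<Expr>` into `&Expr`.
            pending.push((&**l, depth + 1));
            pending.push((&**r, depth + 1));
        }
    }
    max_depth
}

fn main() {
    let expr = left_deep_or(256);
    println!("recursive depth: {}", depth_recursive(&expr));
    println!("iterative depth: {}", depth_iterative(&expr));
}
```

Both functions report a nesting depth equal to the number of conditions. The frame size of each recursive call, which is typically larger in debug builds, then determines how many conditions fit in a given stack.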
   
   ### Expected behavior
   
   The logical plan is generated without a stack overflow.
   
   ### Additional context
   
   I see two possible approaches to this problem.
   
   Approach #1
   In some functions, such as `select_to_plan` and `plan_selection`, parameters are received by value rather than by reference or through `Box` pointers. This may make the stack grow faster.
   
   I also found a discussion of how enums are allocated on the stack:
   https://www.reddit.com/r/rust/comments/zbla3j/how_does_enums_work_where_are_they_allocated/
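
The effect of enum layout on stack usage can be sketched as follows. The two enums are hypothetical and are not types from DataFusion; the point is only that an enum is as large as its largest variant, so boxing a large variant shrinks every value of that enum that is held on the stack or passed by value.

```rust
use std::mem::size_of;

// Hypothetical enum whose large variant is stored inline:
// every value reserves room for the 1024-byte array.
#[allow(dead_code)]
enum Inline {
    Small(u8),
    Large([u8; 1024]),
}

// Same shape, but the large payload lives on the heap behind a `Box`,
// so the enum itself only holds a pointer for that variant.
#[allow(dead_code)]
enum Boxed {
    Small(u8),
    Large(Box<[u8; 1024]>),
}

fn main() {
    println!("Inline: {} bytes", size_of::<Inline>());
    println!("Boxed:  {} bytes", size_of::<Boxed>());
}
```

Whether this accounts for the frame sizes in the planner would need to be measured; the sketch only shows the mechanism.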
   
   Approach #2
   When I ran the above example under [Address Sanitizer](https://doc.rust-lang.org/beta/unstable-book/compiler-flags/sanitizer.html), the error occurred in `fmt::Display`, but I'm not sure exactly where.
   
   This may be related to the Rust issue https://github.com/rust-lang/rust/issues/45838.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb closed issue #6040: Stack overflow while generating logical plan from statement

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #6040: Stack overflow while generating logical plan from statement
URL: https://github.com/apache/arrow-datafusion/issues/6040
