You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/11 15:51:26 UTC

[GitHub] [arrow-datafusion] xudong963 opened a new pull request #1437: Fix: stack overflow

xudong963 opened a new pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437


   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #1434 
   
    # Rationale for this change
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   # What changes are included in this PR?
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on pull request #1437: Fix: stack overflow

Posted by GitBox <gi...@apache.org>.
Jimexist commented on pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#issuecomment-991711332


   I wonder if it makes sense to just limit number of or statements instead?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on pull request #1437: Fix: stack overflow

Posted by GitBox <gi...@apache.org>.
xudong963 commented on pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#issuecomment-991856488


   > I wonder if it makes sense to just limit number of or statements instead?
   
   First, `Statement` represents the SQL AST node, I don't think it's reasonable to limit the depth of recursion stack by limiting the number of `Statement`.
   
   Second, in some cases, it's preferable to unconditionally grow the stack than returning err or panic.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on pull request #1437: Fix: stack overflow

Posted by GitBox <gi...@apache.org>.
xudong963 commented on pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#issuecomment-991693965


   I think there are some places that need to be wrapped by `maybe_grow`, such as `match expr` in the physical plan, but I want to support them in the next ticket. In this ticket, the skeleton is laid.
   
   PTAL @alamb @houqp @Dandandan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 edited a comment on pull request #1437: Fix: stack overflow

Posted by GitBox <gi...@apache.org>.
xudong963 edited a comment on pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#issuecomment-991693965


   I think there are still some places that need to be wrapped by `maybe_grow`, such as `match expr` in the physical plan, but I want to support them in the next ticket. In this ticket, the skeleton is laid.
   
   PTAL @alamb @houqp @Dandandan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 edited a comment on pull request #1437: Fix: stack overflow

Posted by GitBox <gi...@apache.org>.
xudong963 edited a comment on pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#issuecomment-991693965


   I think there are still some places that need to be wrapped by `maybe_grow`, such as `match expr` in the physical plan, but I want to support them in the next ticket. In this ticket, the skeleton is laid.
   
   PTAL @alamb @houqp @Dandandan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on pull request #1437: Process stack overflow panic elegantly

Posted by GitBox <gi...@apache.org>.
xudong963 commented on pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#issuecomment-993744194


   I'll continue to hack on the ticket on the weekend 💪


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 removed a comment on pull request #1437: Process stack overflow panic elegantly

Posted by GitBox <gi...@apache.org>.
xudong963 removed a comment on pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#issuecomment-993744194


   I'll continue to hack on the ticket on the weekend 💪


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on pull request #1437: Fix: stack overflow

Posted by GitBox <gi...@apache.org>.
xudong963 commented on pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#issuecomment-993464972


   > * Converts a `panic!` into an `Error` in some (very important)
   > * Allows the user to set a higher stack limit (but overflows are still possible)
   
   Very nice conclusion, You said what I was thinking inside👍
   
   
   
   > The thing I worry about with the approach in this PR is that it relies on us annotating all places in the code (both that currently exist as well as may be added in the future) that do recursive walks of the tree with `maybe_grow`). It seems almost inevitable that we will end up missing some and I do think this code will likely add a non trivial overhead to datafusion)
   > 
   > >
   
   Yes, I agree. So I think we may check the depth of exprs using `safe_recursion` in `datafusion/src/sql/planner.rs`, then we can avoid using `may_growth` in the follow-up process.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1437: Fix: stack overflow

Posted by GitBox <gi...@apache.org>.
xudong963 commented on a change in pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#discussion_r767168056



##########
File path: datafusion/src/safe_stack.rs
##########
@@ -0,0 +1,84 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::error::{DataFusionError, Result};
+use std::cell::RefCell;
+
+#[derive(Default, Debug, Clone)]
+/// Record recursion depth
+pub struct ProtectRecursion {
+    /// the depth of recursion
+    depth: RefCell<usize>,
+    /// the limit of recursion
+    limit: usize,
+}
+
+impl ProtectRecursion {
+    /// Make a protect recursion with specific limit
+    pub fn new_with_limit(limit: usize) -> ProtectRecursion {
+        ProtectRecursion {
+            depth: RefCell::new(0),
+            limit,
+        }
+    }
+
+    fn depth_ascend(&self) {
+        *self.depth.borrow_mut() -= 1;
+    }
+
+    fn depth_descend(&self) -> Result<()> {
+        let mut depth = self.depth.borrow_mut();
+        if *depth >= self.limit {
+            return Err(DataFusionError::RecursionLimitErr(self.limit));
+        }
+        *depth += 1;
+        Ok(())
+    }
+}
+
+/// Bytes available in the current stack
+pub const STACKER_RED_ZONE: usize = {
+    64 << 10 // 64KB

Review comment:
       This constant is what I feel, I want to know your thoughts.

##########
File path: datafusion/src/safe_stack.rs
##########
@@ -0,0 +1,84 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::error::{DataFusionError, Result};
+use std::cell::RefCell;
+
+#[derive(Default, Debug, Clone)]
+/// Record recursion depth
+pub struct ProtectRecursion {
+    /// the depth of recursion
+    depth: RefCell<usize>,
+    /// the limit of recursion
+    limit: usize,
+}
+
+impl ProtectRecursion {
+    /// Make a protect recursion with specific limit
+    pub fn new_with_limit(limit: usize) -> ProtectRecursion {
+        ProtectRecursion {
+            depth: RefCell::new(0),
+            limit,
+        }
+    }
+
+    fn depth_ascend(&self) {
+        *self.depth.borrow_mut() -= 1;
+    }
+
+    fn depth_descend(&self) -> Result<()> {
+        let mut depth = self.depth.borrow_mut();
+        if *depth >= self.limit {
+            return Err(DataFusionError::RecursionLimitErr(self.limit));
+        }
+        *depth += 1;
+        Ok(())
+    }
+}
+
+/// Bytes available in the current stack
+pub const STACKER_RED_ZONE: usize = {
+    64 << 10 // 64KB
+};
+
+/// Allocate a new stack of at least stack_size bytes.
+pub const STACKER_SIZE: usize = {
+    4 << 20 // 4MB

Review comment:
       ditto




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1437: Fix: stack overflow

Posted by GitBox <gi...@apache.org>.
xudong963 commented on a change in pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#discussion_r767168056



##########
File path: datafusion/src/safe_stack.rs
##########
@@ -0,0 +1,84 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::error::{DataFusionError, Result};
+use std::cell::RefCell;
+
+#[derive(Default, Debug, Clone)]
+/// Record recursion depth
+pub struct ProtectRecursion {
+    /// the depth of recursion
+    depth: RefCell<usize>,
+    /// the limit of recursion
+    limit: usize,
+}
+
+impl ProtectRecursion {
+    /// Make a protect recursion with specific limit
+    pub fn new_with_limit(limit: usize) -> ProtectRecursion {
+        ProtectRecursion {
+            depth: RefCell::new(0),
+            limit,
+        }
+    }
+
+    fn depth_ascend(&self) {
+        *self.depth.borrow_mut() -= 1;
+    }
+
+    fn depth_descend(&self) -> Result<()> {
+        let mut depth = self.depth.borrow_mut();
+        if *depth >= self.limit {
+            return Err(DataFusionError::RecursionLimitErr(self.limit));
+        }
+        *depth += 1;
+        Ok(())
+    }
+}
+
+/// Bytes available in the current stack
+pub const STACKER_RED_ZONE: usize = {
+    64 << 10 // 64KB

Review comment:
       This constant is what I feel, I want to know your thoughts.

##########
File path: datafusion/src/safe_stack.rs
##########
@@ -0,0 +1,84 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::error::{DataFusionError, Result};
+use std::cell::RefCell;
+
+#[derive(Default, Debug, Clone)]
+/// Record recursion depth
+pub struct ProtectRecursion {
+    /// the depth of recursion
+    depth: RefCell<usize>,
+    /// the limit of recursion
+    limit: usize,
+}
+
+impl ProtectRecursion {
+    /// Make a protect recursion with specific limit
+    pub fn new_with_limit(limit: usize) -> ProtectRecursion {
+        ProtectRecursion {
+            depth: RefCell::new(0),
+            limit,
+        }
+    }
+
+    fn depth_ascend(&self) {
+        *self.depth.borrow_mut() -= 1;
+    }
+
+    fn depth_descend(&self) -> Result<()> {
+        let mut depth = self.depth.borrow_mut();
+        if *depth >= self.limit {
+            return Err(DataFusionError::RecursionLimitErr(self.limit));
+        }
+        *depth += 1;
+        Ok(())
+    }
+}
+
+/// Bytes available in the current stack
+pub const STACKER_RED_ZONE: usize = {
+    64 << 10 // 64KB
+};
+
+/// Allocate a new stack of at least stack_size bytes.
+pub const STACKER_SIZE: usize = {
+    4 << 20 // 4MB

Review comment:
       ditto

##########
File path: datafusion/src/sql/planner.rs
##########
@@ -200,7 +211,9 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
 
     /// Generate a logic plan from an SQL query
     pub fn query_to_plan(&self, query: &Query) -> Result<LogicalPlan> {
-        self.query_to_plan_with_alias(query, None, &mut HashMap::new())
+        self.safe_recursion(|| {

Review comment:
       Determine if it is safe to call this function without stack overflow, if it's not safe, return `Err(Recursion exceeded limit)`

##########
File path: datafusion/src/sql/planner.rs
##########
@@ -113,7 +121,10 @@ fn plan_indexed(expr: Expr, mut keys: Vec<Value>) -> Expr {
 impl<'a, S: ContextProvider> SqlToRel<'a, S> {
     /// Create a new query planner
     pub fn new(schema_provider: &'a S) -> Self {
-        SqlToRel { schema_provider }
+        SqlToRel {
+            schema_provider,
+            project_recursion: ProtectRecursion::new_with_limit(2048),

Review comment:
       ditto

##########
File path: datafusion/src/sql/planner.rs
##########
@@ -113,7 +121,10 @@ fn plan_indexed(expr: Expr, mut keys: Vec<Value>) -> Expr {
 impl<'a, S: ContextProvider> SqlToRel<'a, S> {
     /// Create a new query planner
     pub fn new(schema_provider: &'a S) -> Self {
-        SqlToRel { schema_provider }
+        SqlToRel {
+            schema_provider,
+            project_recursion: ProtectRecursion::new_with_limit(2048),

Review comment:
       This constant is what I feel, I want to know your thoughts.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #1437: Fix: stack overflow

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#issuecomment-992770032


   > > This is an acceptable short term workaround, but I think it would be more efficient and more elegant if we rewrite these recursive procedures into iterative procedures.
   
   > Do you mean to use push-based model?
   
   
   @xudong963 , I think what @houqp  is suggesting is to rewrite code that is recursive to not be recursive.
   
   The pattern for Datafusion probably looks like taking code like
   
   ```rust
   fn visit(expr: &expr)  {
     for child in expr.children() {
       visit(child)
     }
     // do actual expr logic
   }
   ```
   
   And changing it so the state is tracked with a structure on the heap rather than a stack. I think [`VecDeque`](https://doc.rust-lang.org/stable/std/collections/struct.VecDeque.html) is a good one for rust:
   
   ```rust
   fn visit(expr: &expr)  {
     let mut worklist = VecDequeue::new();
     worklist.push_back(expr);
     while !worklist.is_empty() {
       for child in expr.children() {
         worklist.push_back(child)
       }
       // do actual expr logic
     }
   }
   ```
   
   (aka avoid the call back to `visit`)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on pull request #1437: Fix: stack overflow

Posted by GitBox <gi...@apache.org>.
xudong963 commented on pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#issuecomment-991693965


   I think there are some places that need to be wrapped by `maybe_grow`, such as `match expr` in the physical plan, but I want to support them in the next ticket. In this ticket, the skeleton is laid.
   
   PTAL @alamb @houqp @Dandandan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on pull request #1437: Fix: stack overflow

Posted by GitBox <gi...@apache.org>.
xudong963 commented on pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#issuecomment-992091985


   > This is an acceptable short term workaround, but I think it would be more efficient and more elegant if we rewrite these recursive procedures into iterative procedures.
   
   Do you mean to use push-based model?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on pull request #1437: Process stack overflow panic elegantly

Posted by GitBox <gi...@apache.org>.
xudong963 commented on pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#issuecomment-1041009259


   > Marking as stale pr -- will close it in a week or two unless we plan to keep working on it
   
   I'll directly close the ticket because I plan to fix it by #1444


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb edited a comment on pull request #1437: Fix: stack overflow

Posted by GitBox <gi...@apache.org>.
alamb edited a comment on pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#issuecomment-992770032


   > > This is an acceptable short term workaround, but I think it would be more efficient and more elegant if we rewrite these recursive procedures into iterative procedures.
   
   > Do you mean to use push-based model?
   
   
   @xudong963 , I think what @houqp  is suggesting is to rewrite code that is recursive to not be recursive.
   
   The pattern for Datafusion probably looks like taking code like
   
   ```rust
   fn visit(expr: &expr)  {
     for child in expr.children() {
       visit(child)
     }
     // do actual expr logic
   }
   ```
   
   And changing it so the state is tracked with a structure on the heap rather than a stack. I think [`VecDeque`](https://doc.rust-lang.org/stable/std/collections/struct.VecDeque.html) is a good one for rust:
   
   ```rust
   fn visit(expr: &expr)  {
     let mut worklist = VecDequeue::new();
     worklist.push_back(expr);
     while !worklist.is_empty() {
       let parent = worklist.pop_front();
       for child in parent.children() {
         worklist.push_back(child)
       }
       // do actual expr logic on parent
     }
   }
   ```
   
   (aka avoid the call back to `visit`)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1437: Fix: stack overflow

Posted by GitBox <gi...@apache.org>.
xudong963 commented on a change in pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#discussion_r767168233



##########
File path: datafusion/src/sql/planner.rs
##########
@@ -113,7 +121,10 @@ fn plan_indexed(expr: Expr, mut keys: Vec<Value>) -> Expr {
 impl<'a, S: ContextProvider> SqlToRel<'a, S> {
     /// Create a new query planner
     pub fn new(schema_provider: &'a S) -> Self {
-        SqlToRel { schema_provider }
+        SqlToRel {
+            schema_provider,
+            project_recursion: ProtectRecursion::new_with_limit(2048),

Review comment:
       This constant is what I feel, I want to know your thoughts.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1437: Fix: stack overflow

Posted by GitBox <gi...@apache.org>.
xudong963 commented on a change in pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#discussion_r767168233



##########
File path: datafusion/src/sql/planner.rs
##########
@@ -113,7 +121,10 @@ fn plan_indexed(expr: Expr, mut keys: Vec<Value>) -> Expr {
 impl<'a, S: ContextProvider> SqlToRel<'a, S> {
     /// Create a new query planner
     pub fn new(schema_provider: &'a S) -> Self {
-        SqlToRel { schema_provider }
+        SqlToRel {
+            schema_provider,
+            project_recursion: ProtectRecursion::new_with_limit(2048),

Review comment:
       ditto




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #1437: Process stack overflow panic elegantly

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#issuecomment-1040685163


   Marking as stale pr -- will close it in a week or two unless we plan to keep working on it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 closed pull request #1437: Process stack overflow panic elegantly

Posted by GitBox <gi...@apache.org>.
xudong963 closed pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on pull request #1437: Fix: stack overflow

Posted by GitBox <gi...@apache.org>.
Jimexist commented on pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#issuecomment-991711332


   I wonder if it makes sense to just limit number of or statements instead?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1437: Fix: stack overflow

Posted by GitBox <gi...@apache.org>.
xudong963 commented on a change in pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#discussion_r767168217



##########
File path: datafusion/src/sql/planner.rs
##########
@@ -200,7 +211,9 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
 
     /// Generate a logic plan from an SQL query
     pub fn query_to_plan(&self, query: &Query) -> Result<LogicalPlan> {
-        self.query_to_plan_with_alias(query, None, &mut HashMap::new())
+        self.safe_recursion(|| {

Review comment:
       Determine if it is safe to call this function without stack overflow, if it's not safe, return `Err(Recursion exceeded limit)`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] xudong963 commented on pull request #1437: Fix: stack overflow

Posted by GitBox <gi...@apache.org>.
xudong963 commented on pull request #1437:
URL: https://github.com/apache/arrow-datafusion/pull/1437#issuecomment-993467433


   > > > This is an acceptable short term workaround, but I think it would be more efficient and more elegant if we rewrite these recursive procedures into iterative procedures.
   > 
   > > Do you mean to use push-based model?
   > 
   > @xudong963 , I think what @houqp is suggesting is to rewrite code that is recursive to not be recursive.
   > 
   > The pattern for Datafusion probably looks like taking code like
   > 
   > ```rust
   > fn visit(expr: &expr)  {
   >   for child in expr.children() {
   >     visit(child)
   >   }
   >   // do actual expr logic
   > }
   > ```
   > 
   > And changing it so the state is tracked with a structure on the heap rather than a stack. I think [`VecDeque`](https://doc.rust-lang.org/stable/std/collections/struct.VecDeque.html) is a good one for rust:
   > 
   > ```rust
   > fn visit(expr: &expr)  {
   >   let mut worklist = VecDequeue::new();
   >   worklist.push_back(expr);
   >   while !worklist.is_empty() {
   >     let parent = worklist.pop_front();
   >     for child in parent.children() {
   >       worklist.push_back(child)
   >     }
   >     // do actual expr logic on parent
   >   }
   > }
   > ```
   > 
   > (aka avoid the call back to `visit`)
   
   Open an issue https://github.com/apache/arrow-datafusion/issues/1444 to track of this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org