Posted to github@arrow.apache.org by "DDtKey (via GitHub)" <gi...@apache.org> on 2023/03/14 13:56:40 UTC

[GitHub] [arrow-datafusion] DDtKey opened a new issue, #5162: Ignoring of memory-pool limits & OOM on large cartesian-product join

DDtKey opened a new issue, #5162:
URL: https://github.com/apache/arrow-datafusion/issues/5162

   **Describe the bug**
   The process can run out of memory (OOM) instead of the query failing with a `ResourcesExhausted` error.
   This is probably related to the use of unbounded channels (which, I believe, should be avoided).
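   
   A minimal sketch of why unbounded buffering is a concern, using only Rust's standard `std::sync::mpsc` channels. This is an assumption for illustration: DataFusion's operators may use different channel types, and the helpers `buffered_before_full` and `buffered_unbounded` are hypothetical. A bounded `sync_channel` stops accepting batches once `capacity` items are waiting, while an unbounded `channel` keeps accepting them, so buffered memory grows with however far the consumer falls behind.
   
   ```rust
   use std::sync::mpsc::{channel, sync_channel, TrySendError};
   
   /// Hypothetical helper: how many 1 KiB batches a bounded channel buffers
   /// before `try_send` reports that it is full (nobody drains the receiver).
   fn buffered_before_full(capacity: usize, batches: usize) -> usize {
       let (tx, _rx) = sync_channel::<Vec<u8>>(capacity);
       let mut accepted = 0;
       for _ in 0..batches {
           match tx.try_send(vec![0u8; 1024]) {
               Ok(()) => accepted += 1,
               // Buffer full: backpressure tells the producer to wait.
               Err(TrySendError::Full(_)) => break,
               // Receiver dropped: nothing more can be sent.
               Err(TrySendError::Disconnected(_)) => break,
           }
       }
       accepted
   }
   
   /// Hypothetical helper: an unbounded channel accepts every batch while the
   /// receiver is alive, so all of them stay buffered in memory.
   fn buffered_unbounded(batches: usize) -> usize {
       let (tx, _rx) = channel::<Vec<u8>>();
       let mut accepted = 0;
       for _ in 0..batches {
           if tx.send(vec![0u8; 1024]).is_ok() {
               accepted += 1;
           }
       }
       accepted
   }
   
   fn main() {
       println!("bounded(4): {} of 100 batches buffered", buffered_before_full(4, 100));
       println!("unbounded:  {} of 100 batches buffered", buffered_unbounded(100));
   }
   ```
   
   In a real pipeline the producer would block on a full bounded channel instead of giving up, which is what keeps in-flight memory proportional to the channel capacity.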
   
   **To Reproduce**
   
   MRE (minimal reproducible example) in which a large Cartesian product bypasses the memory pool:
   
   CSV file example (250 MB): [GDrive link](https://drive.google.com/file/d/1q_-p8BvvO2w-0IH7SyxvDIOYK44yQIKt/view?usp=share_link). It's a file of random data, and the join column has the same value in every record, so the join is effectively a Cartesian product.
   
   Memory pool limit: `FairSpillPool::new(4 * 1024 * 1024 * 1024)` (4 GiB)
   
   SQL:
   `SELECT * FROM rnd rnd1 JOIN rnd rnd2 ON rnd1."s3_drive" = rnd2."s3_drive"`
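   
   Why this equi-join behaves like a Cartesian product: each left row pairs with every right row that has the same key, so the output row count is the sum over keys of `left_count(k) * right_count(k)`. With a single distinct key that sum is simply `left_rows * right_rows`. A minimal sketch of the arithmetic (the helper `equi_join_output_rows` is hypothetical, assumes non-null keys, and is not DataFusion code):
   
   ```rust
   use std::collections::HashMap;
   
   /// Hypothetical helper: row count of an inner equi-join on non-null keys,
   /// i.e. the sum over keys k of left_count(k) * right_count(k).
   fn equi_join_output_rows(left_keys: &[&str], right_keys: &[&str]) -> u64 {
       // Count right-side rows per key (similar to the hash table a hash join builds).
       let mut right_counts: HashMap<&str, u64> = HashMap::new();
       for key in right_keys {
           *right_counts.entry(*key).or_insert(0) += 1;
       }
       // Each left row emits one output row per matching right row.
       left_keys
           .iter()
           .map(|key| right_counts.get(key).copied().unwrap_or(0))
           .sum()
   }
   
   fn main() {
       // Every row has the same join key, as in the reproduction file.
       let same_key = vec!["same"; 10_000];
       println!("single key: {}", equi_join_output_rows(&same_key, &same_key));
   
       // Every row has a distinct key: each left row matches exactly one right row.
       let owned: Vec<String> = (0..10_000).map(|i| i.to_string()).collect();
       let distinct: Vec<&str> = owned.iter().map(|s| s.as_str()).collect();
       println!("distinct keys: {}", equi_join_output_rows(&distinct, &distinct));
   }
   ```
   
   This prints 100000000 for the single-key case and 10000 for distinct keys: the same 10,000-row input produces 10,000 times more output rows when all keys are equal.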
   
   **Expected behavior**
   
   It should return a `ResourcesExhausted` error when a `MemoryPool` is configured.
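   
   The expected behavior corresponds to reservation-based accounting: an operator asks the pool for memory before buffering a batch and gets an error, rather than allocating, once the limit is reached. The sketch below is a hypothetical, single-threaded stand-in: `FixedPool`, `try_grow` and `PoolError` are illustrative names only loosely inspired by the reservation pattern of DataFusion's memory pools, and this is not how `FairSpillPool` is implemented.
   
   ```rust
   use std::fmt;
   
   /// Hypothetical error mirroring the `ResourcesExhausted` outcome this issue expects.
   #[derive(Debug, PartialEq)]
   enum PoolError {
       ResourcesExhausted { requested: usize, available: usize },
   }
   
   impl fmt::Display for PoolError {
       fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
           match self {
               PoolError::ResourcesExhausted { requested, available } => write!(
                   f,
                   "resources exhausted: requested {requested} bytes, {available} available"
               ),
           }
       }
   }
   
   /// Hypothetical fixed-size pool: callers reserve memory before buffering
   /// a batch, and a request over the limit fails instead of being granted.
   struct FixedPool {
       limit: usize,
       used: usize,
   }
   
   impl FixedPool {
       fn new(limit: usize) -> Self {
           Self { limit, used: 0 }
       }
   
       fn try_grow(&mut self, bytes: usize) -> Result<(), PoolError> {
           let available = self.limit - self.used;
           if bytes > available {
               return Err(PoolError::ResourcesExhausted { requested: bytes, available });
           }
           self.used += bytes;
           Ok(())
       }
   }
   
   fn main() {
       // Tiny stand-in for the 4 GiB limit used in the reproduction.
       let mut pool = FixedPool::new(4 * 1024);
       let mut buffered_batches = 0;
       // Simulate a join that keeps reserving 1 KiB per batch it wants to buffer.
       let err = loop {
           match pool.try_grow(1024) {
               Ok(()) => buffered_batches += 1,
               Err(e) => break e,
           }
       };
       println!("buffered {buffered_batches} batches, then: {err}");
   }
   ```
   
   The design choice is that running past the limit surfaces as an ordinary error in the query result, instead of the process being killed by the OS.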
   
   **Additional context**
   
   Part of this was described in the discussion here: https://github.com/apache/arrow-datafusion/issues/5108#issuecomment-1412491794, but that discussion was about a regression.
   This example isn't a regression: it's reproducible on older versions too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Dandandan commented on issue #5162: Ignoring of memory-pool limits & OOM on large cartesian-product join

Posted by "Dandandan (via GitHub)" <gi...@apache.org>.
Dandandan commented on issue #5162:
URL: https://github.com/apache/arrow-datafusion/issues/5162#issuecomment-1468151418

   Closed by https://github.com/apache/arrow-datafusion/pull/5564


[GitHub] [arrow-datafusion] korowa commented on issue #5162: Ignoring of memory-pool limits & OOM on large cartesian-product join

Posted by "korowa (via GitHub)" <gi...@apache.org>.
korowa commented on issue #5162:
URL: https://github.com/apache/arrow-datafusion/issues/5162#issuecomment-1448394909

   I guess I'll fix Hash Join using the same approach in a couple of days then 🙃 


[GitHub] [arrow-datafusion] alamb closed issue #5162: Ignoring of memory-pool limits & OOM on large cartesian-product join

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #5162: Ignoring of memory-pool limits & OOM on large cartesian-product join
URL: https://github.com/apache/arrow-datafusion/issues/5162


[GitHub] [arrow-datafusion] DDtKey commented on issue #5162: Ignoring of memory-pool limits & OOM on large cartesian-product join

Posted by "DDtKey (via GitHub)" <gi...@apache.org>.
DDtKey commented on issue #5162:
URL: https://github.com/apache/arrow-datafusion/issues/5162#issuecomment-1448373456

   > It seems I was wrong about https://github.com/apache/arrow-datafusion/pull/5339 closing this issue, my bad 😞. This query shouldn't produce a CROSS JOIN, and a memory limit for hash join is required to fix this issue.
   
   Yes, I think so: this query produces a `hash-join` in the plan, but it actually results in a Cartesian product because the join columns contain the same value in every row.


[GitHub] [arrow-datafusion] DDtKey commented on issue #5162: Ignoring of memory-pool limits & OOM on large cartesian-product join

Posted by "DDtKey (via GitHub)" <gi...@apache.org>.
DDtKey commented on issue #5162:
URL: https://github.com/apache/arrow-datafusion/issues/5162#issuecomment-1413984721

   I've changed the description because this case isn't a regression like the one I mentioned in #5108. It's probably more related to #3941.


[GitHub] [arrow-datafusion] alamb commented on issue #5162: Ignoring of memory-pool limits & OOM on large cartesian-product join

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5162:
URL: https://github.com/apache/arrow-datafusion/issues/5162#issuecomment-1448520580

   Thank you @korowa 


[GitHub] [arrow-datafusion] Dandandan commented on issue #5162: Ignoring of memory-pool limits & OOM on large cartesian-product join

Posted by "Dandandan (via GitHub)" <gi...@apache.org>.
Dandandan commented on issue #5162:
URL: https://github.com/apache/arrow-datafusion/issues/5162#issuecomment-1468150890

   This 


[GitHub] [arrow-datafusion] korowa commented on issue #5162: Ignoring of memory-pool limits & OOM on large cartesian-product join

Posted by "korowa (via GitHub)" <gi...@apache.org>.
korowa commented on issue #5162:
URL: https://github.com/apache/arrow-datafusion/issues/5162#issuecomment-1448369495

   It seems I was wrong that #5339 closes this issue, my bad 😞. This query shouldn't produce a `CROSS JOIN`, and a memory limit for hash join is required to fix this issue.


[GitHub] [arrow-datafusion] alamb commented on issue #5162: Ignoring of memory-pool limits & OOM on large cartesian-product join

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5162:
URL: https://github.com/apache/arrow-datafusion/issues/5162#issuecomment-1414106112

   > I've changed the description because this case isn't a regression like the one I mentioned in https://github.com/apache/arrow-datafusion/issues/5108. It's probably more related to/part of https://github.com/apache/arrow-datafusion/issues/3941.
   
   Thank you @DDtKey. I agree this is a feature gap (limiting memory usage by joins).


[GitHub] [arrow-datafusion] Dandandan closed issue #5162: Ignoring of memory-pool limits & OOM on large cartesian-product join

Posted by "Dandandan (via GitHub)" <gi...@apache.org>.
Dandandan closed issue #5162: Ignoring of memory-pool limits & OOM on large cartesian-product join
URL: https://github.com/apache/arrow-datafusion/issues/5162
