You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "jychen7 (via GitHub)" <gi...@apache.org> on 2023/04/11 22:45:35 UTC
[GitHub] [arrow-datafusion] jychen7 opened a new issue, #5969: Clickbench q32 not working
jychen7 opened a new issue, #5969:
URL: https://github.com/apache/arrow-datafusion/issues/5969
### Describe the bug
q32 does not work, it got killed.
```
SELECT "WatchID", "ClientIP", COUNT(*) AS c, SUM("IsRefresh"), AVG("ResolutionWidth") FROM hits GROUP BY "WatchID", "ClientIP" ORDER BY c DESC LIMIT 10
```
### To Reproduce
https://github.com/ClickHouse/ClickBench/tree/main/datafusion
### Expected behavior
q32 should work
ps: DuckDB takes 5.28s
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] comphead commented on issue #5969: Clickbench q32 not working
Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #5969:
URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1504602856
It takes some CPU time on `datafusion::physical_plan::aggregates::row_hash::slice_and_maybe_filter`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] comphead commented on issue #5969: Clickbench q32 not working in 32GB RAM
Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #5969:
URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1513959517
with optimized `datafusion::physical_plan::aggregates::row_hash::slice_and_maybe_filter` there is ~19% gain.
was 68sec, now 55 sec
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on issue #5969: Clickbench q32 not working in 32GB RAM
Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5969:
URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1604520039
> Also, do you guys think https://github.com/apache/arrow-datafusion/pull/6657 help with this query?
I do think so
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] jychen7 commented on issue #5969: Clickbench q32 not working in 32GB RAM
Posted by "jychen7 (via GitHub)" <gi...@apache.org>.
jychen7 commented on issue #5969:
URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1507915671
> reg to flamegraph the datafusion::physical_plan::aggregates::row_hash::slice_and_maybe_filter is expensive.
it is expensive because excessive vector allocations
that makes sense. From my debugging above, I think `SortExec` is not a slow part. Looks like its children's plan like `RepartitionExec` or `AggregateExec` is slow
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] jychen7 commented on issue #5969: Clickbench q32 not working in 32GB RAM
Posted by "jychen7 (via GitHub)" <gi...@apache.org>.
jychen7 commented on issue #5969:
URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1506036788
> I haven't try in larger instance, e.g. c6a.8xlarge (64GB ram) or up
confirm it works in `c6a.8xlarge` (32vCPU, 64GB RAM)
```
Query took 19.378 seconds
```
from `htop` observation, it uses full 32 core and up to 34 GB RAM
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] jychen7 commented on issue #5969: Clickbench q32 not working in 32GB RAM
Posted by "jychen7 (via GitHub)" <gi...@apache.org>.
jychen7 commented on issue #5969:
URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1506166745
Before investigate deeper on reducing memory usage, we may improve `datafusion-cli` (or new option in datafusion core) to set memory pool with machine memory limit.
(ps: It already uses number of CPU cores for parallel reading/sorting, but not memory)
https://github.com/apache/arrow-datafusion/blob/4c7833ebfdb2d022830bb97862e0ce36b0b3d6b1/datafusion/execution/src/runtime_env.rs#L152-L161
---
But unfortunately, my local run returns following error
### before
unlimited, Query took 115.120 second
https://github.com/apache/arrow-datafusion/blob/4c7833ebfdb2d022830bb97862e0ce36b0b3d6b1/datafusion-cli/src/main.rs#L149-L152
### after
limit 16*10^9 bytes (~15GB), 80% fraction
`Resources exhausted: Failed to allocate additional 59290364 bytes for GroupedHashAggregateStream[1] with 149089206 bytes already allocated - maximum available is 16563438`
```
fn create_runtime_env() -> Result<RuntimeEnv> {
let rn_config = RuntimeConfig::new().with_memory_limit(
16000000000,
0.8
);
RuntimeEnv::new(rn_config)
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] jychen7 commented on issue #5969: Clickbench q32 not working in 32GB RAM
Posted by "jychen7 (via GitHub)" <gi...@apache.org>.
jychen7 commented on issue #5969:
URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1506065309
I thought `sort` knows the machine's memory limit and spills when it is insufficient. Not sure why OOM yet
https://github.com/apache/arrow-datafusion/blob/4c7833ebfdb2d022830bb97862e0ce36b0b3d6b1/datafusion/core/src/physical_plan/sorts/sort.rs#L63-L72
for reference, the plan
```
explain SELECT "WatchID", "ClientIP", COUNT(*) AS c, SUM("IsRefresh"), AVG("ResolutionWidth") FROM hits GROUP BY "WatchID", "ClientIP" ORDER BY c DESC LIMIT 10;
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Limit: skip=0, fetch=10
|
| | Sort: c DESC NULLS FIRST, fetch=10
|
| | Projection: hits.WatchID, hits.ClientIP, COUNT(UInt8(1)) AS c, SUM(hits.IsRefresh), AVG(hits.ResolutionWidth)
|
| | Aggregate: groupBy=[[hits.WatchID, hits.ClientIP]], aggr=[[COUNT(UInt8(1)), SUM(hits.IsRefresh), AVG(hits.ResolutionWidth)]]
|
| | TableScan: hits projection=[WatchID, ClientIP, IsRefresh, ResolutionWidth]
|
| physical_plan | GlobalLimitExec: skip=0, fetch=10
|
| | SortPreservingMergeExec: [c@2 DESC]
|
| | SortExec: fetch=10, expr=[c@2 DESC]
|
| | ProjectionExec: expr=[WatchID@0 as WatchID, ClientIP@1 as ClientIP, COUNT(UInt8(1))@2 as c, SUM(hits.IsRefresh)@3 as SUM(hits.IsRefresh), AVG(hits.ResolutionWidth)@4 as AVG(hits.ResolutionWidth)]
|
| | AggregateExec: mode=FinalPartitioned, gby=[WatchID@0 as WatchID, ClientIP@1 as ClientIP], aggr=[COUNT(UInt8(1)), SUM(hits.IsRefresh), AVG(hits.ResolutionWidth)]
|
| | CoalesceBatchesExec: target_batch_size=8192
|
| | RepartitionExec: partitioning=Hash([Column { name: "WatchID", index: 0 }, Column { name: "ClientIP", index: 1 }], 12), input_partitions=12
|
| | AggregateExec: mode=Partial, gby=[WatchID@0 as WatchID, ClientIP@1 as ClientIP], aggr=[COUNT(UInt8(1)), SUM(hits.IsRefresh), AVG(hits.ResolutionWidth)]
|
| | ParquetExec: limit=None, partitions={12 groups: [[hits.parquet:0..1231664704], [hits.parquet:1231664704..2463329408], [hits.parquet:2463329408..3694994112], [hits.parquet:3694994112..4926658816], [hits.parquet:4926658816..6158323520], [hits.parquet:6158323520..7389988224], [hits.parquet:7389988224..8621652928], [hits.parquet:8621652928..9853317632], [hits.parquet:9853317632..11084982336], [hits.parquet:11084982336..12316647040], [hits.parquet:12316647040..13548311744], [hits.parquet:13548311744..14779976446]]}, projection=[WatchID, ClientIP, IsRefresh, ResolutionWidth] |
| |
|
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] comphead commented on issue #5969: Clickbench q32 not working
Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #5969:
URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1504585917
![df-distinct-graph](https://user-images.githubusercontent.com/10247224/231349286-063c4c5f-61da-4cb1-8894-8504cc9b846f.svg)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] jychen7 commented on issue #5969: Clickbench q32 not working in 32GB RAM
Posted by "jychen7 (via GitHub)" <gi...@apache.org>.
jychen7 commented on issue #5969:
URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1507903769
when running with `RUST_LOG=debug datafusion-cli`, I find out it is slow during `do_sort`, but not sure which part is slow: `insert_batch` or final `sort`.
So I add debug log in https://github.com/apache/arrow-datafusion/compare/main...jychen7:arrow-datafusion:debug-clickbench-q32?expand=1. Surprisingly, it takes 13 mins between `Start do_sort` and `Start insert_batch`
```
[2023-04-14T02:26:51Z DEBUG datafusion::physical_plan::sorts::sort] Start do_sort for partition 11 of context session_id 039882c2-6b74-4d6a-9739-a50c4fbe1ac7 and task_id None
[2023-04-14T02:39:01Z DEBUG datafusion::physical_plan::sorts::sort] Start insert_batch for partition 11 with input size 295992
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] comphead commented on issue #5969: Clickbench q32 not working
Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #5969:
URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1504551303
Looking into this
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] jychen7 commented on issue #5969: Clickbench q32 not working
Posted by "jychen7 (via GitHub)" <gi...@apache.org>.
jychen7 commented on issue #5969:
URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1504589481
I am wondering if parallel TopK operator could reduce the memory usage for q32 to run
https://github.com/apache/arrow-datafusion/issues/3516#issuecomment-1250258614
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] ozankabak commented on issue #5969: Clickbench q32 not working in 32GB RAM
Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on issue #5969:
URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1603361175
I wonder if this query still fails with OOM after the recent improvements. Also, do you guys think #6657 help with this query?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on issue #5969: Clickbench q32 not working in 32GB RAM
Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5969:
URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1638066217
After https://github.com/apache/arrow-datafusion/pull/6904 is merged, I have been able to run all click bench queries successfully with 32GB of ram where in datafusion 27.0.0 q32 caused an OOM for me. This I think this is done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb closed issue #5969: Clickbench q32 not working in 32GB RAM
Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #5969: Clickbench q32 not working in 32GB RAM
URL: https://github.com/apache/arrow-datafusion/issues/5969
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] comphead commented on issue #5969: Clickbench q32 not working in 32GB RAM
Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #5969:
URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1507912263
reg to flamegraph the `datafusion::physical_plan::aggregates::row_hash::slice_and_maybe_filter` is expensive.
its expensive because excessive vector allocations. I will try to rewrite this method tomorrow and share results
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] comphead commented on issue #5969: Clickbench q32 not working in 32GB RAM
Posted by "comphead (via GitHub)" <gi...@apache.org>.
comphead commented on issue #5969:
URL: https://github.com/apache/arrow-datafusion/issues/5969#issuecomment-1515188902
Filed https://github.com/apache/arrow-datafusion/issues/6064
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org