Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/16 12:14:57 UTC
[GitHub] [arrow] Dandandan opened a new pull request #9214: [Arrow][DataFusion] Mem table repartition [WIP]
Dandandan opened a new pull request #9214:
URL: https://github.com/apache/arrow/pull/9214
I think being able to repartition an in-memory table is a useful feature: the repartitioning only needs to be applied once, and it is also quite cheap. This can be very useful for in-memory analytics.
The speedup from repartitioning is large (mainly on aggregates). On my 8-core machine I see 6-7x on queries 1 and 12 versus a single partition, and a somewhat smaller difference on query 5, when using 16 partitions; CPU utilization is also very high.
@jorgecarleitao maybe this is of interest to you, as you mentioned you are looking into multi-threading. I think this would be a "high level" way to get more parallelism. I think we could also do repartitions in some optimizer rules and/or dynamically, similar to what's described here: https://issues.apache.org/jira/browse/ARROW-9464
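To illustrate the idea, here is a minimal sketch only, not the code in this PR. It assumes batches are redistributed round-robin over the requested number of output partitions, which is one possible strategy. `Batch` is a hypothetical stand-in for an Arrow `RecordBatch`, and `repartition` is an illustrative helper name, not a DataFusion API.

```rust
/// Hypothetical stand-in for an Arrow record batch: just a vector of values.
type Batch = Vec<i64>;

/// Redistribute all batches over `output_partitions` partitions, round-robin.
/// `None` (or `Some(0)`) returns the input partitioning unchanged.
fn repartition(
    partitions: Vec<Vec<Batch>>,
    output_partitions: Option<usize>,
) -> Vec<Vec<Batch>> {
    let num_partitions = match output_partitions {
        Some(n) if n > 0 => n,
        _ => return partitions,
    };
    // Start with `num_partitions` empty partitions.
    let mut out: Vec<Vec<Batch>> = vec![Vec::new(); num_partitions];
    // Walk the batches in input order and deal them out like cards.
    for (i, batch) in partitions.into_iter().flatten().enumerate() {
        out[i % num_partitions].push(batch);
    }
    out
}

fn main() {
    // One input partition holding four batches.
    let input = vec![vec![vec![1, 2], vec![3, 4], vec![5, 6], vec![7, 8]]];
    let output = repartition(input, Some(2));
    println!("{:?}", output);
}
```

Because the data is already in memory, this redistribution is done once at load time; each output partition can then be scanned by a separate task.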
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io commented on pull request #9214: ARROW-11268: [Arrow][DataFusion] Mem table repartition
Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9214:
URL: https://github.com/apache/arrow/pull/9214#issuecomment-761556616
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=h1) Report
> Merging [#9214](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=desc) (8ba5828) into [master](https://codecov.io/gh/apache/arrow/commit/1393188e1aa1b3d59993ce7d4ade7f7ac8570959?el=desc) (1393188) will **decrease** coverage by `0.01%`.
> The diff coverage is `0.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9214/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=tree)
```diff
@@            Coverage Diff             @@
##           master    #9214      +/-   ##
==========================================
- Coverage   81.61%   81.59%   -0.02%
==========================================
  Files         215      215
  Lines       51867    51877      +10
==========================================
  Hits        42329    42329
- Misses       9538     9548      +10
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/datafusion/src/datasource/memory.rs](https://codecov.io/gh/apache/arrow/pull/9214/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9kYXRhc291cmNlL21lbW9yeS5ycw==) | `80.98% <0.00%> (-5.30%)` | :arrow_down: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=footer). Last update [eaa7b7a...afd6528](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
[GitHub] [arrow] alamb commented on a change in pull request #9214: ARROW-11268: [Rust][DataFusion] MemTable::load output partition support
Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9214:
URL: https://github.com/apache/arrow/pull/9214#discussion_r559487823
##########
File path: rust/benchmarks/src/bin/tpch.rs
##########
@@ -66,6 +66,10 @@ struct BenchmarkOpt {
/// Load the data into a MemTable before executing the query
#[structopt(short = "m", long = "mem-table")]
mem_table: bool,
+
+ /// Number of partitions to use when using MemTable
Review comment:
```suggestion
/// Number of partitions to create when using MemTable as input
```
##########
File path: rust/datafusion/src/datasource/memory.rs
##########
@@ -126,6 +134,28 @@ impl MemTable {
data.push(result);
}
+ let exec = MemoryExec::try_new(&data, schema.clone(), None)?;
+
+ if let Some(num_partitions) = output_partitions {
Review comment:
👍
----------------------------------------------------------------
[GitHub] [arrow] github-actions[bot] commented on pull request #9214: ARROW-11268: [Arrow][DataFusion] Mem table repartition [WIP]
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9214:
URL: https://github.com/apache/arrow/pull/9214#issuecomment-761554645
https://issues.apache.org/jira/browse/ARROW-11268
----------------------------------------------------------------
[GitHub] [arrow] codecov-io edited a comment on pull request #9214: ARROW-11268: [Rust][DataFusion] MemTable output partition support
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9214:
URL: https://github.com/apache/arrow/pull/9214#issuecomment-761556616
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=h1) Report
> Merging [#9214](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=desc) (9750ead) into [master](https://codecov.io/gh/apache/arrow/commit/1393188e1aa1b3d59993ce7d4ade7f7ac8570959?el=desc) (1393188) will **decrease** coverage by `0.02%`.
> The diff coverage is `0.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9214/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=tree)
```diff
@@            Coverage Diff             @@
##           master    #9214      +/-   ##
==========================================
- Coverage   81.61%   81.58%   -0.03%
==========================================
  Files         215      215
  Lines       51867    51882      +15
==========================================
  Hits        42329    42329
- Misses       9538     9553      +15
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/benchmarks/src/bin/tpch.rs](https://codecov.io/gh/apache/arrow/pull/9214/diff?src=pr&el=tree#diff-cnVzdC9iZW5jaG1hcmtzL3NyYy9iaW4vdHBjaC5ycw==) | `12.09% <0.00%> (-0.10%)` | :arrow_down: |
| [rust/datafusion/src/datasource/memory.rs](https://codecov.io/gh/apache/arrow/pull/9214/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9kYXRhc291cmNlL21lbW9yeS5ycw==) | `80.00% <0.00%> (-6.28%)` | :arrow_down: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=footer). Last update [eaa7b7a...9750ead](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
[GitHub] [arrow] github-actions[bot] commented on pull request #9214: [Arrow][DataFusion] Mem table repartition [WIP]
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9214:
URL: https://github.com/apache/arrow/pull/9214#issuecomment-761554183
Thanks for opening a pull request!
Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW
Then could you also rename the pull request title using the following format?
ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
See also:
* [Other pull requests](https://github.com/apache/arrow/pulls/)
* [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
----------------------------------------------------------------
[GitHub] [arrow] Dandandan commented on pull request #9214: ARROW-11268: [Rust][DataFusion] MemTable::load output partition support
Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9214:
URL: https://github.com/apache/arrow/pull/9214#issuecomment-761821904
This would also help us in the db-benchmark https://github.com/h2oai/db-benchmark/pull/182
----------------------------------------------------------------
[GitHub] [arrow] jorgecarleitao closed pull request #9214: ARROW-11268: [Rust][DataFusion] MemTable::load output partition support
Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #9214:
URL: https://github.com/apache/arrow/pull/9214
----------------------------------------------------------------
[GitHub] [arrow] codecov-io edited a comment on pull request #9214: ARROW-11268: [Rust][DataFusion] MemTable output partition support
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9214:
URL: https://github.com/apache/arrow/pull/9214#issuecomment-761556616
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=h1) Report
> Merging [#9214](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=desc) (34bd32f) into [master](https://codecov.io/gh/apache/arrow/commit/1393188e1aa1b3d59993ce7d4ade7f7ac8570959?el=desc) (1393188) will **decrease** coverage by `0.02%`.
> The diff coverage is `0.00%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9214/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=tree)
```diff
@@            Coverage Diff             @@
##           master    #9214      +/-   ##
==========================================
- Coverage   81.61%   81.58%   -0.03%
==========================================
  Files         215      215
  Lines       51867    51882      +15
==========================================
  Hits        42329    42329
- Misses       9538     9553      +15
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/benchmarks/src/bin/tpch.rs](https://codecov.io/gh/apache/arrow/pull/9214/diff?src=pr&el=tree#diff-cnVzdC9iZW5jaG1hcmtzL3NyYy9iaW4vdHBjaC5ycw==) | `12.09% <0.00%> (-0.10%)` | :arrow_down: |
| [rust/datafusion/src/datasource/memory.rs](https://codecov.io/gh/apache/arrow/pull/9214/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9kYXRhc291cmNlL21lbW9yeS5ycw==) | `80.00% <0.00%> (-6.28%)` | :arrow_down: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=footer). Last update [eaa7b7a...34bd32f](https://codecov.io/gh/apache/arrow/pull/9214?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).