You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/23 15:08:41 UTC
[GitHub] [arrow-datafusion] alamb opened a new pull request, #4340: Minor: use upstream RowSelection code in arrow
alamb opened a new pull request, #4340:
URL: https://github.com/apache/arrow-datafusion/pull/4340
Draft as it is waiting on
- [] https://github.com/apache/arrow-rs/pull/3173
# Which issue does this PR close?
N/A
# Rationale for this change
This code was upstreamed in https://github.com/apache/arrow-rs/issues/3003
# What changes are included in this PR?
Remove Datafusion copy of `intersect_row_selection` which has been upstreamed
# Are these changes tested?
Yes by existing tests
# Are there any user-facing changes?
Np
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4340: Minor: use upstream RowSelection code from arrow `intersect_row_selection`
Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #4340:
URL: https://github.com/apache/arrow-datafusion/pull/4340#discussion_r1036373826
##########
datafusion/core/src/physical_plan/file_format/parquet/page_filter.rs:
##########
@@ -197,93 +195,14 @@ pub(crate) fn build_page_filter(
///
/// The final selection is the intersection of these `RowSelector`s:
/// * `final_selection:[ Skip(0~199), Read(200~249), Skip(250~299)]`
-fn combine_multi_col_selection(
- row_selections: VecDeque<Vec<RowSelector>>,
-) -> Vec<RowSelector> {
+fn combine_multi_col_selection(row_selections: Vec<Vec<RowSelector>>) -> RowSelection {
row_selections
.into_iter()
- .reduce(intersect_row_selection)
+ .map(RowSelection::from)
+ .reduce(|s1, s2| s1.intersection(&s2))
.unwrap()
}
-/// combine two `RowSelection` return the intersection
-/// For example:
-/// self: NNYYYYNNY
-/// other: NYNNNNNNY
-///
-/// returned: NNNNNNNNY
-/// set `need_combine` true will combine result: Select(2) + Select(1) + Skip(2) -> Select(3) + Skip(2)
-///
-/// Move to arrow-rs: https://github.com/apache/arrow-rs/issues/3003
-pub(crate) fn intersect_row_selection(
Review Comment:
upstreamed in https://github.com/apache/arrow-rs/pull/3173
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4340: Minor: use upstream RowSelection code in arrow
Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #4340:
URL: https://github.com/apache/arrow-datafusion/pull/4340#discussion_r1030563480
##########
Cargo.toml:
##########
@@ -50,3 +50,11 @@ opt-level = 3
overflow-checks = false
panic = 'unwind'
rpath = false
+
+
+[patch.crates-io]
+arrow = { git = "https://github.com/alamb/arrow-rs.git", rev="7fd330b" }
Review Comment:
Waiting on arrow 28.0.0 release
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] tustvold merged pull request #4340: Minor: use upstream RowSelection code from arrow `intersect_row_selection`
Posted by GitBox <gi...@apache.org>.
tustvold merged PR #4340:
URL: https://github.com/apache/arrow-datafusion/pull/4340
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] ursabot commented on pull request #4340: Minor: use upstream RowSelection code from arrow `intersect_row_selection`
Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #4340:
URL: https://github.com/apache/arrow-datafusion/pull/4340#issuecomment-1336130832
Benchmark runs are scheduled for baseline = 27dc295039454e6110f485b43e9db37826706aaa and contender = 8db99d24661e981002e9f6a5532358689200ae5f. 8db99d24661e981002e9f6a5532358689200ae5f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/b30629fd5e43416881eafa10e85b8896...25d3e97e67364da6b7f04d05aea5b7d4/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/a6a876bdc4514557a86579b0f51650ff...f148f80b4e114c84b95568963bb968dc/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/fca997d0dd7d443484caa6f5a31d5977...adeabedc03654456a7a5c8d3f9c2f891/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/4bfe45cd97684951b878758c4c938f48...5b7bf2c8d7df4880adea00c587760aaf/)
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4340: Minor: use upstream RowSelection code from arrow `intersect_row_selection`
Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #4340:
URL: https://github.com/apache/arrow-datafusion/pull/4340#discussion_r1036372866
##########
datafusion/core/src/physical_plan/file_format/parquet/page_filter.rs:
##########
@@ -123,7 +122,7 @@ pub(crate) fn build_page_filter(
if let (Some(file_offset_indexes), Some(file_page_indexes)) =
(file_offset_indexes, file_page_indexes)
{
- let mut row_selections = VecDeque::with_capacity(page_index_predicates.len());
+ let mut row_selections = Vec::with_capacity(page_index_predicates.len());
Review Comment:
since `row_selections` are always appended to (never pop_front, for example) I think Vec is just fine and using VecDequeue just makes the code more confusing
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org