You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/23 15:08:41 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #4340: Minor: use upstream RowSelection code in arrow

alamb opened a new pull request, #4340:
URL: https://github.com/apache/arrow-datafusion/pull/4340

   Draft as it is waiting on
   - [] https://github.com/apache/arrow-rs/pull/3173
   
   # Which issue does this PR close?
   
   N/A
   
   # Rationale for this change
   
   This code was upstreamed in https://github.com/apache/arrow-rs/issues/3003
   
   # What changes are included in this PR?
   
   Remove Datafusion copy of `intersect_row_selection` which has been upstreamed
   
   # Are these changes tested?
   Yes by existing tests
   
   # Are there any user-facing changes?
   Np


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4340: Minor: use upstream RowSelection code from arrow `intersect_row_selection`

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #4340:
URL: https://github.com/apache/arrow-datafusion/pull/4340#discussion_r1036373826


##########
datafusion/core/src/physical_plan/file_format/parquet/page_filter.rs:
##########
@@ -197,93 +195,14 @@ pub(crate) fn build_page_filter(
 ///
 /// The final selection is the intersection of these  `RowSelector`s:
 /// * `final_selection:[ Skip(0~199), Read(200~249), Skip(250~299)]`
-fn combine_multi_col_selection(
-    row_selections: VecDeque<Vec<RowSelector>>,
-) -> Vec<RowSelector> {
+fn combine_multi_col_selection(row_selections: Vec<Vec<RowSelector>>) -> RowSelection {
     row_selections
         .into_iter()
-        .reduce(intersect_row_selection)
+        .map(RowSelection::from)
+        .reduce(|s1, s2| s1.intersection(&s2))
         .unwrap()
 }
 
-/// combine two `RowSelection` return the intersection
-/// For example:
-/// self:     NNYYYYNNY
-/// other:    NYNNNNNNY
-///
-/// returned: NNNNNNNNY
-/// set `need_combine` true will combine result: Select(2) + Select(1) + Skip(2) -> Select(3) + Skip(2)
-///
-/// Move to arrow-rs: https://github.com/apache/arrow-rs/issues/3003
-pub(crate) fn intersect_row_selection(

Review Comment:
   upstreamed in https://github.com/apache/arrow-rs/pull/3173



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4340: Minor: use upstream RowSelection code in arrow

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #4340:
URL: https://github.com/apache/arrow-datafusion/pull/4340#discussion_r1030563480


##########
Cargo.toml:
##########
@@ -50,3 +50,11 @@ opt-level = 3
 overflow-checks = false
 panic = 'unwind'
 rpath = false
+
+
+[patch.crates-io]
+arrow = { git = "https://github.com/alamb/arrow-rs.git", rev="7fd330b" }

Review Comment:
   Waiting on arrow 28.0.0 release



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold merged pull request #4340: Minor: use upstream RowSelection code from arrow `intersect_row_selection`

Posted by GitBox <gi...@apache.org>.
tustvold merged PR #4340:
URL: https://github.com/apache/arrow-datafusion/pull/4340


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ursabot commented on pull request #4340: Minor: use upstream RowSelection code from arrow `intersect_row_selection`

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #4340:
URL: https://github.com/apache/arrow-datafusion/pull/4340#issuecomment-1336130832

   Benchmark runs are scheduled for baseline = 27dc295039454e6110f485b43e9db37826706aaa and contender = 8db99d24661e981002e9f6a5532358689200ae5f. 8db99d24661e981002e9f6a5532358689200ae5f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/b30629fd5e43416881eafa10e85b8896...25d3e97e67364da6b7f04d05aea5b7d4/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/a6a876bdc4514557a86579b0f51650ff...f148f80b4e114c84b95568963bb968dc/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/fca997d0dd7d443484caa6f5a31d5977...adeabedc03654456a7a5c8d3f9c2f891/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/4bfe45cd97684951b878758c4c938f48...5b7bf2c8d7df4880adea00c587760aaf/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4340: Minor: use upstream RowSelection code from arrow `intersect_row_selection`

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #4340:
URL: https://github.com/apache/arrow-datafusion/pull/4340#discussion_r1036372866


##########
datafusion/core/src/physical_plan/file_format/parquet/page_filter.rs:
##########
@@ -123,7 +122,7 @@ pub(crate) fn build_page_filter(
     if let (Some(file_offset_indexes), Some(file_page_indexes)) =
         (file_offset_indexes, file_page_indexes)
     {
-        let mut row_selections = VecDeque::with_capacity(page_index_predicates.len());
+        let mut row_selections = Vec::with_capacity(page_index_predicates.len());

Review Comment:
   since `row_selections` are always appended to (never pop_front, for example) I think Vec is just fine and using VecDequeue just makes the code more confusing



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org