You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/03/07 18:11:27 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #5506: Avoid circular(ish) dependency parquet-test-utils on datafusion

alamb opened a new pull request, #5506:
URL: https://github.com/apache/arrow-datafusion/pull/5506

   # Which issue does this PR close?
   Closes https://github.com/apache/arrow-datafusion/issues/5453
   
   # Rationale for this change
   
   Improve the DataFusion development cycle time by `cargo test -p datafusion` having to compile datafusion lib, then the parquet-test-utils, and then the tests.
   
   "A little copying is better than a lot of dependency"
   
    https://www.youtube.com/watch?v=PAAkCSZUG1c
   
   # What changes are included in this PR?
   
   Copy a (subset) part of ParquetTestFile into the datafusion core test that uses them
   
   # Are these changes tested?
   
   Yes (existing tests)
   
   # Are there any user-facing changes?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5506: Avoid circular(ish) dependency parquet-test-utils on datafusion

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5506:
URL: https://github.com/apache/arrow-datafusion/pull/5506#discussion_r1128341652


##########
datafusion/core/tests/parquet/filter_pushdown.rs:
##########
@@ -600,3 +617,149 @@ fn get_value(metrics: &MetricsSet, metric_name: &str) -> usize {
         }
     }
 }
+
+///  a ParquetFile that has been created for testing.
+pub struct TestParquetFile {

Review Comment:
   this is a copied subset of https://github.com/apache/arrow-datafusion/blob/main/parquet-test-utils/src/lib.rs



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5506: Avoid circular(ish) dependency parquet-test-utils on datafusion

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5506:
URL: https://github.com/apache/arrow-datafusion/pull/5506#discussion_r1128338927


##########
datafusion/core/Cargo.toml:
##########
@@ -114,7 +114,6 @@ ctor = "0.1.22"
 doc-comment = "0.3"
 env_logger = "0.10"
 half = "2.2.1"
-parquet-test-utils = { path = "../../parquet-test-utils" }

Review Comment:
   the point of this PR is to remove this line



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #5506: Avoid circular(ish) dependency parquet-test-utils on datafusion

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5506:
URL: https://github.com/apache/arrow-datafusion/pull/5506#issuecomment-1460990400

   > Another option might be to move this logic to datafusion::test_util so that it can then just be re-exported in parquet-test-utils?
   
   🤔  well since test utils depends on datafusion_core (not datafusion) I can't move the filter creation logic there. Perhaps I misunderstand your suggestion
   
   https://github.com/apache/arrow-datafusion/blob/e46924d80fddbed0faf35edc85b3bba6f050b344/test-utils/Cargo.toml#L27


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb closed pull request #5506: Avoid circular(ish) dependency parquet-test-utils on datafusion

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed pull request #5506: Avoid circular(ish) dependency parquet-test-utils on datafusion
URL: https://github.com/apache/arrow-datafusion/pull/5506


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on pull request #5506: Avoid circular(ish) dependency parquet-test-utils on datafusion

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #5506:
URL: https://github.com/apache/arrow-datafusion/pull/5506#issuecomment-1461037630

   I was referring to this - https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/src/test_util.rs
   
   Which I think is fine as parquet-test-utils depends on datafusion - https://github.com/apache/arrow-datafusion/blob/main/parquet-test-utils/Cargo.toml#L26 and not datafusion_common


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #5506: Avoid circular(ish) dependency parquet-test-utils on datafusion

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5506:
URL: https://github.com/apache/arrow-datafusion/pull/5506#issuecomment-1462639391

   Here is an alternate approach: https://github.com/apache/arrow-datafusion/pull/5536


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #5506: Avoid circular(ish) dependency parquet-test-utils on datafusion

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5506:
URL: https://github.com/apache/arrow-datafusion/pull/5506#issuecomment-1462628991

   > I was referring to this - https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/src/test_util.rs
   
   🤔 it seems like all that is in `parquet-test-util` is this code. If I just moved TestParquetFile to test_util.rs I could remove the entire crate. I'll make a new PR to do that.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org