Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/25 20:33:16 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #3962: Minor: Add some docstrings to `FileScanConfig` and `RuntimeEnv`

alamb opened a new pull request, #3962:
URL: https://github.com/apache/arrow-datafusion/pull/3962

   I was reading this code and figured I would encode some of the knowledge I gained into docstrings for the future.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #3962: Minor: Add some docstrings to `FileScanConfig` and `RuntimeEnv`

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #3962:
URL: https://github.com/apache/arrow-datafusion/pull/3962#issuecomment-1294014844

   @metesynnada or @retikulum, perhaps you would be willing to review this PR as well?




[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #3962: Minor: Add some docstrings to `FileScanConfig` and `RuntimeEnv`

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #3962:
URL: https://github.com/apache/arrow-datafusion/pull/3962#discussion_r1007394278


##########
datafusion/core/src/physical_plan/file_format/mod.rs:
##########
@@ -74,19 +74,30 @@ lazy_static! {
 /// any given file format.
 #[derive(Debug, Clone)]
 pub struct FileScanConfig {
-    /// Object store URL
+    /// Object store URL, used to get an [`ObjectStore`] instance from
+    /// [`RuntimeEnv::object_store`]
     pub object_store_url: ObjectStoreUrl,
-    /// Schema before projection. It contains the columns that are expected
-    /// to be in the files without the table partition columns.
+    /// Schema before `projection` is applied. It contains all the columns that may
+    /// appear in the files. It does not include table partition columns
+    /// that may be added.
     pub file_schema: SchemaRef,
     /// List of files to be processed, grouped into partitions
+    ///
+    /// Each file must have a schema of `file_schema` or a subset. If
+    /// a particular file has a subset, the missing columns are
+    /// padded with NULLs.
+    ///
+    /// DataFusion may attempt to read each partition of files
+    /// concurrently, however files *within* a partition will be read
+    /// sequentially, one after the next.
     pub file_groups: Vec<Vec<PartitionedFile>>,
     /// Estimated overall statistics of the files, taking `filters` into account.
     pub statistics: Statistics,
     /// Columns on which to project the data. Indexes that are higher than the
     /// number of columns of `file_schema` refer to `table_partition_cols`.
     pub projection: Option<Vec<usize>>,
-    /// The minimum number of records required from this source plan
+    /// The maximum number of records to read from this plan. If None,
+    /// all records after filtering is returned.

Review Comment:
   ```suggestion
       /// all records after filtering are returned.
   ```
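
The `projection` docstring in the hunk above says that indexes past the end of `file_schema` refer to `table_partition_cols`. Below is a minimal sketch of that rule, not DataFusion's actual implementation. The function `resolve_projection` and the use of plain column-name slices in place of `SchemaRef` are hypothetical and for illustration only. The sketch also assumes that "higher than the number of columns" means an index at or beyond `file_schema`'s length, and that a `None` projection selects every file column followed by every partition column.

```rust
/// Hypothetical helper illustrating how a projection index could be resolved
/// against the file columns and the table partition columns.
/// Returns `None` if any index is out of range for both lists.
fn resolve_projection(
    file_columns: &[&str],
    table_partition_cols: &[&str],
    projection: Option<&[usize]>,
) -> Option<Vec<String>> {
    let n = file_columns.len();
    let total = n + table_partition_cols.len();

    // Assumption: no projection means all file columns, then all partition columns.
    let indexes: Vec<usize> = match projection {
        Some(p) => p.to_vec(),
        None => (0..total).collect(),
    };

    indexes
        .into_iter()
        .map(|i| {
            if i < n {
                // Index falls inside the file schema.
                Some(file_columns[i].to_string())
            } else {
                // Index at or past the end of the file schema: look it up
                // among the partition columns, offset by the file column count.
                table_partition_cols.get(i - n).map(|c| c.to_string())
            }
        })
        .collect()
}
```

With file columns `["a", "b"]` and a single partition column `"year"`, a projection of `[1, 2]` would select `b` and `year` under these assumptions, while an index of `3` matches neither list and yields `None`.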





[GitHub] [arrow-datafusion] alamb merged pull request #3962: Minor: Add some docstrings to `FileScanConfig` and `RuntimeEnv`

Posted by GitBox <gi...@apache.org>.
alamb merged PR #3962:
URL: https://github.com/apache/arrow-datafusion/pull/3962




[GitHub] [arrow-datafusion] ursabot commented on pull request #3962: Minor: Add some docstrings to `FileScanConfig` and `RuntimeEnv`

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #3962:
URL: https://github.com/apache/arrow-datafusion/pull/3962#issuecomment-1295064002

   Benchmark runs are scheduled for baseline = 01cf17d0409b40505e5c6408d4cb45429d9bff05 and contender = ca42f4cdb9ce3cddd46447d289b3a89824b7e8d7. ca42f4cdb9ce3cddd46447d289b3a89824b7e8d7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/eca5daacbcc84df8bc455a68da8679a5...4b105cbf74a34d499d7e2a036840ddc8/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/54daf7ba2cf4498985292eb5528fb1d4...208e4c7ed59140c188ca58124ac90572/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/3d479ca2d32941dfaeb78fcabd92c43d...2a1380c4cd4f4c2b844843d93a44806d/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/5fe9d61bbd7544b3bde88679fdbccfac...896ebaf393d94c3bac4d8d0a6c1e07d4/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   

