You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/06/09 15:06:49 UTC

[GitHub] [arrow-rs] alamb opened a new pull request, #4392: Improve parquet `WriterProperites` and `ReaderProperties` docs

alamb opened a new pull request, #4392:
URL: https://github.com/apache/arrow-rs/pull/4392

   # Which issue does this PR close?
   
   N/A
   
   # Rationale for this change
    
   I was checking how the parquet writer was configured in IOx and it took me some digging to see what statistics were enabled by default. It would have been nice to avoid that digging
   
   # What changes are included in this PR?
   1. Improve the rustdocs by moving the property builder examples to the structure definitions they are used for
   2. Make the default values of the builder `pub` and document them
   
   # Are there any user-facing changes?
   
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #4392: Improve parquet `WriterProperites` and `ReaderProperties` docs

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #4392:
URL: https://github.com/apache/arrow-rs/pull/4392#discussion_r1224433252


##########
parquet/src/file/properties.rs:
##########
@@ -15,55 +15,7 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! [`WriterProperties`]

Review Comment:
   I moved the examples to the `WriterProperties` and `ReaderProperties` so they would be easier to find
   
   I think the module level docs now look pretty clear:
   
   ![Screenshot 2023-06-09 at 11 07 30 AM](https://github.com/apache/arrow-rs/assets/490673/5851310e-b17d-48d7-958a-b0a67a3e4972)
   



##########
parquet/src/file/properties.rs:
##########
@@ -72,20 +24,30 @@ use crate::file::metadata::KeyValue;
 use crate::format::SortingColumn;
 use crate::schema::types::ColumnPath;
 
-const DEFAULT_PAGE_SIZE: usize = 1024 * 1024;
-const DEFAULT_WRITE_BATCH_SIZE: usize = 1024;
-const DEFAULT_WRITER_VERSION: WriterVersion = WriterVersion::PARQUET_1_0;
-const DEFAULT_COMPRESSION: Compression = Compression::UNCOMPRESSED;
-const DEFAULT_DICTIONARY_ENABLED: bool = true;
-const DEFAULT_DICTIONARY_PAGE_SIZE_LIMIT: usize = DEFAULT_PAGE_SIZE;
-const DEFAULT_STATISTICS_ENABLED: EnabledStatistics = EnabledStatistics::Page;
-const DEFAULT_MAX_STATISTICS_SIZE: usize = 4096;
-const DEFAULT_MAX_ROW_GROUP_SIZE: usize = 1024 * 1024;
-const DEFAULT_CREATED_BY: &str =
+/// Default value for [`WriterProperties::data_page_size_limit`]
+pub const DEFAULT_PAGE_SIZE: usize = 1024 * 1024;
+/// Default value for [`WriterProperties::write_batch_size`]
+pub const DEFAULT_WRITE_BATCH_SIZE: usize = 1024;
+/// Default value for [`WriterProperties::writer_version`]
+pub const DEFAULT_WRITER_VERSION: WriterVersion = WriterVersion::PARQUET_1_0;
+/// Default value for [`WriterProperties::compression`]
+pub const DEFAULT_COMPRESSION: Compression = Compression::UNCOMPRESSED;
+/// Default value for [`WriterProperties::dictionary_enabled`]
+pub const DEFAULT_DICTIONARY_ENABLED: bool = true;
+/// Default value for [`WriterProperties::dictionary_page_size_limit`]
+pub const DEFAULT_DICTIONARY_PAGE_SIZE_LIMIT: usize = DEFAULT_PAGE_SIZE;
+/// Default value for [`WriterProperties::statistics_enabled`]
+pub const DEFAULT_STATISTICS_ENABLED: EnabledStatistics = EnabledStatistics::Page;

Review Comment:
   This is what I was looking for, but since the value was not `pub` it wasn't in the rustdocs
   
   I could have included the value in the text of `WriterProperties::statistics_enabled` but thought that this way was less likely to get out of sync



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #4392: Improve parquet `WriterProperites` and `ReaderProperties` docs

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #4392:
URL: https://github.com/apache/arrow-rs/pull/4392


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org