You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/14 04:38:40 UTC

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3103: Expose `SortingColumn` in parquet files

tustvold commented on code in PR #3103:
URL: https://github.com/apache/arrow-rs/pull/3103#discussion_r1021065301


##########
parquet/src/file/metadata.rs:
##########
@@ -940,6 +974,48 @@ impl OffsetIndexBuilder {
     }
 }
 
+/// Metadata for sorting order for columns
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct SortingColumnMetaData {

Review Comment:
   ```suggestion
   pub struct SortingColumn {
   ```
   
   I think it is better to use the same name as the parquet docs



##########
parquet/src/file/metadata.rs:
##########
@@ -940,6 +974,48 @@ impl OffsetIndexBuilder {
     }
 }
 
+/// Metadata for sorting order for columns
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct SortingColumnMetaData {
+    /// The column index (in this row group) *
+    pub column_index: i32,
+    /// If true, indicates this column is sorted in descending order. *
+    pub descending: bool,
+    /// If true, nulls will come before non-null values, otherwise,
+    /// nulls go at the end.
+    pub nulls_first: bool,
+}
+
+impl SortingColumnMetaData {
+    pub fn new(column_index: i32, descending: bool, nulls_first: bool) -> Self {

Review Comment:
   Given all the fields are public, I don't think this is necessary?



##########
parquet/src/file/metadata.rs:
##########
@@ -940,6 +974,48 @@ impl OffsetIndexBuilder {
     }
 }
 
+/// Metadata for sorting order for columns
+#[derive(Debug, Clone, PartialEq, Eq)]

Review Comment:
   ```suggestion
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   ```



##########
parquet/src/file/metadata.rs:
##########
@@ -303,9 +308,19 @@ impl RowGroupMetaData {
             let cc = ColumnChunkMetaData::from_thrift(d.clone(), c)?;
             columns.push(cc);
         }
+        let sorting_columns: Option<Vec<SortingColumnMetaData>> = match rg.sorting_columns
+        {
+            Some(sorting_columns) => {
+                let result: Vec<SortingColumnMetaData> =
+                    sorting_columns.iter().map(|f| f.into()).collect();

Review Comment:
   ```suggestion
                       sorting_columns.iter().map(Into::into).collect();
   ```



##########
parquet/src/file/metadata.rs:
##########
@@ -353,6 +377,15 @@ impl RowGroupMetaDataBuilder {
         self
     }
 
+    /// Sets the sorting order for columns
+    pub fn set_sorting_columns(
+        mut self,
+        value: Option<Vec<SortingColumnMetaData>>,
+    ) -> Self {
+        self.sorting_columns = value;

Review Comment:
   ```suggestion
           value: Vec<SortingColumnMetaData>,
       ) -> Self {
           self.sorting_columns = Some(value);
   ```
   This is consistent with set_page_offset



##########
parquet/src/file/metadata.rs:
##########
@@ -260,6 +261,10 @@ impl RowGroupMetaData {
         self.num_rows
     }
 
+    pub fn sorting_columns(&self) -> &Option<Vec<SortingColumnMetaData>> {
+        &self.sorting_columns
+    }

Review Comment:
   ```suggestion
       /// Returns the sort ordering of the rows in this RowGroup if any
       pub fn sorting_columns(&self) -> Option<&[SortingColumnMetaData]> {
           self.sorting_columns.as_deref()
       }
   ```



##########
parquet/src/file/metadata.rs:
##########
@@ -303,9 +308,19 @@ impl RowGroupMetaData {
             let cc = ColumnChunkMetaData::from_thrift(d.clone(), c)?;
             columns.push(cc);
         }
+        let sorting_columns: Option<Vec<SortingColumnMetaData>> = match rg.sorting_columns
+        {
+            Some(sorting_columns) => {
+                let result: Vec<SortingColumnMetaData> =

Review Comment:
   ```suggestion
                   let result =
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org