You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/18 15:42:19 UTC

[GitHub] [arrow] sunchao commented on a change in pull request #7917: ARROW-8423: [Rust] [Parquet] Serialize Arrow schema metadata

sunchao commented on a change in pull request #7917:
URL: https://github.com/apache/arrow/pull/7917#discussion_r472294646



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -83,12 +90,77 @@ where
         .map(|fields| Schema::new_with_metadata(fields, metadata))
 }
 
+/// Try to convert Arrow schema metadata into a schema
+fn get_arrow_schema_from_metadata(encoded_meta: &str) -> Option<Schema> {
+    let decoded = base64::decode(encoded_meta);
+    match decoded {
+        Ok(bytes) => {
+            let slice = if bytes[0..4] == [255u8; 4] {
+                &bytes[8..]
+            } else {
+                bytes.as_slice()
+            };
+            let message = arrow::ipc::get_root_as_message(slice);
+            message
+                .header_as_schema()
+                .map(arrow::ipc::convert::fb_to_schema)
+        }
+        Err(err) => {
+            // The C++ implementation returns an error if the schema can't be parsed.
+            // To prevent this, we explicitly log this, then compute the schema without the metadata
+            eprintln!(
+                "Unable to decode the encoded schema stored in {}, {:?}",
+                super::ARROW_SCHEMA_META_KEY,
+                err
+            );
+            None
+        }
+    }
+}
+
+/// Mutates writer metadata by encoding the Arrow schema and storing it in the metadata.
+/// If there is an existing Arrow schema metadata, it is replaced.
+pub fn add_encoded_arrow_schema_to_metadata(

Review comment:
       np - my comments are mostly cosmetic - it will be good to change the visibility though.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org