Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/05/10 00:07:39 UTC

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #6307: Support null values in Avro string columns

alamb commented on code in PR #6307:
URL: https://github.com/apache/arrow-datafusion/pull/6307#discussion_r1189233014


##########
datafusion/core/src/avro_to_arrow/arrow_array_reader.rs:
##########
@@ -860,13 +862,14 @@ fn flatten_string_values(values: &[&Value]) -> Vec<Option<String>> {
 /// Reads an Avro value as a string, regardless of its type.
 /// This is useful if the expected datatype is a string, in which case we preserve
 /// all the values regardless of they type.
-fn resolve_string(v: &Value) -> ArrowResult<String> {
+fn resolve_string(v: &Value) -> ArrowResult<Option<String>> {
     let v = if let Value::Union(_, b) = v { b } else { v };
     match v {
-        Value::String(s) => Ok(s.clone()),
-        Value::Bytes(bytes) => {
-            String::from_utf8(bytes.to_vec()).map_err(AvroError::ConvertToUtf8)
-        }
+        Value::String(s) => Ok(Some(s.clone())),
+        Value::Bytes(bytes) => String::from_utf8(bytes.to_vec())
+            .map_err(AvroError::ConvertToUtf8)
+            .map(Some),
+        Value::Null => Ok(None),

Review Comment:
   Looks reasonable to me. 👍 
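
   For readers following along, here is a minimal, dependency-free sketch of the null-handling pattern in this hunk. The `Value` enum below is a simplified, hypothetical stand-in for the Avro value type (its `Union` variant carries only the boxed value), `resolve_column` is an illustrative helper that is not part of the PR, and the error type is plain `FromUtf8Error` rather than the crate's own error. The sketch also recurses into unions instead of unwrapping a single level, which is a simplification.

```rust
use std::string::FromUtf8Error;

/// Simplified, hypothetical stand-in for the Avro value type.
/// Only the variants relevant to this hunk are modeled.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    String(String),
    Bytes(Vec<u8>),
    Union(Box<Value>),
}

/// Resolves a value to an optional string: `Ok(None)` for nulls,
/// `Ok(Some(_))` for strings and valid UTF-8 bytes, `Err(_)` for invalid UTF-8.
fn resolve_string(v: &Value) -> Result<Option<String>, FromUtf8Error> {
    match v {
        // Assumption of this sketch: look through the union wrapper recursively.
        Value::Union(inner) => resolve_string(inner),
        Value::String(s) => Ok(Some(s.clone())),
        Value::Bytes(bytes) => String::from_utf8(bytes.clone()).map(Some),
        Value::Null => Ok(None),
    }
}

/// Hypothetical helper: resolves a whole column, stopping at the first error.
fn resolve_column(values: &[Value]) -> Result<Vec<Option<String>>, FromUtf8Error> {
    values.iter().map(resolve_string).collect()
}
```

   With this shape, a nullable column such as `[Union(String("a")), Union(Null)]` resolves to `[Some("a"), None]`, so a null becomes a missing entry rather than an error.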



##########
datafusion/core/src/datasource/file_format/avro.rs:
##########
@@ -350,6 +393,48 @@ mod tests {
         Ok(())
     }
 
+    #[tokio::test]
+    async fn read_null_binary_alltypes_plain_avro() -> Result<()> {
+        let session_ctx = SessionContext::new();
+        let state = session_ctx.state();
+        let task_ctx = state.task_ctx();
+        let projection = Some(vec![6]);
+        let exec =
+            get_exec(&state, "alltypes_nulls_plain.avro", projection, None).await?;
+
+        let batches = collect(exec, task_ctx).await?;

Review Comment:
   It might also be worth checking out the assert_batches_eq macro (https://docs.rs/datafusion/latest/datafusion/macro.assert_batches_eq.html) to verify the rows and columns in a way that is easier to maintain.
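
   To illustrate the style of assertion being suggested: the macro compares the pretty-printed batches against an expected list of lines, so the whole table is visible in the test. Below is a dependency-free sketch of that idea only. `render_column` is a hypothetical stand-in for Arrow's pretty printer, it assumes ASCII content, and the sample rows are invented for illustration rather than taken from `alltypes_nulls_plain.avro`. The real macro takes the expected lines and a slice of record batches.

```rust
use std::iter;

/// Hypothetical helper: renders one nullable string column as table lines.
/// Nulls are rendered as empty cells. Assumes ASCII content, since the
/// column width is computed from byte lengths.
fn render_column(name: &str, rows: &[Option<&str>]) -> Vec<String> {
    // Column width is the longest of the header and all cell values.
    let width = rows
        .iter()
        .map(|r| r.unwrap_or("").len())
        .chain(iter::once(name.len()))
        .max()
        .unwrap_or(0);
    let border = format!("+{}+", "-".repeat(width + 2));

    let mut lines = vec![
        border.clone(),
        format!("| {:<w$} |", name, w = width),
        border.clone(),
    ];
    for r in rows {
        lines.push(format!("| {:<w$} |", r.unwrap_or(""), w = width));
    }
    lines.push(border);
    lines
}

/// Compares rendered lines against an expected table, line by line,
/// which is the general shape of an expected-table assertion.
fn assert_lines_eq(expected: &[&str], actual: &[String]) {
    assert_eq!(expected.len(), actual.len(), "line count differs");
    for (e, a) in expected.iter().zip(actual) {
        assert_eq!(e, a, "table line differs");
    }
}
```

   A test written this way states the expected table once, so a failure shows exactly which row or column changed instead of requiring per-cell assertions.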


