Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/02 20:10:06 UTC

[GitHub] [arrow] alamb opened a new pull request #8331: ARROW-10162: [Rust] Add pretty print support for DictionaryArray

alamb opened a new pull request #8331:
URL: https://github.com/apache/arrow/pull/8331


   This adds logic for handling dictionary arrays when pretty printing batches. It is part of a larger effort I am working on to improve DictionaryArray handling: https://issues.apache.org/jira/browse/ARROW-10159
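For readers unfamiliar with dictionary encoding, here is a minimal std-only sketch of the lookup that pretty printing a dictionary array has to perform: each row stores an integer key, and the printed cell is the dictionary value that key points to, not the key itself. `DictColumn` and its method are hypothetical illustrations, not the arrow crate's `DictionaryArray` API.

```rust
/// Hypothetical, std-only model of a dictionary-encoded column.
/// This is NOT arrow's `DictionaryArray`; it only shows the key -> value lookup.
struct DictColumn {
    keys: Vec<u32>,      // one key per row
    values: Vec<String>, // distinct values, indexed by key
}

impl DictColumn {
    /// Return the decoded value for `row` (not the raw key),
    /// or None if the row or the key is out of range.
    fn value_to_string(&self, row: usize) -> Option<String> {
        let key = *self.keys.get(row)? as usize;
        self.values.get(key).cloned()
    }
}

fn main() {
    let col = DictColumn {
        keys: vec![0, 1, 0, 0],
        values: vec!["apple".to_string(), "banana".to_string()],
    };
    // Four rows are printed, but only two distinct strings are stored.
    for row in 0..col.keys.len() {
        println!("{}", col.value_to_string(row).unwrap_or_default());
    }
}
```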


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #8331: ARROW-10162: [Rust] Add pretty print support for DictionaryArray

alamb commented on pull request #8331:
URL: https://github.com/apache/arrow/pull/8331#issuecomment-703082988


   @jhorstmann  
   
   > Changes look good to me, but I'm wondering now how the existing code handles null values in the primitive arrays. Seems to me that would print the default value. For Utf8 I think it works kinda implicitly because of how the offsets for null values are laid out.
   
   Yes, you are correct (the current behavior is incorrect). Good point. I opened https://issues.apache.org/jira/browse/ARROW-10169 and will fix it in a subsequent PR.



[GitHub] [arrow] jorgecarleitao closed pull request #8331: ARROW-10162: [Rust] Add pretty print support for DictionaryArray

jorgecarleitao closed pull request #8331:
URL: https://github.com/apache/arrow/pull/8331


   



[GitHub] [arrow] paddyhoran commented on a change in pull request #8331: ARROW-10162: [Rust] Add pretty print support for DictionaryArray

paddyhoran commented on a change in pull request #8331:
URL: https://github.com/apache/arrow/pull/8331#discussion_r499105205



##########
File path: rust/arrow/src/util/pretty.rs
##########
@@ -126,15 +130,56 @@ fn array_value_to_string(column: array::ArrayRef, row: usize) -> Result<String>
         DataType::Time64(unit) if *unit == TimeUnit::Nanosecond => {
             make_string!(array::Time64NanosecondArray, column, row)
         }
+        DataType::Dictionary(index_type, _value_type) => match **index_type {
+            DataType::Int8 => dict_array_value_to_string::<Int8Type>(column, row),
+            DataType::Int16 => dict_array_value_to_string::<Int16Type>(column, row),
+            DataType::Int32 => dict_array_value_to_string::<Int32Type>(column, row),
+            DataType::Int64 => dict_array_value_to_string::<Int64Type>(column, row),
+            DataType::UInt8 => dict_array_value_to_string::<UInt8Type>(column, row),
+            DataType::UInt16 => dict_array_value_to_string::<UInt16Type>(column, row),
+            DataType::UInt32 => dict_array_value_to_string::<UInt32Type>(column, row),
+            DataType::UInt64 => dict_array_value_to_string::<UInt64Type>(column, row),
+            _ => Err(ArrowError::InvalidArgumentError(format!(
+                "Unsupported index type {:?} type for {:?} in repl.",

Review comment:
       I think this was part of DataFusion but was moved into `arrow` because it was generally useful. I don't understand `repl` in this context either. I suggest updating it.
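The hunk above dispatches on the dictionary's index type and calls one generic helper per integer width. Below is a std-only sketch of the same pattern, with a catch-all error that names the unsupported type instead of mentioning `repl`; the wording of that message is only a suggestion. `Keys`, `dict_value_to_string`, and `value_to_string` are hypothetical names, not arrow APIs.

```rust
use std::convert::TryFrom;
use std::fmt::Debug;

/// Hypothetical key storage: one variant per physical key type, standing in
/// for the `DataType::Dictionary(index_type, _)` match in pretty.rs.
enum Keys {
    Int8(Vec<i8>),
    Int32(Vec<i32>),
    UInt16(Vec<u16>),
    Float32(Vec<f32>), // not a legal dictionary key type
}

impl Keys {
    fn type_name(&self) -> &'static str {
        match self {
            Keys::Int8(_) => "Int8",
            Keys::Int32(_) => "Int32",
            Keys::UInt16(_) => "UInt16",
            Keys::Float32(_) => "Float32",
        }
    }
}

/// Generic over the key type: convert the key to a usize index, then look it up.
/// The `usize: TryFrom<K>` bound rules out non-integer key types at compile time.
fn dict_value_to_string<K>(keys: &[K], values: &[&str], row: usize) -> Result<String, String>
where
    K: Copy + Debug,
    usize: TryFrom<K>,
{
    let key = *keys.get(row).ok_or_else(|| format!("row {} out of bounds", row))?;
    let idx = usize::try_from(key).map_err(|_| format!("negative key {:?}", key))?;
    values
        .get(idx)
        .map(|v| v.to_string())
        .ok_or_else(|| format!("key {:?} is not in the dictionary", key))
}

/// Runtime dispatch: one arm per supported key type, plus a catch-all error
/// that names the offending type.
fn value_to_string(keys: &Keys, values: &[&str], row: usize) -> Result<String, String> {
    match keys {
        Keys::Int8(k) => dict_value_to_string(k.as_slice(), values, row),
        Keys::Int32(k) => dict_value_to_string(k.as_slice(), values, row),
        Keys::UInt16(k) => dict_value_to_string(k.as_slice(), values, row),
        other => Err(format!(
            "Pretty printing not supported for dictionary index type {}",
            other.type_name()
        )),
    }
}

fn main() {
    let values = ["a", "b", "c"];
    println!("{:?}", value_to_string(&Keys::Int8(vec![2, 0]), &values, 0));
    println!("{:?}", value_to_string(&Keys::Int32(vec![-1]), &values, 0));
    println!("{:?}", value_to_string(&Keys::Float32(vec![0.0]), &values, 0));
}
```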





[GitHub] [arrow] github-actions[bot] commented on pull request #8331: ARROW-10162: [Rust] Add pretty print support for DictionaryArray

github-actions[bot] commented on pull request #8331:
URL: https://github.com/apache/arrow/pull/8331#issuecomment-702941984


   https://issues.apache.org/jira/browse/ARROW-10162



[GitHub] [arrow] alamb commented on a change in pull request #8331: ARROW-10162: [Rust] Add pretty print support for DictionaryArray

alamb commented on a change in pull request #8331:
URL: https://github.com/apache/arrow/pull/8331#discussion_r499030937



##########
File path: rust/arrow/src/util/pretty.rs
##########
@@ -80,8 +84,8 @@ macro_rules! make_string {
     }};
 }
 
-/// Get the value at the given row in an array as a string
-fn array_value_to_string(column: array::ArrayRef, row: usize) -> Result<String> {
+/// Get the value at the given row in an array as a String
+pub fn array_value_to_string(column: &array::ArrayRef, row: usize) -> Result<String> {

Review comment:
       I made this `pub` so I could call it from datafusion/tests/sql.rs, which currently has a copy-pasted version of pretty printing in `array_str`: https://github.com/apache/arrow/blob/master/rust/datafusion/tests/sql.rs#L646





[GitHub] [arrow] alamb commented on a change in pull request #8331: ARROW-10162: [Rust] Add pretty print support for DictionaryArray

alamb commented on a change in pull request #8331:
URL: https://github.com/apache/arrow/pull/8331#discussion_r499135674



##########
File path: rust/arrow/src/util/pretty.rs
##########
@@ -126,15 +130,56 @@ fn array_value_to_string(column: array::ArrayRef, row: usize) -> Result<String>
         DataType::Time64(unit) if *unit == TimeUnit::Nanosecond => {
             make_string!(array::Time64NanosecondArray, column, row)
         }
+        DataType::Dictionary(index_type, _value_type) => match **index_type {
+            DataType::Int8 => dict_array_value_to_string::<Int8Type>(column, row),
+            DataType::Int16 => dict_array_value_to_string::<Int16Type>(column, row),
+            DataType::Int32 => dict_array_value_to_string::<Int32Type>(column, row),
+            DataType::Int64 => dict_array_value_to_string::<Int64Type>(column, row),
+            DataType::UInt8 => dict_array_value_to_string::<UInt8Type>(column, row),
+            DataType::UInt16 => dict_array_value_to_string::<UInt16Type>(column, row),
+            DataType::UInt32 => dict_array_value_to_string::<UInt32Type>(column, row),
+            DataType::UInt64 => dict_array_value_to_string::<UInt64Type>(column, row),
+            _ => Err(ArrowError::InvalidArgumentError(format!(
+                "Unsupported index type {:?} type for {:?} in repl.",

Review comment:
       I will update the messages. 





[GitHub] [arrow] alamb edited a comment on pull request #8331: ARROW-10162: [Rust] Add pretty print support for DictionaryArray

alamb edited a comment on pull request #8331:
URL: https://github.com/apache/arrow/pull/8331#issuecomment-703082988


   @jhorstmann  
   
   > Changes look good to me, but I'm wondering now how the existing code handles null values in the primitive arrays. Seems to me that would print the default value. For Utf8 I think it works kinda implicitly because of how the offsets for null values are laid out.
   
   Yes, you are correct (the current behavior is incorrect). Good point. I opened https://issues.apache.org/jira/browse/ARROW-10169 and https://github.com/apache/arrow/pull/8332



[GitHub] [arrow] alamb commented on a change in pull request #8331: ARROW-10162: [Rust] Add pretty print support for DictionaryArray

alamb commented on a change in pull request #8331:
URL: https://github.com/apache/arrow/pull/8331#discussion_r499031446



##########
File path: rust/arrow/src/util/pretty.rs
##########
@@ -60,7 +64,7 @@ fn create_table(results: &[RecordBatch]) -> Result<Table> {
             let mut cells = Vec::new();
             for col in 0..batch.num_columns() {
                 let column = batch.column(col);
-                cells.push(Cell::new(&array_value_to_string(column.clone(), row)?));

Review comment:
       There is no reason to clone the column when printing each value.
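Since arrow's `ArrayRef` is an `Arc<dyn Array>`, `column.clone()` copies only the pointer and bumps the atomic reference count, but doing that once per cell is still needless work when a borrow suffices. A std-only sketch of the two signatures follows; the `ArrayRef` alias here is a hypothetical `Arc<Vec<i64>>` stand-in so the example runs without the arrow crate.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for arrow's `ArrayRef` (which is `Arc<dyn Array>`).
type ArrayRef = Arc<Vec<i64>>;

/// Old shape: takes ownership, so each call site needs `column.clone()`,
/// which increments (and later decrements) the reference count per cell.
fn value_to_string_owned(column: ArrayRef, row: usize) -> String {
    column[row].to_string()
}

/// New shape: borrows the Arc, so no reference-count traffic per cell.
fn value_to_string_borrowed(column: &ArrayRef, row: usize) -> String {
    column[row].to_string()
}

fn main() {
    let column: ArrayRef = Arc::new(vec![10, 20, 30]);
    for row in 0..column.len() {
        // Both produce the same text; only the borrowed form avoids the clone.
        let a = value_to_string_owned(column.clone(), row);
        let b = value_to_string_borrowed(&column, row);
        assert_eq!(a, b);
    }
    // The temporary clones were dropped, so one owner remains.
    println!("strong count = {}", Arc::strong_count(&column));
}
```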





[GitHub] [arrow] alamb commented on a change in pull request #8331: ARROW-10162: [Rust] Add pretty print support for DictionaryArray

alamb commented on a change in pull request #8331:
URL: https://github.com/apache/arrow/pull/8331#discussion_r499028581



##########
File path: rust/arrow/src/util/pretty.rs
##########
@@ -126,15 +130,56 @@ fn array_value_to_string(column: array::ArrayRef, row: usize) -> Result<String>
         DataType::Time64(unit) if *unit == TimeUnit::Nanosecond => {
             make_string!(array::Time64NanosecondArray, column, row)
         }
+        DataType::Dictionary(index_type, _value_type) => match **index_type {
+            DataType::Int8 => dict_array_value_to_string::<Int8Type>(column, row),
+            DataType::Int16 => dict_array_value_to_string::<Int16Type>(column, row),
+            DataType::Int32 => dict_array_value_to_string::<Int32Type>(column, row),
+            DataType::Int64 => dict_array_value_to_string::<Int64Type>(column, row),
+            DataType::UInt8 => dict_array_value_to_string::<UInt8Type>(column, row),
+            DataType::UInt16 => dict_array_value_to_string::<UInt16Type>(column, row),
+            DataType::UInt32 => dict_array_value_to_string::<UInt32Type>(column, row),
+            DataType::UInt64 => dict_array_value_to_string::<UInt64Type>(column, row),
+            _ => Err(ArrowError::InvalidArgumentError(format!(
+                "Unsupported index type {:?} type for {:?} in repl.",

Review comment:
       I don't really grok why the error messages refer to `repl` in this file, but I have carried forward the tradition with this PR





[GitHub] [arrow] paddyhoran commented on a change in pull request #8331: ARROW-10162: [Rust] Add pretty print support for DictionaryArray

paddyhoran commented on a change in pull request #8331:
URL: https://github.com/apache/arrow/pull/8331#discussion_r499105205



##########
File path: rust/arrow/src/util/pretty.rs
##########
@@ -126,15 +130,56 @@ fn array_value_to_string(column: array::ArrayRef, row: usize) -> Result<String>
         DataType::Time64(unit) if *unit == TimeUnit::Nanosecond => {
             make_string!(array::Time64NanosecondArray, column, row)
         }
+        DataType::Dictionary(index_type, _value_type) => match **index_type {
+            DataType::Int8 => dict_array_value_to_string::<Int8Type>(column, row),
+            DataType::Int16 => dict_array_value_to_string::<Int16Type>(column, row),
+            DataType::Int32 => dict_array_value_to_string::<Int32Type>(column, row),
+            DataType::Int64 => dict_array_value_to_string::<Int64Type>(column, row),
+            DataType::UInt8 => dict_array_value_to_string::<UInt8Type>(column, row),
+            DataType::UInt16 => dict_array_value_to_string::<UInt16Type>(column, row),
+            DataType::UInt32 => dict_array_value_to_string::<UInt32Type>(column, row),
+            DataType::UInt64 => dict_array_value_to_string::<UInt64Type>(column, row),
+            _ => Err(ArrowError::InvalidArgumentError(format!(
+                "Unsupported index type {:?} type for {:?} in repl.",

Review comment:
       I think this (pretty printing) was part of DataFusion's CLI to begin with but was moved into `arrow` because it was generally useful. I don't understand `repl` in this context either. I suggest updating it.





[GitHub] [arrow] alamb commented on pull request #8331: ARROW-10162: [Rust] Add pretty print support for DictionaryArray

alamb commented on pull request #8331:
URL: https://github.com/apache/arrow/pull/8331#issuecomment-703237419


   Rebased to resolve conflicts 



[GitHub] [arrow] jhorstmann commented on pull request #8331: ARROW-10162: [Rust] Add pretty print support for DictionaryArray

jhorstmann commented on pull request #8331:
URL: https://github.com/apache/arrow/pull/8331#issuecomment-702964608


   Changes look good to me, but I'm wondering now how the existing code handles null values in the primitive arrays. Seems to me that would print the default value. For `Utf8` I think it works kinda implicitly because of how the offsets for null values are laid out.
   
   Would it make sense to fix this in the same PR or should I open a separate ticket?
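A sketch of the null handling being discussed, assuming an Arrow-style layout: a values buffer plus an LSB-first validity bitmap in which a set bit means the slot is valid. A null slot still holds some value in the buffer, so formatting `values[row]` without checking validity prints that placeholder. `Int32Column` is hypothetical, and rendering nulls as an empty cell is an assumption of this sketch, not necessarily what arrow does.

```rust
/// Hypothetical primitive column modeled on Arrow's layout: a values buffer
/// plus an optional validity bitmap (bit i, LSB-first, set => row i is valid).
struct Int32Column {
    values: Vec<i32>,
    validity: Option<Vec<u8>>, // None means "no nulls"
}

impl Int32Column {
    fn is_null(&self, row: usize) -> bool {
        match &self.validity {
            None => false,
            Some(bits) => (bits[row / 8] & (1 << (row % 8))) == 0,
        }
    }

    /// Check validity before reading the buffer; nulls render as an empty cell.
    fn value_to_string(&self, row: usize) -> String {
        if self.is_null(row) {
            String::new()
        } else {
            self.values[row].to_string()
        }
    }
}

fn main() {
    // Rows: 7, null, 9. The null slot holds a placeholder 0 in `values`.
    let col = Int32Column {
        values: vec![7, 0, 9],
        validity: Some(vec![0b0000_0101]),
    };
    for row in 0..col.values.len() {
        println!("[{}]", col.value_to_string(row));
    }
}
```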



[GitHub] [arrow] alamb commented on a change in pull request #8331: ARROW-10162: [Rust] Add pretty print support for DictionaryArray

alamb commented on a change in pull request #8331:
URL: https://github.com/apache/arrow/pull/8331#discussion_r499232922



##########
File path: rust/arrow/src/util/pretty.rs
##########
@@ -80,8 +84,8 @@ macro_rules! make_string {
     }};
 }
 
-/// Get the value at the given row in an array as a string
-fn array_value_to_string(column: array::ArrayRef, row: usize) -> Result<String> {
+/// Get the value at the given row in an array as a String
+pub fn array_value_to_string(column: &array::ArrayRef, row: usize) -> Result<String> {

Review comment:
       See https://github.com/apache/arrow/pull/8333
   



