You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/18 19:29:38 UTC

[GitHub] [arrow] jorgecarleitao opened a new pull request #8486: ARROW-10336: [Rust] Added FromIter and ToIter for string arrays

jorgecarleitao opened a new pull request #8486:
URL: https://github.com/apache/arrow/pull/8486


   What the title says: adds string array from and to iterators.
   
   This PR also simplifies some of the functions around strings.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8486: ARROW-10336: [Rust] Added FromIter and ToIter for string arrays

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8486:
URL: https://github.com/apache/arrow/pull/8486#discussion_r512143167



##########
File path: rust/arrow/src/array/array.rs
##########
@@ -1540,22 +1540,38 @@ impl<OffsetSize: StringOffsetSizeTrait> GenericStringArray<OffsetSize> {
             offsets.push(length_so_far);
             values.extend_from_slice(s.as_bytes());
         }
-        let array_data = ArrayData::builder(data_type)
+        let array_data = ArrayData::builder(OffsetSize::DATA_TYPE)
             .len(v.len())
             .add_buffer(Buffer::from(offsets.to_byte_slice()))
             .add_buffer(Buffer::from(&values[..]))
             .build();
         Self::from(array_data)
     }
 
-    pub(crate) fn from_opt_vec(v: Vec<Option<&str>>, data_type: DataType) -> Self {
-        let mut offsets = Vec::with_capacity(v.len() + 1);
+    pub(crate) fn from_opt_vec(v: Vec<Option<&str>>) -> Self {
+        let iter = v.iter().map(|e| e.map(|e| e.to_string()));
+        GenericStringArray::from_iter(iter)
+    }
+}
+
+impl<'a, Ptr, OffsetSize: StringOffsetSizeTrait> FromIterator<Ptr>

Review comment:
       I see. I have no strong feelings about it, I just think that explicit is better than implicit, and `P` or `OS` forces the reader to go to the `where` clause to understand what they mean. OTOH, generics are generics, so...
   
   I will leave it like this just because the rest of the code base around offsets is like this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8486: ARROW-10336: [Rust] Added FromIter and ToIter for string arrays

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8486:
URL: https://github.com/apache/arrow/pull/8486#discussion_r512143167



##########
File path: rust/arrow/src/array/array.rs
##########
@@ -1540,22 +1540,38 @@ impl<OffsetSize: StringOffsetSizeTrait> GenericStringArray<OffsetSize> {
             offsets.push(length_so_far);
             values.extend_from_slice(s.as_bytes());
         }
-        let array_data = ArrayData::builder(data_type)
+        let array_data = ArrayData::builder(OffsetSize::DATA_TYPE)
             .len(v.len())
             .add_buffer(Buffer::from(offsets.to_byte_slice()))
             .add_buffer(Buffer::from(&values[..]))
             .build();
         Self::from(array_data)
     }
 
-    pub(crate) fn from_opt_vec(v: Vec<Option<&str>>, data_type: DataType) -> Self {
-        let mut offsets = Vec::with_capacity(v.len() + 1);
+    pub(crate) fn from_opt_vec(v: Vec<Option<&str>>) -> Self {
+        let iter = v.iter().map(|e| e.map(|e| e.to_string()));
+        GenericStringArray::from_iter(iter)
+    }
+}
+
+impl<'a, Ptr, OffsetSize: StringOffsetSizeTrait> FromIterator<Ptr>

Review comment:
       I see. I have no strong feelings about it, I just think that explicit is better than implicit, and `P` or `OS` forces the reader to go to the `where` clause to understand what they mean. OTOH, generics are generics, and also rust seems to prefer one-letter generic types.
   
   I will leave it like this just because the rest of the code base around offsets is like this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] vertexclique commented on a change in pull request #8486: ARROW-10336: [Rust] Added FromIter and ToIter for string arrays

Posted by GitBox <gi...@apache.org>.
vertexclique commented on a change in pull request #8486:
URL: https://github.com/apache/arrow/pull/8486#discussion_r511819768



##########
File path: rust/arrow/src/array/array.rs
##########
@@ -1540,22 +1540,38 @@ impl<OffsetSize: StringOffsetSizeTrait> GenericStringArray<OffsetSize> {
             offsets.push(length_so_far);
             values.extend_from_slice(s.as_bytes());
         }
-        let array_data = ArrayData::builder(data_type)
+        let array_data = ArrayData::builder(OffsetSize::DATA_TYPE)
             .len(v.len())
             .add_buffer(Buffer::from(offsets.to_byte_slice()))
             .add_buffer(Buffer::from(&values[..]))
             .build();
         Self::from(array_data)
     }
 
-    pub(crate) fn from_opt_vec(v: Vec<Option<&str>>, data_type: DataType) -> Self {
-        let mut offsets = Vec::with_capacity(v.len() + 1);
+    pub(crate) fn from_opt_vec(v: Vec<Option<&str>>) -> Self {
+        let iter = v.iter().map(|e| e.map(|e| e.to_string()));
+        GenericStringArray::from_iter(iter)
+    }
+}
+
+impl<'a, Ptr, OffsetSize: StringOffsetSizeTrait> FromIterator<Ptr>

Review comment:
       I meant type params like:
   ```suggestion
   impl<'a, P, OS: StringOffsetSizeTrait> FromIterator<P>
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao edited a comment on pull request #8486: ARROW-10336: [Rust] Added FromIter and ToIter for string arrays

Posted by GitBox <gi...@apache.org>.
jorgecarleitao edited a comment on pull request #8486:
URL: https://github.com/apache/arrow/pull/8486#issuecomment-716298625






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8486: ARROW-10336: [Rust] Added FromIter and ToIter for string arrays

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8486:
URL: https://github.com/apache/arrow/pull/8486#issuecomment-711408742


   https://issues.apache.org/jira/browse/ARROW-10336


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #8486: ARROW-10336: [Rust] Added FromIter and ToIter for string arrays

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8486:
URL: https://github.com/apache/arrow/pull/8486#issuecomment-716298625


   @vertexclique , do you refer to the `DATA_TYPE`? I changed it to upper case after clippy asked me not to not express `const` as upper case.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me closed pull request #8486: ARROW-10336: [Rust] Added FromIter and ToIter for string arrays

Posted by GitBox <gi...@apache.org>.
nevi-me closed pull request #8486:
URL: https://github.com/apache/arrow/pull/8486


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #8486: ARROW-10336: [Rust] Added FromIter and ToIter for string arrays

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #8486:
URL: https://github.com/apache/arrow/pull/8486#discussion_r512415707



##########
File path: rust/arrow/src/array/array.rs
##########
@@ -1540,22 +1540,38 @@ impl<OffsetSize: StringOffsetSizeTrait> GenericStringArray<OffsetSize> {
             offsets.push(length_so_far);
             values.extend_from_slice(s.as_bytes());
         }
-        let array_data = ArrayData::builder(data_type)
+        let array_data = ArrayData::builder(OffsetSize::DATA_TYPE)
             .len(v.len())
             .add_buffer(Buffer::from(offsets.to_byte_slice()))
             .add_buffer(Buffer::from(&values[..]))
             .build();
         Self::from(array_data)
     }
 
-    pub(crate) fn from_opt_vec(v: Vec<Option<&str>>, data_type: DataType) -> Self {
-        let mut offsets = Vec::with_capacity(v.len() + 1);
+    pub(crate) fn from_opt_vec(v: Vec<Option<&str>>) -> Self {
+        let iter = v.iter().map(|e| e.map(|e| e.to_string()));
+        GenericStringArray::from_iter(iter)
+    }
+}
+
+impl<'a, Ptr, OffsetSize: StringOffsetSizeTrait> FromIterator<Ptr>

Review comment:
       Happy with it as is, if we need to change, we can do that for the whole module/codebase in future




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org