You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/22 13:43:35 UTC

[GitHub] [arrow-rs] askoa opened a new pull request, #3158: Add finish_cloned to ArrayBuilder

askoa opened a new pull request, #3158:
URL: https://github.com/apache/arrow-rs/pull/3158

   # Which issue does this PR close?
   
   Closes #3154 
   
   # Are there any user-facing changes?
   Yes, users will get a new function finish_cloned to get snapshot of data in ArrayBuilder.
   
   
   WIP: This is the first commit to make sure the PR is in right direction.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3158: Add finish_cloned to ArrayBuilder

Posted by GitBox <gi...@apache.org>.
askoa commented on code in PR #3158:
URL: https://github.com/apache/arrow-rs/pull/3158#discussion_r1030420833


##########
arrow-array/src/builder/union_builder.rs:
##########
@@ -47,6 +47,8 @@ trait FieldDataValues: std::fmt::Debug {
     fn append_null(&mut self);
 
     fn finish(&mut self) -> Buffer;
+
+    fn finish_cloned(&self) -> Buffer;

Review Comment:
   Though I added `build_cloned` function, I am not sure if it's required. The existing function `build` has a signature `(self)` which means the builder cannot be reused after calling the function. So there is no need to add `build_cloned` which does not reset the builder. What do you think?
   
   Edit: I removed `build_cloned`. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3158: Add finish_cloned to ArrayBuilder

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #3158:
URL: https://github.com/apache/arrow-rs/pull/3158#discussion_r1030119748


##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -140,6 +146,42 @@ where
 
         FixedSizeListArray::from(array_data)
     }
+
+    /// Builds the [`FixedSizeListBuilder`] without resetting the builder.
+    pub fn finish_cloned(&self) -> FixedSizeListArray {
+        let len = self.len();
+        let values_arr = self
+            .values_builder
+            .as_any()
+            .downcast_ref::<T>()
+            .unwrap()
+            .finish_cloned();

Review Comment:
   ```suggestion
           let values_arr = self.values_builder.finish_cloned();
   ```



##########
arrow-array/src/builder/map_builder.rs:
##########
@@ -152,6 +153,58 @@ impl<K: ArrayBuilder, V: ArrayBuilder> MapBuilder<K, V> {
 
         MapArray::from(array_data)
     }
+
+    pub fn finish_cloned(&self) -> MapArray {
+        let len = self.len();
+
+        // Build the keys
+        let keys_arr = self
+            .key_builder
+            .as_any()
+            .downcast_ref::<K>()
+            .unwrap()
+            .finish_cloned();
+        let values_arr = self
+            .value_builder
+            .as_any()
+            .downcast_ref::<V>()
+            .unwrap()
+            .finish_cloned();

Review Comment:
   ```suggestion
           let keys_arr = self.key_builder.finish_cloned();
           let values_arr = self.value_builder.finish_cloned();
   ```



##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -140,6 +146,42 @@ where
 
         FixedSizeListArray::from(array_data)
     }
+
+    /// Builds the [`FixedSizeListBuilder`] without resetting the builder.
+    pub fn finish_cloned(&self) -> FixedSizeListArray {
+        let len = self.len();
+        let values_arr = self
+            .values_builder
+            .as_any()
+            .downcast_ref::<T>()
+            .unwrap()
+            .finish_cloned();
+        let values_data = values_arr.data();
+
+        assert!(
+            values_data.len() == len * self.list_len as usize,

Review Comment:
   ```suggestion
           assert_eq!(
               values_data.len(), len * self.list_len as usize,
   ```



##########
arrow-array/src/builder/generic_list_builder.rs:
##########
@@ -143,6 +149,39 @@ where
         GenericListArray::<OffsetSize>::from(array_data)
     }
 
+    /// Builds the [`GenericListArray`] without resetting the builder.
+    pub fn finish_cloned(&self) -> GenericListArray<OffsetSize> {
+        let len = self.len();
+        let values_arr = self
+            .values_builder
+            .as_any()
+            .downcast_ref::<T>()
+            .unwrap()
+            .finish_cloned();

Review Comment:
   ```suggestion
           let values_arr = self.values_builder.finish_cloned();
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #3158: Add finish_cloned to ArrayBuilder

Posted by GitBox <gi...@apache.org>.
tustvold merged PR #3158:
URL: https://github.com/apache/arrow-rs/pull/3158


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3158: Add finish_cloned to ArrayBuilder

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #3158:
URL: https://github.com/apache/arrow-rs/pull/3158#discussion_r1030327493


##########
arrow-array/src/builder/generic_list_builder.rs:
##########
@@ -143,6 +149,39 @@ where
         GenericListArray::<OffsetSize>::from(array_data)
     }
 
+    /// Builds the [`GenericListArray`] without resetting the builder.
+    pub fn finish_cloned(&self) -> GenericListArray<OffsetSize> {
+        let len = self.len();
+        let values_arr = self
+            .values_builder
+            .as_any()
+            .downcast_ref::<T>()
+            .unwrap()
+            .finish_cloned();

Review Comment:
   Likely a holdover from some refactor - I fixed it in #3166 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3158: Add finish_cloned to ArrayBuilder

Posted by GitBox <gi...@apache.org>.
askoa commented on code in PR #3158:
URL: https://github.com/apache/arrow-rs/pull/3158#discussion_r1030325680


##########
arrow-array/src/builder/union_builder.rs:
##########
@@ -47,6 +47,8 @@ trait FieldDataValues: std::fmt::Debug {
     fn append_null(&mut self);
 
     fn finish(&mut self) -> Buffer;
+
+    fn finish_cloned(&self) -> Buffer;

Review Comment:
   Good catch. I have to create a `build_cloned` function in UnionArray.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3158: Add finish_cloned to ArrayBuilder

Posted by GitBox <gi...@apache.org>.
askoa commented on code in PR #3158:
URL: https://github.com/apache/arrow-rs/pull/3158#discussion_r1030326715


##########
arrow-array/src/builder/generic_list_builder.rs:
##########
@@ -143,6 +149,39 @@ where
         GenericListArray::<OffsetSize>::from(array_data)
     }
 
+    /// Builds the [`GenericListArray`] without resetting the builder.
+    pub fn finish_cloned(&self) -> GenericListArray<OffsetSize> {
+        let len = self.len();
+        let values_arr = self
+            .values_builder
+            .as_any()
+            .downcast_ref::<T>()
+            .unwrap()
+            .finish_cloned();

Review Comment:
   I just copied from `finish` function. Do you know why it was coded this way in `finish` function?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on pull request #3158: Add finish_cloned to ArrayBuilder

Posted by GitBox <gi...@apache.org>.
askoa commented on PR #3158:
URL: https://github.com/apache/arrow-rs/pull/3158#issuecomment-1323850323

   > I wonder if instead of `MutableBuffer: Clone` and adding `finish_cloned` to `BufferBuilder`, etc... we could just make use of https://docs.rs/arrow-buffer/latest/arrow_buffer/struct.Buffer.html#method.from_slice_ref?
   > 
   > e.g.
   > 
   > ```
   > let buffer = Buffer::from_slice_ref(self.values_builder.as_slice());
   > ```
   > 
   > What do you think?
   
   I made the change locally and getting below error
   
   Error for BooleanBufferBuilder:
   ```
   "the size for values of type `[u8]` cannot be known at compilation time\nthe trait `Sized` is not implemented for `[u8]`"
   ```
   
   Error for BufferBuilder:
   ```
   "the size for values of type `[T]` cannot be known at compilation time\nthe trait `Sized` is not implemented for `[T]`"
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3158: Add finish_cloned to ArrayBuilder

Posted by GitBox <gi...@apache.org>.
askoa commented on code in PR #3158:
URL: https://github.com/apache/arrow-rs/pull/3158#discussion_r1030409937


##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -140,6 +146,42 @@ where
 
         FixedSizeListArray::from(array_data)
     }
+
+    /// Builds the [`FixedSizeListBuilder`] without resetting the builder.
+    pub fn finish_cloned(&self) -> FixedSizeListArray {
+        let len = self.len();
+        let values_arr = self
+            .values_builder
+            .as_any()
+            .downcast_ref::<T>()
+            .unwrap()
+            .finish_cloned();

Review Comment:
   I marked it resolved because I was about to push commit. I pushed it now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3158: Add finish_cloned to ArrayBuilder

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #3158:
URL: https://github.com/apache/arrow-rs/pull/3158#discussion_r1030117213


##########
arrow-array/src/builder/union_builder.rs:
##########
@@ -47,6 +47,8 @@ trait FieldDataValues: std::fmt::Debug {
     fn append_null(&mut self);
 
     fn finish(&mut self) -> Buffer;
+
+    fn finish_cloned(&self) -> Buffer;

Review Comment:
   This doesn't appear to be used



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #3158: Add finish_cloned to ArrayBuilder

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #3158:
URL: https://github.com/apache/arrow-rs/pull/3158#issuecomment-1324073891

   > Edit: I think finish_cloned function needs to be added to null_buffer_builder as null_buffer_builder.bitmap_builder might not be accessible outside the struct.
   
   I'd suggest adding something like
   
   ```
       /// Returns the packed bits
       pub fn as_slice(&self) -> Option<&[u8]> {
           Some(self.bitmap_builder?.as_slice())
       }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3158: Add finish_cloned to ArrayBuilder

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #3158:
URL: https://github.com/apache/arrow-rs/pull/3158#discussion_r1030339905


##########
arrow-array/src/builder/fixed_size_list_builder.rs:
##########
@@ -140,6 +146,42 @@ where
 
         FixedSizeListArray::from(array_data)
     }
+
+    /// Builds the [`FixedSizeListBuilder`] without resetting the builder.
+    pub fn finish_cloned(&self) -> FixedSizeListArray {
+        let len = self.len();
+        let values_arr = self
+            .values_builder
+            .as_any()
+            .downcast_ref::<T>()
+            .unwrap()
+            .finish_cloned();

Review Comment:
   Why is this resolved?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] ursabot commented on pull request #3158: Add finish_cloned to ArrayBuilder

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #3158:
URL: https://github.com/apache/arrow-rs/pull/3158#issuecomment-1325335419

   Benchmark runs are scheduled for baseline = 12a67b9bc7e1538f5af1a189cc0a78c14d551897 and contender = 6c466afe3b0b3a4c7b90c99c27eefade62011c31. 6c466afe3b0b3a4c7b90c99c27eefade62011c31 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/df967be906aa4a9a9026b918e3ed5b02...46f6ef3ec2a54fd093835ddff8da254b/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/f55b31c894304d2aa3850879a3f1a0cd...98567f0d5b5a4c2ba10e890aec2871cd/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/e65ff8cd2e7f4cac99f22ef54b675e36...c8d8f69da19d47d3a18fb4b0aab025ab/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/d1182ec3dce446e99855c8f5101a14f9...844935ad59a445d59d82765f6e0af6bb/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #3158: Add finish_cloned to ArrayBuilder

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #3158:
URL: https://github.com/apache/arrow-rs/pull/3158#issuecomment-1323819470

   I wonder if instead of `MutableBuffer: Clone` and adding `finish_cloned` to `BufferBuilder`, etc... we could just make use of https://docs.rs/arrow-buffer/latest/arrow_buffer/struct.Buffer.html#method.from_slice_ref?
   
   What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #3158: Add finish_cloned to ArrayBuilder

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #3158:
URL: https://github.com/apache/arrow-rs/pull/3158#issuecomment-1323866054

   Aah yeah, the signature for `Buffer::from_slice_ref` is a little strange... `Buffer::from_slice_ref(&self.values_builder.as_slice());` should work


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on a diff in pull request #3158: Add finish_cloned to ArrayBuilder

Posted by GitBox <gi...@apache.org>.
askoa commented on code in PR #3158:
URL: https://github.com/apache/arrow-rs/pull/3158#discussion_r1030420833


##########
arrow-array/src/builder/union_builder.rs:
##########
@@ -47,6 +47,8 @@ trait FieldDataValues: std::fmt::Debug {
     fn append_null(&mut self);
 
     fn finish(&mut self) -> Buffer;
+
+    fn finish_cloned(&self) -> Buffer;

Review Comment:
   Though I added `build_cloned` function, I am not sure if it's required. The existing function `build` has a signature `(self)` which means the builder cannot be reused after calling the function. So there is no need to add `build_cloned` which does not reset the builder. What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] askoa commented on pull request #3158: Add finish_cloned to ArrayBuilder

Posted by GitBox <gi...@apache.org>.
askoa commented on PR #3158:
URL: https://github.com/apache/arrow-rs/pull/3158#issuecomment-1323927017

   > Aah yeah, the signature for `Buffer::from_slice_ref` is a little strange... `Buffer::from_slice_ref(&self.values_builder.as_slice());` should work
   
   The suggestion works. I'll send complete PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org