You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/08/26 08:46:01 UTC

[GitHub] [arrow-rs] psvri opened a new pull request, #2592: Refactor Binary Builder and String Builder Constructors

psvri opened a new pull request, #2592:
URL: https://github.com/apache/arrow-rs/pull/2592

   # Which issue does this PR close?
   
   Closes #2054.
   
   # Rationale for this change
    
   Removes capacity from new constructors
   
   # What changes are included in this PR?
   
   Removes capacity from new constructors
   
   # Are there any user-facing changes?
   
   Yes
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] psvri commented on a diff in pull request #2592: Refactor Binary Builder and String Builder Constructors

Posted by GitBox <gi...@apache.org>.
psvri commented on code in PR #2592:
URL: https://github.com/apache/arrow-rs/pull/2592#discussion_r956542490


##########
arrow/src/array/builder/generic_binary_builder.rs:
##########
@@ -34,15 +34,8 @@ pub struct GenericBinaryBuilder<OffsetSize: OffsetSizeTrait> {
 
 impl<OffsetSize: OffsetSizeTrait> GenericBinaryBuilder<OffsetSize> {
     /// Creates a new [`GenericBinaryBuilder`].
-    /// `capacity` is the number of bytes in the values array.
-    pub fn new(capacity: usize) -> Self {
-        let mut offsets_builder = BufferBuilder::<OffsetSize>::new(1024);
-        offsets_builder.append(OffsetSize::zero());
-        Self {
-            value_builder: UInt8BufferBuilder::new(capacity),
-            offsets_builder,
-            null_buffer_builder: NullBufferBuilder::new(1024),
-        }
+    pub fn new() -> Self {
+        Self::with_capacity(1024, 1024)

Review Comment:
   When we used new previously , it was using 1024 as no. of items, hence I have retained the same.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] psvri commented on pull request #2592: Refactor Binary Builder and String Builder Constructors

Posted by GitBox <gi...@apache.org>.
psvri commented on PR #2592:
URL: https://github.com/apache/arrow-rs/pull/2592#issuecomment-1228251700

   I have replace all `new` calls with `with_capacity(1024, capacity)` . This way the logic wont change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2592: Refactor Binary Builder and String Builder Constructors

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2592:
URL: https://github.com/apache/arrow-rs/pull/2592#discussion_r956577785


##########
arrow/src/array/builder/generic_binary_builder.rs:
##########
@@ -34,15 +34,8 @@ pub struct GenericBinaryBuilder<OffsetSize: OffsetSizeTrait> {
 
 impl<OffsetSize: OffsetSizeTrait> GenericBinaryBuilder<OffsetSize> {
     /// Creates a new [`GenericBinaryBuilder`].
-    /// `capacity` is the number of bytes in the values array.
-    pub fn new(capacity: usize) -> Self {
-        let mut offsets_builder = BufferBuilder::<OffsetSize>::new(1024);
-        offsets_builder.append(OffsetSize::zero());
-        Self {
-            value_builder: UInt8BufferBuilder::new(capacity),
-            offsets_builder,
-            null_buffer_builder: NullBufferBuilder::new(1024),
-        }
+    pub fn new() -> Self {
+        Self::with_capacity(1024, 1024)

Review Comment:
   It's not a reasonable assumption no, but it is a decent guess to avoid worst case bump allocation, probably 😅



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] HaoYang670 commented on a diff in pull request #2592: Refactor Binary Builder and String Builder Constructors

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on code in PR #2592:
URL: https://github.com/apache/arrow-rs/pull/2592#discussion_r956534869


##########
arrow/src/array/builder/generic_binary_builder.rs:
##########
@@ -34,15 +34,8 @@ pub struct GenericBinaryBuilder<OffsetSize: OffsetSizeTrait> {
 
 impl<OffsetSize: OffsetSizeTrait> GenericBinaryBuilder<OffsetSize> {
     /// Creates a new [`GenericBinaryBuilder`].
-    /// `capacity` is the number of bytes in the values array.
-    pub fn new(capacity: usize) -> Self {
-        let mut offsets_builder = BufferBuilder::<OffsetSize>::new(1024);
-        offsets_builder.append(OffsetSize::zero());
-        Self {
-            value_builder: UInt8BufferBuilder::new(capacity),
-            offsets_builder,
-            null_buffer_builder: NullBufferBuilder::new(1024),
-        }
+    pub fn new() -> Self {
+        Self::with_capacity(1024, 1024)

Review Comment:
   Is it a reasonable assumption that the array contains the same number of elements and `bytes`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] psvri commented on a diff in pull request #2592: Refactor Binary Builder and String Builder Constructors

Posted by GitBox <gi...@apache.org>.
psvri commented on code in PR #2592:
URL: https://github.com/apache/arrow-rs/pull/2592#discussion_r956542490


##########
arrow/src/array/builder/generic_binary_builder.rs:
##########
@@ -34,15 +34,8 @@ pub struct GenericBinaryBuilder<OffsetSize: OffsetSizeTrait> {
 
 impl<OffsetSize: OffsetSizeTrait> GenericBinaryBuilder<OffsetSize> {
     /// Creates a new [`GenericBinaryBuilder`].
-    /// `capacity` is the number of bytes in the values array.
-    pub fn new(capacity: usize) -> Self {
-        let mut offsets_builder = BufferBuilder::<OffsetSize>::new(1024);
-        offsets_builder.append(OffsetSize::zero());
-        Self {
-            value_builder: UInt8BufferBuilder::new(capacity),
-            offsets_builder,
-            null_buffer_builder: NullBufferBuilder::new(1024),
-        }
+    pub fn new() -> Self {
+        Self::with_capacity(1024, 1024)

Review Comment:
   When we used new previously , it was using 1024 as capacity, hence I have retained the same.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2592: Refactor Binary Builder and String Builder Constructors

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2592:
URL: https://github.com/apache/arrow-rs/pull/2592#discussion_r956578260


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -2355,7 +2355,7 @@ where
     let values = cast_values.as_any().downcast_ref::<StringArray>().unwrap();
 
     let keys_builder = PrimitiveBuilder::<K>::with_capacity(values.len());
-    let values_builder = StringBuilder::new(values.len());
+    let values_builder = StringBuilder::with_capacity(1024, values.len());

Review Comment:
   In this case I'd be tempted to just go with new, this estimate isn't obviously better



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] psvri commented on a diff in pull request #2592: Refactor Binary Builder and String Builder Constructors

Posted by GitBox <gi...@apache.org>.
psvri commented on code in PR #2592:
URL: https://github.com/apache/arrow-rs/pull/2592#discussion_r956580115


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -2355,7 +2355,7 @@ where
     let values = cast_values.as_any().downcast_ref::<StringArray>().unwrap();
 
     let keys_builder = PrimitiveBuilder::<K>::with_capacity(values.len());
-    let values_builder = StringBuilder::new(values.len());
+    let values_builder = StringBuilder::with_capacity(1024, values.len());

Review Comment:
   Yeah, I should have retained new.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] HaoYang670 commented on a diff in pull request #2592: Refactor Binary Builder and String Builder Constructors

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on code in PR #2592:
URL: https://github.com/apache/arrow-rs/pull/2592#discussion_r956534869


##########
arrow/src/array/builder/generic_binary_builder.rs:
##########
@@ -34,15 +34,8 @@ pub struct GenericBinaryBuilder<OffsetSize: OffsetSizeTrait> {
 
 impl<OffsetSize: OffsetSizeTrait> GenericBinaryBuilder<OffsetSize> {
     /// Creates a new [`GenericBinaryBuilder`].
-    /// `capacity` is the number of bytes in the values array.
-    pub fn new(capacity: usize) -> Self {
-        let mut offsets_builder = BufferBuilder::<OffsetSize>::new(1024);
-        offsets_builder.append(OffsetSize::zero());
-        Self {
-            value_builder: UInt8BufferBuilder::new(capacity),
-            offsets_builder,
-            null_buffer_builder: NullBufferBuilder::new(1024),
-        }
+    pub fn new() -> Self {
+        Self::with_capacity(1024, 1024)

Review Comment:
   Is it a reasonable assumption that the array contains the same number of elements and bytes?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #2592: Refactor Binary Builder and String Builder Constructors

Posted by GitBox <gi...@apache.org>.
tustvold merged PR #2592:
URL: https://github.com/apache/arrow-rs/pull/2592


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] ursabot commented on pull request #2592: Refactor Binary Builder and String Builder Constructors

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #2592:
URL: https://github.com/apache/arrow-rs/pull/2592#issuecomment-1229187321

   Benchmark runs are scheduled for baseline = ff81dee368946a79ae90132fe3693620e0c4f710 and contender = 744412f751d926311851c30e7271e3f4f14757f7. 744412f751d926311851c30e7271e3f4f14757f7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/d59c91fdba6f4c1285eb6f4589132941...27951bf85bf94629b5577bf43a5dcfd4/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/51a2f2e5906a4a8aa8bd677838c5bf59...7f7e64cdc9af4ced868b160c083082a4/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/000ea79145e84f288753ed55f8bc5b0f...168d0ce24cdb4772b73f73ecc700a9b3/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/4bb7c6dd30cb4b6691af42a506123059...2830a483738e447ab1f66e82354bc4ef/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2592: Refactor Binary Builder and String Builder Constructors

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2592:
URL: https://github.com/apache/arrow-rs/pull/2592#discussion_r956577785


##########
arrow/src/array/builder/generic_binary_builder.rs:
##########
@@ -34,15 +34,8 @@ pub struct GenericBinaryBuilder<OffsetSize: OffsetSizeTrait> {
 
 impl<OffsetSize: OffsetSizeTrait> GenericBinaryBuilder<OffsetSize> {
     /// Creates a new [`GenericBinaryBuilder`].
-    /// `capacity` is the number of bytes in the values array.
-    pub fn new(capacity: usize) -> Self {
-        let mut offsets_builder = BufferBuilder::<OffsetSize>::new(1024);
-        offsets_builder.append(OffsetSize::zero());
-        Self {
-            value_builder: UInt8BufferBuilder::new(capacity),
-            offsets_builder,
-            null_buffer_builder: NullBufferBuilder::new(1024),
-        }
+    pub fn new() -> Self {
+        Self::with_capacity(1024, 1024)

Review Comment:
   Its not a reasonable assumption no, is it the best guess to avoid worst case bump allocation, probably 😅



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org