You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/07 22:17:09 UTC

[GitHub] [arrow] zeyuanxy opened a new pull request #12579: ARROW-15829, Support SetNull in NumericalBuilder

zeyuanxy opened a new pull request #12579:
URL: https://github.com/apache/arrow/pull/12579


   Support `SetNull` in NumericalBuilder


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] zeyuanxy edited a comment on pull request #12579: ARROW-15829: [C++] Support SetNull in NumericalBuilder

Posted by GitBox <gi...@apache.org>.
zeyuanxy edited a comment on pull request #12579:
URL: https://github.com/apache/arrow/pull/12579#issuecomment-1062123925


   > Do you have an actual use case for this?
   
   Yeah. We have an operator building a sample over streaming data using reservoir sampling. This requires making updates to the builders. We made our own builder implementations (e.g., for numerical we use a value builder and a uint8 builder for nullness; for string we use a vector for value), and we realized for numerical builder, it is straightforward to support it in arrow.  Also it seems Arrow support updating a value (through `T& operator[](int64_t)`) but not support updating the nullness. Therefore we did this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #12579: ARROW-15829, Support SetNull in NumericalBuilder

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12579:
URL: https://github.com/apache/arrow/pull/12579#issuecomment-1061194626






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #12579: ARROW-15829: [C++] Support SetNull in NumericalBuilder

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #12579:
URL: https://github.com/apache/arrow/pull/12579#issuecomment-1061765189


   Do you have an actual use case for this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] zeyuanxy commented on a change in pull request #12579: ARROW-15829: [C++] Support SetNull in NumericalBuilder

Posted by GitBox <gi...@apache.org>.
zeyuanxy commented on a change in pull request #12579:
URL: https://github.com/apache/arrow/pull/12579#discussion_r822017575



##########
File path: cpp/src/arrow/array/builder_base.h
##########
@@ -250,6 +250,9 @@ class ARROW_EXPORT ArrayBuilder {
   // Set the next validity bits to null (i.e. invalid).
   void UnsafeSetNull(int64_t length);
 
+  /// Set the index bit to null or not null
+  void UnsafeSetIsNull(int64_t index, bool is_null);

Review comment:
       Thanks. Yeah it makes sense. I moved this method to `NumericBuilder`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 commented on a change in pull request #12579: ARROW-15829: [C++] Support SetNull in NumericalBuilder

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on a change in pull request #12579:
URL: https://github.com/apache/arrow/pull/12579#discussion_r822247547



##########
File path: cpp/src/arrow/array/builder_primitive.h
##########
@@ -146,6 +146,11 @@ class NumericBuilder : public ArrayBuilder {
     return reinterpret_cast<value_type*>(data_builder_.mutable_data())[index];
   }
 
+  void UnsafeSetIsNull(int64_t index, bool is_null) {
+    this->operator[](index) = value_type{};

Review comment:
       It may cause surprise if this api always clears an existing slot implicitly, even if we valid-fy a valid slot, which I suppose should have no side effect.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 commented on a change in pull request #12579: ARROW-15829, Support SetNull in NumericalBuilder

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on a change in pull request #12579:
URL: https://github.com/apache/arrow/pull/12579#discussion_r821271455



##########
File path: cpp/src/arrow/array/builder_base.h
##########
@@ -250,6 +250,9 @@ class ARROW_EXPORT ArrayBuilder {
   // Set the next validity bits to null (i.e. invalid).
   void UnsafeSetNull(int64_t length);
 
+  /// Set the index bit to null or not null
+  void UnsafeSetIsNull(int64_t index, bool is_null);

Review comment:
       Looks this api doesn't work for variable length types like string. Probably should not be in base class.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] zeyuanxy commented on pull request #12579: ARROW-15829: [C++] Support SetNull in NumericalBuilder

Posted by GitBox <gi...@apache.org>.
zeyuanxy commented on pull request #12579:
URL: https://github.com/apache/arrow/pull/12579#issuecomment-1066333718


   > > Yeah. We have an operator building a sample over streaming data using reservoir sampling. This requires making updates to the builders.
   > 
   > This is a bit of a slippery slope. Array builders are meant for sequential appending, they normally don't support mutating values. At a lower level you can use `TypedBufferBuilder` or you can even handle buffer allocation yourself.
   
   I guess we could. I guess another reason is that since `NumericalBuilder` supports updating value through `operator []`, it would be complementary to support updating null as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] zeyuanxy commented on a change in pull request #12579: ARROW-15829: [C++] Support SetNull in NumericalBuilder

Posted by GitBox <gi...@apache.org>.
zeyuanxy commented on a change in pull request #12579:
URL: https://github.com/apache/arrow/pull/12579#discussion_r825587959



##########
File path: cpp/src/arrow/array/builder_primitive.h
##########
@@ -146,6 +146,11 @@ class NumericBuilder : public ArrayBuilder {
     return reinterpret_cast<value_type*>(data_builder_.mutable_data())[index];
   }
 
+  void UnsafeSetIsNull(int64_t index, bool is_null) {
+    this->operator[](index) = value_type{};

Review comment:
       Yeah. Here I am trying to imitate what `AppendNull` does.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #12579: ARROW-15829: [C++] Support SetNull in NumericalBuilder

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #12579:
URL: https://github.com/apache/arrow/pull/12579#issuecomment-1061765269


   Isn't `T& operator[](int64_t)` exactly the operator to overload to enable assignment? e.g. [cppreference states](https://en.cppreference.com/w/cpp/language/operators#Array_subscript_operator) "User-defined classes that provide array-like access that allows both reading and writing typically define two overloads for operator[]: const and non-const variants"
   
   So then it would make sense IMO to allow updating the null bitmap in addition to the values


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] cyb70289 commented on pull request #12579: ARROW-15829: [C++] Support SetNull in NumericalBuilder

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on pull request #12579:
URL: https://github.com/apache/arrow/pull/12579#issuecomment-1062500127


   > Isn't `T& operator[](int64_t)` exactly the operator to overload to enable assignment?
   
   I thought it might be for better performance as I met with once that continuous appending is much slower than directly writing to the underlying buffer.
   @zeyuanxy gives a use case that looks reasonable to mutate an appended value.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] zeyuanxy commented on pull request #12579: ARROW-15829: [C++] Support SetNull in NumericalBuilder

Posted by GitBox <gi...@apache.org>.
zeyuanxy commented on pull request #12579:
URL: https://github.com/apache/arrow/pull/12579#issuecomment-1062123925


   > Do you have an actual use case for this?
   
   Yeah. We have an operator building a sample over streaming data using reservoir sampling. This requires making updates to the builders. We made our own builder implementations (e.g., for numerical we use a value builder and a uint8 builder for nullness; for string we use a vector for value), and we realized for numerical builder, it is straightforward to support it in arrow. Therefore we did this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #12579: ARROW-15829: [C++] Support SetNull in NumericalBuilder

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #12579:
URL: https://github.com/apache/arrow/pull/12579#issuecomment-1062867758


   > Yeah. We have an operator building a sample over streaming data using reservoir sampling. This requires making updates to the builders.
   
   This is a bit of a slippery slope. Array builders are meant for sequential appending, they normally don't support mutating values.
   At a lower level you can use `TypedBufferBuilder` or you can even handle buffer allocation yourself.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org