Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/05 20:21:33 UTC

[GitHub] [arrow] bu2 edited a comment on pull request #9024: ARROW-11044: [C++] Add "replace" kernel

bu2 edited a comment on pull request #9024:
URL: https://github.com/apache/arrow/pull/9024#issuecomment-754871329


   Wow, I was not expecting so much brainstorming on the name of this kernel. My initial intent was to mimic pandas.Series.replace(to_replace, value), where in C++ to_replace would be a BooleanArray mask (same length as the input array) that triggers value replacement wherever it is true. This kernel could then be used after any combination of Compare and Boolean operations to implement specific replacement logic.
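   
   To make the intended mask semantics concrete, here is a minimal standalone sketch using only the standard library (C++17), not Arrow. `ReplaceWithMaskSketch` is a hypothetical name for illustration, and `std::optional` stands in for a nullable array slot; the actual kernel in this PR may differ in signature and null handling.
   
   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <optional>
   #include <vector>
   
   // Hypothetical helper, not part of Arrow: returns a copy of `values` in
   // which every slot whose mask entry is true is overwritten with
   // `replacement`. Slots whose mask entry is false are copied unchanged.
   template <typename T>
   std::vector<std::optional<T>> ReplaceWithMaskSketch(
       const std::vector<std::optional<T>>& values,
       const std::vector<bool>& mask,
       const T& replacement) {
     // The mask must have the same length as the input.
     assert(values.size() == mask.size());
     std::vector<std::optional<T>> out = values;
     for (std::size_t i = 0; i < out.size(); ++i) {
       if (mask[i]) out[i] = replacement;
     }
     return out;
   }
   ```
   
   The mask would typically come from a comparison or boolean kernel, so the replacement logic stays separate from the logic that decides which slots to replace.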
   
   This could be extended with another implementation where to_replace is an IntegerArray (length <= that of the input array) of indices to replace.
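   
   A rough sketch of what that index-based variant might look like, again in plain standard C++ rather than Arrow. `ReplaceAtIndicesSketch` is a hypothetical name, and the bounds checks are assumptions about how such a kernel would validate its input.
   
   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <vector>
   
   // Hypothetical helper, not part of Arrow: returns a copy of `values` in
   // which each position listed in `indices` is overwritten with `replacement`.
   template <typename T>
   std::vector<T> ReplaceAtIndicesSketch(const std::vector<T>& values,
                                         const std::vector<std::size_t>& indices,
                                         const T& replacement) {
     // Assumption: there are no more indices than input slots.
     assert(indices.size() <= values.size());
     std::vector<T> out = values;
     for (std::size_t idx : indices) {
       assert(idx < out.size());  // Assumption: out-of-range indices are an error.
       out[idx] = replacement;
     }
     return out;
   }
   ```
   
   Compared with a boolean mask, an index list is more compact when only a few slots change, at the cost of needing a bounds check per index.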
   
   Here is how I use "replace" in combination with "is_nan" ([ARROW-11043](https://github.com/apache/arrow/pull/9023)) to implement fillna():
   
   ```
   template <typename value_type>
   std::shared_ptr<DataFrame> DataFrame::fillna(value_type value) {
       auto outdf = std::make_shared<DataFrame>();
       ...
       ...
       for (int i = 0; i < this->table_->num_columns(); ++i) {
           auto field = this->table_->schema()->field(i);
           auto chunked_array = this->table_->column(i);
           // Cast the fill value to the column's type so it matches the input.
           auto value_datum = arrow::compute::Cast(arrow::Datum(value), chunked_array->type()).ValueOrDie();
           // Build the mask: true wherever the slot is null or NaN.
           // IsNan is the kernel proposed in ARROW-11043.
           auto nulls = arrow::compute::IsNull(chunked_array).ValueOrDie();
           auto nans = arrow::compute::IsNan(chunked_array).ValueOrDie();
           auto to_replace = arrow::compute::Or(nulls, nans).ValueOrDie();
           // Replace is the kernel proposed in this PR.
           auto clean_chunked_array = arrow::compute::Replace(chunked_array, to_replace, value_datum).ValueOrDie().chunked_array();
           outdf->table_ = outdf->table_->AddColumn(i, field, clean_chunked_array).ValueOrDie();
       }
   
       return outdf;
   }
   ```
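   
   For readers without the two proposed kernels at hand, the same null-or-NaN composition can be sketched on a single column in plain standard C++ (C++17). `Column`, `NullOrNanMaskSketch` and `FillNaSketch` are hypothetical names for illustration, and `std::optional<double>` stands in for a nullable float slot; this is not the Arrow implementation.
   
   ```cpp
   #include <cassert>
   #include <cmath>
   #include <cstddef>
   #include <optional>
   #include <vector>
   
   // Stand-in for a nullable float column (assumption, not an Arrow type).
   using Column = std::vector<std::optional<double>>;
   
   // Mask is true wherever the slot is null or holds NaN,
   // mirroring Or(IsNull(x), IsNan(x)) in the snippet above.
   std::vector<bool> NullOrNanMaskSketch(const Column& column) {
     std::vector<bool> mask(column.size());
     for (std::size_t i = 0; i < column.size(); ++i) {
       const bool is_null = !column[i].has_value();
       const bool is_nan = column[i].has_value() && std::isnan(*column[i]);
       mask[i] = is_null || is_nan;
     }
     return mask;
   }
   
   // Returns a copy of `column` with every masked slot set to `value`.
   Column FillNaSketch(const Column& column, double value) {
     const std::vector<bool> mask = NullOrNanMaskSketch(column);
     assert(mask.size() == column.size());
     Column out = column;
     for (std::size_t i = 0; i < out.size(); ++i) {
       if (mask[i]) out[i] = value;
     }
     return out;
   }
   ```
   
   Here the null check is done before the NaN check, so a null slot never reaches `std::isnan`. How the proposed kernels treat nulls inside the mask itself is not specified in this comment.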

