Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/27 19:56:09 UTC

[GitHub] [arrow] bu2 opened a new pull request #9022: Add replace and isnan scalar kernels

bu2 opened a new pull request #9022:
URL: https://github.com/apache/arrow/pull/9022


   Propose a "replace" compute kernel which could fulfil ARROW-10641 - [C++] A "replace" or "map" kernel to replace values in array based on mapping (@jorisvandenbossche). The implementation is inspired by the "fill_null" kernel, except that it takes an additional BooleanArray parameter used as a mask to trigger value replacement.
   **WARNING:** the current implementation does not carry nulls into the output, so it expects every null value to be replaced (that is, the corresponding bit in the input mask must be set to 1). Feel free to share your thoughts on the current implementation and to give me hints on the easiest way to handle nulls that should make it into the output.
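
To make the mask semantics concrete, here is a minimal sketch in plain C++, without Arrow. `ReplaceWithMask` is a hypothetical helper, not the kernel from this pull request, and `std::optional` only stands in for a nullable slot. Unlike the implementation described above, the sketch passes unmasked nulls through to the output, which is one possible answer to the open question about null handling.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not an Arrow API): an empty std::optional models a null slot.
template <typename T>
std::vector<std::optional<T>> ReplaceWithMask(
    const std::vector<std::optional<T>>& values,
    const std::vector<bool>& mask,
    const T& replacement) {
  assert(values.size() == mask.size());
  std::vector<std::optional<T>> out;
  out.reserve(values.size());
  for (std::size_t i = 0; i < values.size(); ++i) {
    // Mask set: emit the replacement. Mask clear: copy the input slot, null included.
    out.push_back(mask[i] ? std::optional<T>(replacement) : values[i]);
  }
  return out;
}
```

With input `{1.0, null, 3.0}`, mask `{false, true, false}` and replacement `0.0`, only the middle slot changes; a null whose mask bit is clear would stay null in this sketch.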
   
   Add an "is_nan" kernel to check for NaN "equality" in FloatArray and DoubleArray (based on std::isnan()). The kernel signature is based on the "is_null" kernel, so I put my code in arrow/compute/kernels/scalar_validity.cc, but the implementation takes some inspiration from the "compare" kernel.
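
A dedicated check is needed because, under IEEE 754, NaN never compares equal to anything, itself included, so an ordinary equality comparison cannot find it. The element-wise idea can be sketched in plain C++ as follows; `IsNanMask` is a hypothetical name used for illustration and is not the kernel's actual code.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper (not an Arrow API): one boolean per element, true where the value is NaN.
template <typename T>
std::vector<bool> IsNanMask(const std::vector<T>& values) {
  std::vector<bool> out;
  out.reserve(values.size());
  for (T v : values) {
    out.push_back(std::isnan(v));  // std::isnan is defined for float and double
  }
  return out;
}
```

The resulting mask has the same length as the input and can be fed directly to a mask-driven replacement.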
   
   Both kernels are used to mimic pandas.DataFrame.fillna(value=X) in C++. Here is a usage example:
   
   ```cpp
   template <typename value_type>
   std::shared_ptr<DataFrame> DataFrame::fillna(value_type value) {
       auto outdf = std::make_shared<DataFrame>();

       // Start from an empty table with the same number of rows as the input.
       if (outdf->table_->num_rows() == 0) {
           outdf->table_ = arrow::Table::Make(
                   std::make_shared<arrow::Schema>(std::vector<std::shared_ptr<arrow::Field>>()),
                   std::vector<std::shared_ptr<arrow::ChunkedArray>>(),
                   this->table_->num_rows());
       }

       for (int i = 0; i < this->table_->num_columns(); ++i) {
           auto field = this->table_->schema()->field(i);
           auto chunked_array = this->table_->column(i);

           if (field->name() == INDEX_COLUMN) {
               // The index column is copied through unchanged.
               outdf->table_ = outdf->table_->AddColumn(i, field, chunked_array).ValueOrDie();
           } else {
               // Cast the fill value to the column type.
               auto value_datum = arrow::compute::Cast(arrow::Datum(value), chunked_array->type()).ValueOrDie();
               auto nulls = arrow::compute::IsNull(chunked_array).ValueOrDie();
               // IsNan is one of the two kernels added by this pull request.
               auto nans = arrow::compute::IsNan(chunked_array).ValueOrDie().chunked_array();
               // Replace every slot that is either null or NaN.
               auto to_replace = arrow::compute::Or(nulls, nans).ValueOrDie();
               // Replace is the other kernel added by this pull request.
               auto clean_chunked_array = arrow::compute::Replace(chunked_array, to_replace, value_datum).ValueOrDie().chunked_array();
               outdf->table_ = outdf->table_->AddColumn(i, field, clean_chunked_array).ValueOrDie();
           }
       }

       return outdf;
   }
   ```
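
For readers without an Arrow build at hand, the per-column logic of the example (build a mask that is the OR of the null mask and the NaN mask, then replace the masked slots) can be sketched in plain C++. `NullableColumn` and `FillNa` are hypothetical names used only for illustration; they are not part of Arrow or of this pull request.

```cpp
#include <cassert>
#include <cmath>
#include <optional>
#include <vector>

// Hypothetical stand-in for a nullable double column; an empty optional models a null.
using NullableColumn = std::vector<std::optional<double>>;

// Replace every null or NaN slot with `fill`, in the spirit of pandas fillna(value=fill).
std::vector<double> FillNa(const NullableColumn& column, double fill) {
  std::vector<double> out;
  out.reserve(column.size());
  for (const auto& slot : column) {
    const bool is_null = !slot.has_value();
    const bool is_nan = slot.has_value() && std::isnan(*slot);
    // The combined mask is (is_null OR is_nan); masked slots take the fill value.
    out.push_back((is_null || is_nan) ? fill : *slot);
  }
  return out;
}
```

Because every null is covered by the combined mask, the output needs no validity information, which matches the constraint stated in the warning above.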
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bu2 commented on pull request #9022: [C++] Add replace and isnan scalar kernels

Posted by GitBox <gi...@apache.org>.
bu2 commented on pull request #9022:
URL: https://github.com/apache/arrow/pull/9022#issuecomment-751533238


   Closing this pull request to recreate it as two distinct pull requests (one per kernel), each with its corresponding Jira ticket.





[GitHub] [arrow] github-actions[bot] commented on pull request #9022: [C++] Add replace and isnan scalar kernels

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9022:
URL: https://github.com/apache/arrow/pull/9022#issuecomment-751510556


   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   





[GitHub] [arrow] bu2 closed pull request #9022: [C++] Add replace and isnan scalar kernels

Posted by GitBox <gi...@apache.org>.
bu2 closed pull request #9022:
URL: https://github.com/apache/arrow/pull/9022


   

