You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/28 15:06:08 UTC

[GitHub] [arrow] romainfrancois commented on pull request #8365: ARROW-6582: [R] Arrow to R fails with embedded nuls in strings

romainfrancois commented on pull request #8365:
URL: https://github.com/apache/arrow/pull/8365#issuecomment-717996007


   based on @bkietz comment from https://github.com/apache/arrow/pull/8536 Converter_String::Ingest_some_nulls() could end with: 
   
   ```cpp
   cpp11::unwind_protect([&] {
         if (array->null_count()) {
           // need to watch for nulls
           arrow::internal::BitmapReader null_reader(array->null_bitmap_data(),
                                                     array->offset(), n);
           for (int i = 0; i < n; i++, null_reader.Next()) {
             if (null_reader.IsSet()) {
               SET_STRING_ELT(data, start + i, unsafe_r_string(string_array->GetString(i)));
             } else {
               SET_STRING_ELT(data, start + i, NA_STRING);
             }
           }
   
         } else {
           for (int i = 0; i < n; i++) {
             SET_STRING_ELT(data, start + i, unsafe_r_string(string_array->GetString(i)));
           }
         }
       });
   ```
   
   this builds on knowing that `GetString()` gives an utf8 string, and that we know its size, as opposed to what `cpp11::r_string(std::string)` does here: https://github.com/r-lib/cpp11/blob/master/inst/include/cpp11/r_string.hpp#L18 and uses unwind protection around the loop instead of the individual conversions. 
   
   i.e. it assumes utf-8 but since it does not use the known size, it searches for `nul`: r_string(const std::string& data) : data_(safe[Rf_mkCharCE](data.c_str(), CE_UTF8)) {}
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org