Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/25 03:45:02 UTC

[GitHub] [arrow] joeyac opened a new issue #9311: [arrow c++] How can I modify `RecordBatch` or `Table` value piped from another c++ program?

joeyac opened a new issue #9311:
URL: https://github.com/apache/arrow/issues/9311


   writer.cc: constructs a table as in `VectorToColumnarTable` from https://arrow.apache.org/docs/cpp/examples/row_columnar_conversion.html, then uses the following code to write the table to stdout:
   ```
   arrow::Status WriteTableToStdout(const std::shared_ptr<arrow::Table> &table) {
     // Split the table into record batches.
     arrow::TableBatchReader tbr(*table);
     arrow::RecordBatchVector out_batches;
     std::shared_ptr<arrow::RecordBatch> rb = nullptr;
     while (true) {
       ARROW_RETURN_NOT_OK(tbr.ReadNext(&rb));
       if (rb == nullptr) break;
       out_batches.push_back(rb);
     }
   
     // Stream the batches to stdout in the Arrow IPC stream format.
     arrow::io::StdoutStream output_stream;
     auto write_options = arrow::ipc::IpcWriteOptions::Defaults();
     // Use the table's schema so an empty table never indexes out_batches[0].
     ARROW_ASSIGN_OR_RAISE(
         std::shared_ptr<arrow::ipc::RecordBatchWriter> writer,
         arrow::ipc::MakeStreamWriter(&output_stream,
                                      table->schema(), write_options));
     for (const auto& batch : out_batches) {
       ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*batch));
     }
     // Propagate a failed Close() instead of silently dropping its Status.
     ARROW_RETURN_NOT_OK(writer->Close());
     std::cerr << "write done" << std::endl;
     return arrow::Status::OK();
   }
   ```
   
   reader.cc: uses the following code to read the record batches from stdin:
   ```
   arrow::Status ReadTableFromStdin(std::shared_ptr<arrow::Table>* table) {
     arrow::RecordBatchVector out_batches;
     arrow::io::StdinStream input_stream;
     // Return errors as arrow::Status rather than throwing, to match the return type.
     ARROW_ASSIGN_OR_RAISE(auto stream_reader, arrow::ipc::RecordBatchStreamReader::Open(&input_stream));
     ARROW_RETURN_NOT_OK(stream_reader->ReadAll(&out_batches));
     ARROW_ASSIGN_OR_RAISE(*table, arrow::Table::FromRecordBatches(out_batches));
     return arrow::Status::OK();
   }
   ```
   
   In the reader, `mutable_ccv_ptr` turns out to be nullptr, so I can't modify the table's values:
   ```
     auto cost_components =
         std::static_pointer_cast<arrow::ListArray>(table->column(2)->chunk(0));
     auto cost_components_values =
         std::static_pointer_cast<arrow::DoubleArray>(cost_components->values());
     const double* ccv_ptr = cost_components_values->data()->GetValues<double>(1);
     double* mutable_ccv_ptr = cost_components_values->data()->GetMutableValues<double>(1);
   ```
   
   However, with a table built in the same process as in the example at https://arrow.apache.org/docs/cpp/examples/row_columnar_conversion.html, I can modify values through `mutable_ccv_ptr` in exactly this way.
   
   Am I using it incorrectly?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on issue #9311: [arrow c++] How can I modify `RecordBatch` or `Table` value piped from another c++ program?

Posted by GitBox <gi...@apache.org>.
bkietz commented on issue #9311:
URL: https://github.com/apache/arrow/issues/9311#issuecomment-766925916


   Arrow data structures like RecordBatch and Table are intended to be immutable, so modification of their internal buffers like this is not really supported.


[GitHub] [arrow] joeyac commented on issue #9311: [arrow c++] How can I modify `RecordBatch` or `Table` value piped from another c++ program?

Posted by GitBox <gi...@apache.org>.
joeyac commented on issue #9311:
URL: https://github.com/apache/arrow/issues/9311#issuecomment-766968333


   @bkietz So if I want to modify some data in a RecordBatch or Table, the likely way is to call `RemoveColumn` and `AddColumn`? That seems inefficient if I only want to change a few values in a large column.
   Other languages, such as Python and Java, seem to support changing data efficiently.
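
The column-replacement approach discussed here can be sketched with a toy model that uses only the standard library. This is not Arrow code: `ToyColumn`, `ToyTable`, and `WithPatchedValues` are hypothetical names invented for illustration. The sketch shows the trade-off in question: the patched column is copied in full, while every other column is shared between the old and the new table.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy model only (NOT the Arrow API): a column is an immutable, shared vector.
using ToyColumn = std::shared_ptr<const std::vector<double>>;

// Toy model only: a table is a list of shared columns.
struct ToyTable {
  std::vector<ToyColumn> columns;
};

// Returns a new table in which column `index` is a patched copy.
// Each patch is a (row, new value) pair. The input table is left untouched.
ToyTable WithPatchedValues(
    const ToyTable& table, std::size_t index,
    const std::vector<std::pair<std::size_t, double>>& patches) {
  assert(index < table.columns.size());

  // Copy the whole column: this is the cost the question is about.
  auto copy = std::make_shared<std::vector<double>>(*table.columns[index]);
  for (const auto& patch : patches) {
    assert(patch.first < copy->size());
    (*copy)[patch.first] = patch.second;
  }

  // Copying the table copies only the column pointers, not the values.
  ToyTable result = table;
  result.columns[index] = std::move(copy);
  return result;
}
```

Under these assumptions the cost is proportional to the size of the one patched column, not to the size of the table. In Arrow's C++ API, `Table::SetColumn` plays the role of the final pointer swap, combining the `RemoveColumn` and `AddColumn` calls.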



[GitHub] [arrow] emkornfield commented on issue #9311: [arrow c++] How can I modify `RecordBatch` or `Table` value piped from another c++ program?

Posted by GitBox <gi...@apache.org>.
emkornfield commented on issue #9311:
URL: https://github.com/apache/arrow/issues/9311#issuecomment-770164491


   Another option is to start a discussion on the dev@ mailing list. Closing for now.



[GitHub] [arrow] bkietz commented on issue #9311: [arrow c++] How can I modify `RecordBatch` or `Table` value piped from another c++ program?

Posted by GitBox <gi...@apache.org>.
bkietz commented on issue #9311:
URL: https://github.com/apache/arrow/issues/9311#issuecomment-768344025


   Since you're proposing a new feature, could you [open a JIRA](https://issues.apache.org/jira/secure/CreateIssueDetails!init.jspa?pid=12319525) with a description of the mutation you'd like to accomplish and an example of how you'd express it in Python?



[GitHub] [arrow] joeyac commented on issue #9311: [arrow c++] How can I modify `RecordBatch` or `Table` value piped from another c++ program?

Posted by GitBox <gi...@apache.org>.
joeyac commented on issue #9311:
URL: https://github.com/apache/arrow/issues/9311#issuecomment-766898581


   I eventually found the relevant source code:
   https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/memory.cc#L351
   Before this line, `buffer_` from `StdinStream` is mutable, but `SliceBuffer` returns an immutable slice of it by default.
   I replaced
   ```cpp
   return SliceBuffer(buffer_, position, nbytes);
   ```
   with
   ```cpp
   if (buffer_->is_mutable()) return SliceMutableBuffer(buffer_, position, nbytes);
   else return SliceBuffer(buffer_, position, nbytes);
   ```
   With that change, I can modify the data through the mutable data pointer.
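
The slicing behavior described above can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not Arrow's implementation: `ToyBuffer`, `ToySlice`, `ToySliceMutable`, and `ToySlicePreservingMutability` are hypothetical names that only mirror the roles of `Buffer`, `SliceBuffer`, `SliceMutableBuffer`, and the patched branch. A slice shares the parent's storage, and whether it hands out a writable pointer depends solely on a flag set when the slice is created.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Toy model only (NOT arrow::Buffer): a view over shared storage with a
// per-view mutability flag.
class ToyBuffer {
 public:
  // An owning buffer is mutable because it allocated its own storage.
  explicit ToyBuffer(std::vector<uint8_t> bytes)
      : storage_(std::make_shared<std::vector<uint8_t>>(std::move(bytes))),
        offset_(0),
        size_(storage_->size()),
        is_mutable_(true) {}

  // A view over [offset, offset + size) of the parent. No bytes are copied.
  ToyBuffer(const std::shared_ptr<ToyBuffer>& parent, std::size_t offset,
            std::size_t size, bool is_mutable)
      : storage_(parent->storage_),
        offset_(parent->offset_ + offset),
        size_(size),
        is_mutable_(is_mutable) {
    assert(offset + size <= parent->size_);
  }

  bool is_mutable() const { return is_mutable_; }
  std::size_t size() const { return size_; }
  const uint8_t* data() const { return storage_->data() + offset_; }

  // Returns nullptr for read-only views, like the null pointer in the question.
  uint8_t* mutable_data() {
    return is_mutable_ ? storage_->data() + offset_ : nullptr;
  }

 private:
  std::shared_ptr<std::vector<uint8_t>> storage_;
  std::size_t offset_;
  std::size_t size_;
  bool is_mutable_;
};

// Default slice: always read-only, even if the parent is mutable.
std::shared_ptr<ToyBuffer> ToySlice(const std::shared_ptr<ToyBuffer>& parent,
                                    std::size_t offset, std::size_t size) {
  return std::make_shared<ToyBuffer>(parent, offset, size, false);
}

// Mutable slice: only valid when the parent itself is mutable.
std::shared_ptr<ToyBuffer> ToySliceMutable(
    const std::shared_ptr<ToyBuffer>& parent, std::size_t offset,
    std::size_t size) {
  assert(parent->is_mutable());
  return std::make_shared<ToyBuffer>(parent, offset, size, true);
}

// The patched behavior: keep whatever mutability the parent has.
std::shared_ptr<ToyBuffer> ToySlicePreservingMutability(
    const std::shared_ptr<ToyBuffer>& parent, std::size_t offset,
    std::size_t size) {
  return parent->is_mutable() ? ToySliceMutable(parent, offset, size)
                              : ToySlice(parent, offset, size);
}
```

In this model a write through a mutable slice is visible through every other view of the same storage, including read-only ones. That aliasing is one plausible reason a library would make slices read-only by default.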
   



[GitHub] [arrow] emkornfield closed issue #9311: [arrow c++] How can I modify `RecordBatch` or `Table` value piped from another c++ program?

Posted by GitBox <gi...@apache.org>.
emkornfield closed issue #9311:
URL: https://github.com/apache/arrow/issues/9311


   

