Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/26 07:21:43 UTC

[GitHub] [arrow] chenchi-ponyai opened a new issue #9325: [arrow c++]When I read parquet file and get a arrow::Table, how can I convert it to std::vector?

chenchi-ponyai opened a new issue #9325:
URL: https://github.com/apache/arrow/issues/9325


   I have looked at https://github.com/apache/arrow/blob/master/cpp/examples/arrow/row-wise-conversion-example.cc
   but that example is simple, and the columns I am working with are much more complex.
   For example, there are lists of structs and structs nested several levels deep, and I don't know how to convert them to basic C++ data types such as std::vector. Are there any examples that read a Parquet file this way?
   Thank you~


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield closed issue #9325: [arrow c++]When I read parquet file and get a arrow::Table, how can I convert it to std::vector?

Posted by GitBox <gi...@apache.org>.
emkornfield closed issue #9325:
URL: https://github.com/apache/arrow/issues/9325


   



[GitHub] [arrow] joeyac commented on issue #9325: [arrow c++]When I read parquet file and get a arrow::Table, how can I convert it to std::vector?

Posted by GitBox <gi...@apache.org>.
joeyac commented on issue #9325:
URL: https://github.com/apache/arrow/issues/9325#issuecomment-768241897


   You can convert struct to c++ basic data type by StructArray like this line: https://github.com/apache/arrow/blob/master/cpp/examples/arrow/row-wise-conversion-example.cc#L145



[GitHub] [arrow] westonpace commented on issue #9325: [arrow c++]When I read parquet file and get a arrow::Table, how can I convert it to std::vector?

Posted by GitBox <gi...@apache.org>.
westonpace commented on issue #9325:
URL: https://github.com/apache/arrow/issues/9325#issuecomment-769971724


   @joeyac is on the right track.  A `ListArray` is an array that has an extra list of offsets.  So if you have a `ListArray` of struct then you can think of it conceptually like...
   ```
   [{a: 1}, {a: 2}, {a: 3}, {a: 4}, {a: 5}, {a: 6}] with offsets [0, 2, 4, 6]
   ```
   This represents the two dimensional list...
   ```
   [
     [{a: 1}, {a: 2}],
     [{a: 3}, {a: 4}],
     [{a: 5}, {a: 6}]
   ]
   ```
   The offsets tell you the start and stop of each inner list (e.g. the first list starts at index 0 and stops before index 2).  The function `value_slice` mentioned by @joeyac will give you the nth list as an Array (the type of that array will match the value type of the list array, so here it will be an array of struct).
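
   The offset arithmetic above can be sketched without Arrow at all. This is only a model of the layout, using plain `std::vector`s; `Item` and `SplitByOffsets` are hypothetical names made up for illustration, not Arrow APIs. On a real `ListArray`, `value_offset(i)` and `value_length(i)` should give the same boundaries.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row type standing in for one struct element, {a: <int64>}.
struct Item {
  int64_t a;
};

// Rebuilds the nested lists from a flat values array plus list offsets.
// `offsets` has one more entry than there are lists: list i covers the
// half-open range [offsets[i], offsets[i + 1]) of `values`.
std::vector<std::vector<Item>> SplitByOffsets(const std::vector<Item>& values,
                                              const std::vector<int32_t>& offsets) {
  std::vector<std::vector<Item>> lists;
  if (offsets.empty()) return lists;
  lists.reserve(offsets.size() - 1);
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    // Copy the slice of the flat array that belongs to list i.
    lists.emplace_back(values.begin() + offsets[i], values.begin() + offsets[i + 1]);
  }
  return lists;
}
```

   With the six values and offsets `[0, 2, 4, 6]` from the example, this yields three inner lists of two elements each.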



[GitHub] [arrow] xuanqing94 commented on issue #9325: [arrow c++]When I read parquet file and get a arrow::Table, how can I convert it to std::vector?

Posted by GitBox <gi...@apache.org>.
xuanqing94 commented on issue #9325:
URL: https://github.com/apache/arrow/issues/9325#issuecomment-791152745


   @joeyac Sorry to bring this up again, but I am confused by this line of code:
   `auto column_struct = std::static_pointer_cast<arrow::StructArray>(table->GetColumnByName("column")->chunk(0));`
   Since `GetColumnByName` returns a `shared_ptr` to a `ChunkedArray`, would accessing only `chunk(0)` return incomplete data,
   for example if the data file is huge and the first chunk holds only part of it? (I couldn't work this out from the documentation.)
   Could you say a bit more about this? Thanks a lot!



[GitHub] [arrow] emkornfield commented on issue #9325: [arrow c++]When I read parquet file and get a arrow::Table, how can I convert it to std::vector?

Posted by GitBox <gi...@apache.org>.
emkornfield commented on issue #9325:
URL: https://github.com/apache/arrow/issues/9325#issuecomment-770164349


   Thanks for the discussion here.  In general we prefer to handle these types of questions on the user@ mailing list.



[GitHub] [arrow] joeyac edited a comment on issue #9325: [arrow c++]When I read parquet file and get a arrow::Table, how can I convert it to std::vector?

Posted by GitBox <gi...@apache.org>.
joeyac edited a comment on issue #9325:
URL: https://github.com/apache/arrow/issues/9325#issuecomment-768899759


   I haven't tested a schema like this.
   But the data in an Arrow buffer is flat, and a `ListArray` is two-dimensional. First try `ListArray::value_slice(idx)` to get the data for a single row. Note that the result has type `std::shared_ptr<Array>`, which you can then cast to a `StructArray`.



[GitHub] [arrow] chenchi-ponyai commented on issue #9325: [arrow c++]When I read parquet file and get a arrow::Table, how can I convert it to std::vector?

Posted by GitBox <gi...@apache.org>.
chenchi-ponyai commented on issue #9325:
URL: https://github.com/apache/arrow/issues/9325#issuecomment-768883924


   Thank you very much!
   But my question is what to do once I have an object of type arrow::ListArray, like this:
   `auto elements = std::static_pointer_cast<arrow::ListArray>(table->column(0)->chunk(0));`
   Every element of `elements` is a struct, like the `Data` you mentioned. I need to parse row by row, where each row is a list of structs, and I don't know how to get each list using the offsets.



[GitHub] [arrow] westonpace commented on issue #9325: [arrow c++]When I read parquet file and get a arrow::Table, how can I convert it to std::vector?

Posted by GitBox <gi...@apache.org>.
westonpace commented on issue #9325:
URL: https://github.com/apache/arrow/issues/9325#issuecomment-792201837


   @xuanqing94 Yes, that example is only a basic example to get started.  In actual code you would want to process every chunk.  You can never know when a table has multiple chunks.  It can even happen for rather small tables depending on how they were generated.
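
   As a rough sketch of what "process every chunk" means, here is a model that uses plain `std::vector`s instead of Arrow types; `ChunkedColumn` and `FlattenChunks` are hypothetical names for illustration only. With a real `ChunkedArray` the same loop would run from 0 to `num_chunks()` and call `chunk(i)` on each iteration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a chunked int64 column: each inner vector is one chunk.
using ChunkedColumn = std::vector<std::vector<int64_t>>;

// Visits every chunk in order and appends its values to a single output vector.
// Reading only column[0] would silently drop everything in the later chunks.
std::vector<int64_t> FlattenChunks(const ChunkedColumn& column) {
  std::size_t total = 0;
  for (const auto& chunk : column) total += chunk.size();

  std::vector<int64_t> out;
  out.reserve(total);
  for (const auto& chunk : column) {
    out.insert(out.end(), chunk.begin(), chunk.end());
  }
  return out;
}
```

   Empty chunks are handled naturally: they simply contribute no values.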



[GitHub] [arrow] chenchi-ponyai commented on issue #9325: [arrow c++]When I read parquet file and get a arrow::Table, how can I convert it to std::vector?

Posted by GitBox <gi...@apache.org>.
chenchi-ponyai commented on issue #9325:
URL: https://github.com/apache/arrow/issues/9325#issuecomment-768785630


   > You can convert struct to c++ basic data type by StructArray like this line: https://github.com/apache/arrow/blob/master/cpp/examples/arrow/row-wise-conversion-example.cc#L145
   
   How can I parse an arrow::ListArray whose element type is struct? Is there an example? I'm new to C++.



[GitHub] [arrow] joeyac commented on issue #9325: [arrow c++]When I read parquet file and get a arrow::Table, how can I convert it to std::vector?

Posted by GitBox <gi...@apache.org>.
joeyac commented on issue #9325:
URL: https://github.com/apache/arrow/issues/9325#issuecomment-768870886


   For a column named `column` whose data is a struct like this:
   ```cpp
   struct Data {
     int64_t x;
     std::vector<int64_t> y;
   };
   ```
   you can fetch the data like this:
   ```cpp
   // From a table (this reads only the first chunk):
   auto column_struct = std::static_pointer_cast<arrow::StructArray>(table->GetColumnByName("column")->chunk(0));
   // Or, from a record batch:
   // auto column_struct = std::static_pointer_cast<arrow::StructArray>(recordBatch->GetColumnByName("column"));
   // Then iterate over the struct's fields and dispatch on each field's type name.
   for (int idx = 0; idx < column_struct->num_fields(); ++idx) {
     auto type_name = column_struct->field(idx)->type()->name();
     if (type_name == "list") {
       auto list_ = std::static_pointer_cast<arrow::ListArray>(column_struct->field(idx));
       // then handle it like the ListArray code in the example
     } else if (type_name == "int64") {
       ...
     }
   }
   ```
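
   To make the final step concrete, here is a sketch of turning the columnar pieces into a row-wise `std::vector<Data>`. It deliberately avoids Arrow types: `DataColumns` and `ToRows` are hypothetical, and the three vectors stand in for what one would read out of the struct's `x` child array and the `y` list child (its flat values and its offsets).

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Data {
  int64_t x;
  std::vector<int64_t> y;
};

// Hypothetical columnar stand-in for the struct column: one entry of `x` per row,
// plus the flattened `y` values and the offsets that delimit each row's list.
struct DataColumns {
  std::vector<int64_t> x;
  std::vector<int64_t> y_values;
  std::vector<int32_t> y_offsets;  // must hold x.size() + 1 entries
};

// Converts the struct-of-columns layout into one Data object per row.
std::vector<Data> ToRows(const DataColumns& cols) {
  std::vector<Data> rows;
  rows.reserve(cols.x.size());
  for (std::size_t i = 0; i < cols.x.size(); ++i) {
    Data row;
    row.x = cols.x[i];
    // Row i owns y_values[y_offsets[i] .. y_offsets[i + 1]).
    row.y.assign(cols.y_values.begin() + cols.y_offsets[i],
                 cols.y_values.begin() + cols.y_offsets[i + 1]);
    rows.push_back(std::move(row));
  }
  return rows;
}
```

   The sketch assumes there are no null rows; real Arrow data would also need a validity check per row before reading its values.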
   




[GitHub] [arrow] joeyac commented on issue #9325: [arrow c++]When I read parquet file and get a arrow::Table, how can I convert it to std::vector?

Posted by GitBox <gi...@apache.org>.
joeyac commented on issue #9325:
URL: https://github.com/apache/arrow/issues/9325#issuecomment-768849731


   There is an Arrow type named arrow::StructArray; you can get its data by offset in the buffer, like the arrow::ListArray code in the example. If you want to use this open source project for your own custom needs, I suggest reading the source code and its unit tests.
   I spent a lot of time over the past week reading the C++ source code and learned a lot.

