Posted to issues@arrow.apache.org by "tahsinreza (via GitHub)" <gi...@apache.org> on 2023/03/18 16:06:26 UTC

[GitHub] [arrow] tahsinreza opened a new issue, #34633: [C++][Parquet] Can't read converted type DECIMAL in Parquet Stream Reader C++ API

tahsinreza opened a new issue, #34633:
URL: https://github.com/apache/arrow/issues/34633

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   
   I'm trying to read the following field in a parquet file using the Stream Reader C++ API (I didn't create the parquet file):
   
   Schema output:
   ```text
   optional fixed_len_byte_array(16) field_id=-1 processstarted_seconds (Decimal(precision=38, scale=18));
   ```
   
   ```c++
   // code
   optional<std::array<char, 16>> processstarted_seconds;
   ... 
   stream_reader >> columns_obj.processstarted_seconds;
   // code
   ```
   
   When I run the above code, I get the following runtime error message:
   ```text
   what():  Column converted type mismatch.  Column 'processstarted_seconds' has converted type 'DECIMAL' not 'NONE'
   ```
   
   The error is originating here: https://github.com/apache/arrow/blob/main/cpp/src/parquet/stream_reader.cc#L158
   
   Note that I'm able to read the FIXED_LEN_BYTE_ARRAY type with parquet::ConvertedType::NONE. For example, the following works fine.
   Schema output:
   ```text
   optional fixed_len_byte_array(4) field_id=-1 country_char[4];
   ```
   
   I'm using Arrow-Parquet C++ v9.0.0.
   
   Thanks for your help.
   
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wgtmac commented on issue #34633: [C++][Parquet] Can't read converted type DECIMAL in Parquet Stream Reader C++ API

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on issue #34633:
URL: https://github.com/apache/arrow/issues/34633#issuecomment-1478876336

   take




[GitHub] [arrow] tahsinreza commented on issue #34633: [C++][Parquet] Can't read converted type DECIMAL in Parquet Stream Reader C++ API

Posted by "tahsinreza (via GitHub)" <gi...@apache.org>.
tahsinreza commented on issue #34633:
URL: https://github.com/apache/arrow/issues/34633#issuecomment-1478691831

   Thanks for looking into this. Can you please provide an example of the workaround? I'm looking for something similar to the example [here](https://github.com/apache/arrow/blob/main/cpp/examples/parquet/parquet_stream_api/stream_reader_writer.cc#L251).




[GitHub] [arrow] wgtmac commented on issue #34633: [C++][Parquet] Can't read converted type DECIMAL in Parquet Stream Reader C++ API

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on issue #34633:
URL: https://github.com/apache/arrow/issues/34633#issuecomment-1475858467

   I took an initial look and found that reading decimals from `FIXED_LEN_BYTE_ARRAY` is not yet implemented. To work around this, you may need to read it with `parquet::ConvertedType::NONE` and then use `arrow::Decimal128` to restore the value.
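
A minimal sketch of the second half of that workaround, assuming the 16 raw bytes have already been read. Parquet stores a DECIMAL in a FIXED_LEN_BYTE_ARRAY as the big-endian two's-complement unscaled integer, so the bytes can be split into the signed high and unsigned low 64-bit words of a 128-bit value. `Int128Parts` and `DecodeBigEndian16` are hypothetical names used only for illustration, not Arrow API; when linking against Arrow, `arrow::Decimal128::FromBigEndian` should be the library routine for this step.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper type: the two 64-bit halves of a 128-bit
// two's-complement integer (signed high word, unsigned low word).
struct Int128Parts {
  int64_t high;
  uint64_t low;
};

// Decode 16 big-endian bytes (most significant byte first) into halves.
Int128Parts DecodeBigEndian16(const std::array<char, 16>& bytes) {
  uint64_t high = 0;
  uint64_t low = 0;
  for (int i = 0; i < 8; ++i) {
    // Cast through uint8_t so a negative char is not sign-extended.
    high = (high << 8) | static_cast<uint8_t>(bytes[i]);
    low = (low << 8) | static_cast<uint8_t>(bytes[i + 8]);
  }
  // The sign of the whole value lives in the top bit of the high word.
  return {static_cast<int64_t>(high), low};
}
```

The result is still the unscaled integer; the column's scale (18 in the schema from the original post) has to be applied separately.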




[GitHub] [arrow] tahsinreza commented on issue #34633: [C++][Parquet] Can't read converted type DECIMAL in Parquet Stream Reader C++ API

Posted by "tahsinreza (via GitHub)" <gi...@apache.org>.
tahsinreza commented on issue #34633:
URL: https://github.com/apache/arrow/issues/34633#issuecomment-1595655286

   @wgtmac It took a while, but I finally had some time to try out the updated code. It seems to be working as expected. Thanks.




[GitHub] [arrow] westonpace closed issue #34633: [C++][Parquet] Can't read converted type DECIMAL in Parquet Stream Reader C++ API

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace closed issue #34633: [C++][Parquet] Can't read converted type DECIMAL in Parquet Stream Reader C++ API
URL: https://github.com/apache/arrow/issues/34633




[GitHub] [arrow] tahsinreza commented on issue #34633: [C++][Parquet] Can't read converted type DECIMAL in Parquet Stream Reader C++ API

Posted by "tahsinreza (via GitHub)" <gi...@apache.org>.
tahsinreza commented on issue #34633:
URL: https://github.com/apache/arrow/issues/34633#issuecomment-1478873958

   I already tried reading the DECIMAL value into a fixed-size char array (`char char_array[N]`). It throws the following error (please see the original post):
   
   optional<std::array<char, 16>> processstarted_seconds;
   ... 
   stream_reader >> processstarted_seconds;
   
   what():  Column converted type mismatch.  Column 'processstarted_seconds' has converted type 'DECIMAL' not 'NONE'




[GitHub] [arrow] tahsinreza commented on issue #34633: [C++][Parquet] Can't read converted type DECIMAL in Parquet Stream Reader C++ API

Posted by "tahsinreza (via GitHub)" <gi...@apache.org>.
tahsinreza commented on issue #34633:
URL: https://github.com/apache/arrow/issues/34633#issuecomment-1480271510

   Please find attached an example parquet file. The file has three columns/fields. Thanks.
   
   Column types:
   ```
   optional binary field_id=-1 pid_hash (String);
   optional double field_id=-1 t1;
   optional fixed_len_byte_array(16) field_id=-1 t2 (Decimal(precision=38, scale=0));
   ```
   
   [decimal_test.parquet.zip](https://github.com/apache/arrow/files/11044813/decimal_test.parquet.zip)
   




[GitHub] [arrow] wgtmac commented on issue #34633: [C++][Parquet] Can't read converted type DECIMAL in Parquet Stream Reader C++ API

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on issue #34633:
URL: https://github.com/apache/arrow/issues/34633#issuecomment-1491373548

   @tahsinreza This has been fixed and can be tested on the latest main branch.




[GitHub] [arrow] tahsinreza commented on issue #34633: [C++][Parquet] Can't read converted type DECIMAL in Parquet Stream Reader C++ API

Posted by "tahsinreza (via GitHub)" <gi...@apache.org>.
tahsinreza commented on issue #34633:
URL: https://github.com/apache/arrow/issues/34633#issuecomment-1492309621

   Thanks for letting me know. I'll test the updated code next week and get back to you.




[GitHub] [arrow] wgtmac commented on issue #34633: [C++][Parquet] Can't read converted type DECIMAL in Parquet Stream Reader C++ API

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on issue #34633:
URL: https://github.com/apache/arrow/issues/34633#issuecomment-1595792419

   Thanks for the confirmation!




[GitHub] [arrow] wgtmac commented on issue #34633: [C++][Parquet] Can't read converted type DECIMAL in Parquet Stream Reader C++ API

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on issue #34633:
URL: https://github.com/apache/arrow/issues/34633#issuecomment-1478819743

   You may use `char char_array[N]` to read decimal value into fixed_length_byte_array type first. Then follow the code here to convert it to `arrow::Decimal128`: https://github.com/apache/arrow/blob/main/cpp/src/parquet/arrow/reader_internal.cc#L579
   
   Hope it helps. Probably we should explicitly add an interface to read/write `arrow::Decimal128` directly.
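
For readers without Arrow at hand, here is a rough, library-free sketch of what the conversion amounts to: it takes the 16 raw bytes (big-endian two's complement, as Parquet stores DECIMAL in a fixed-length byte array) plus the column's scale and produces the decimal text. `FormatDecimalBytes` is a hypothetical name for illustration and is not part of Arrow or Parquet; it uses schoolbook long division on the byte array so that no 128-bit integer type is required.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <string>

// Hypothetical helper: render a 16-byte big-endian two's-complement
// unscaled integer as decimal text with `scale` fractional digits.
std::string FormatDecimalBytes(const std::array<char, 16>& raw, int scale) {
  std::array<uint8_t, 16> mag;
  for (int i = 0; i < 16; ++i) mag[i] = static_cast<uint8_t>(raw[i]);

  const bool negative = (mag[0] & 0x80) != 0;
  if (negative) {
    // Two's-complement negation: invert every byte, then add one.
    int carry = 1;
    for (int i = 15; i >= 0; --i) {
      int v = static_cast<uint8_t>(~mag[i]) + carry;
      mag[i] = static_cast<uint8_t>(v & 0xFF);
      carry = v >> 8;
    }
  }

  // Repeatedly divide the magnitude by 10; each remainder is one digit,
  // least significant first.
  std::string digits;
  bool nonzero = true;
  while (nonzero) {
    int rem = 0;
    nonzero = false;
    for (auto& b : mag) {
      int cur = rem * 256 + b;
      b = static_cast<uint8_t>(cur / 10);
      rem = cur % 10;
      if (b != 0) nonzero = true;
    }
    digits.push_back(static_cast<char>('0' + rem));
  }

  // Pad so at least one digit precedes the decimal point.
  while (static_cast<int>(digits.size()) <= scale) digits.push_back('0');
  std::reverse(digits.begin(), digits.end());
  if (scale > 0) digits.insert(digits.size() - scale, ".");
  return negative ? "-" + digits : digits;
}
```

For example, bytes encoding the integer 12345 with a scale of 2 would be rendered as `123.45`.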




[GitHub] [arrow] wgtmac commented on issue #34633: [C++][Parquet] Can't read converted type DECIMAL in Parquet Stream Reader C++ API

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on issue #34633:
URL: https://github.com/apache/arrow/issues/34633#issuecomment-1478876208

   OK, could you please provide a sample Parquet file? I can try to fix it later.

