Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/28 08:42:46 UTC

[GitHub] [arrow] zzzzwj opened a new issue, #14748: column_reader.HasNext() throws an exception "Access violation executing location"

zzzzwj opened a new issue, #14748:
URL: https://github.com/apache/arrow/issues/14748

   ### Describe the usage question you have. Please include as many useful details as  possible.
   
   
   I'm trying to use the C++ API to read a Parquet file generated by a C# application based on ParquetSharp. As the title says, I ran into a problem loading the UUID column. The schema, the column description, and the sample code are shown below, respectively.
   ![image](https://user-images.githubusercontent.com/23235538/204231588-d3174c9d-496b-438e-a4fd-a5b093836049.png)
   ![image](https://user-images.githubusercontent.com/23235538/204229089-96f3e1ff-f752-4784-9720-f398f7016ce0.png)
   ![image](https://user-images.githubusercontent.com/23235538/204230699-3870f353-503c-4938-96e4-e8837746cdc9.png)
   Is the method I called incorrect?
   
   P.S. I can successfully load the FixedLenByteArray column from a file generated by the C++ API with "logical_type=NONE".
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zzzzwj commented on issue #14748: [C++][Parquet] column_reader.HasNext() throws an exception "Access violation executing location"

Posted by GitBox <gi...@apache.org>.
zzzzwj commented on issue #14748:
URL: https://github.com/apache/arrow/issues/14748#issuecomment-1333206692

   Thanks for your kind reply. Here are my answers.
   1. Sure. You can download it from [Google Drive](https://drive.google.com/file/d/1JHK6R1gvk_dbzWavUY_LZDJsOg7HAZZd/view?usp=share_link) and [One Drive](https://microsoftapc-my.sharepoint.com/:u:/g/personal/wenjiezhang_microsoft_com/EVfTkXxbbFlFiTQHmN_d2CgBpiesE58CESuLcL01Bo-jHw?e=bBh5Fk).
   2. Sorry, I don't know what you mean. Do you mean the details of the access violation? All I can get is "Exception thrown at 0x00000167FD250550 in TestCppParquet.exe: 0xC0000005: Access violation executing location 0x00000167FD250550" in Visual Studio.
   ![image](https://user-images.githubusercontent.com/23235538/204969866-99e2185b-373c-4383-aaf5-79e6792054dd.png)
   3. No. In C#, I cannot read the file without the logical type.
   
   My development environment:
   1. Visual Studio 2022
   2. C++17
   3. [ParquetSharp](https://github.com/G-Research/ParquetSharp) v8.0.0
   4. arrow v10.0.0


[GitHub] [arrow] zzzzwj commented on issue #14748: [C++][Parquet] column_reader.HasNext() throws an exception "Access violation executing location"

Posted by GitBox <gi...@apache.org>.
zzzzwj commented on issue #14748:
URL: https://github.com/apache/arrow/issues/14748#issuecomment-1357008040

   That's alright. I'll check the comments as soon as I can. Thanks for your kind reply again.


[GitHub] [arrow] pitrou commented on issue #14748: [C++][Parquet] column_reader.HasNext() throws an exception "Access violation executing location"

Posted by GitBox <gi...@apache.org>.
pitrou commented on issue #14748:
URL: https://github.com/apache/arrow/issues/14748#issuecomment-1332019387

   @emkornfield 


[GitHub] [arrow] emkornfield commented on issue #14748: [C++][Parquet] column_reader.HasNext() throws an exception "Access violation executing location"

Posted by GitBox <gi...@apache.org>.
emkornfield commented on issue #14748:
URL: https://github.com/apache/arrow/issues/14748#issuecomment-1350476049

   Apologies for the late reply.
   
   > So, we need to update all the docs or comments that mention "ownership" of a returned shared_ptr, right?
   
   Yes, I believe so. I would need to double-check all of the methods.
   
   > One more question: how do I assign "UUID" as the logical type of a column? When I set up the schema following the [example code](https://github.com/apache/arrow/blob/75ae2cca38ea36c521c9b9c2dc30f8e12762d409/cpp/examples/parquet/low_level_api/reader_writer.h#L62), there is no UUID-like value in parquet::ConvertedType.
   
   You need to use the variant that takes a logical type and construct it with the [static factory](https://github.com/apache/arrow/blob/147b5c922efe19d34ef7e7cda635b7d8a07be2eb/cpp/src/parquet/schema.h#L215)
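   A minimal sketch of that approach, assuming the Arrow 10 C++ headers: the `PrimitiveNode::Make` overload that takes a `std::shared_ptr<const LogicalType>` is given `LogicalType::UUID()` in place of the `ConvertedType::NONE` argument from the question. `MakeUuidSchema` is a hypothetical helper, and `GUID_LEN` is defined here as 16 because the Parquet UUID logical type annotates a 16-byte FIXED_LEN_BYTE_ARRAY.

   ```cpp
   #include <memory>

   #include <parquet/schema.h>
   #include <parquet/types.h>

   using parquet::LogicalType;
   using parquet::Repetition;
   using parquet::Type;
   using parquet::schema::GroupNode;
   using parquet::schema::NodeVector;
   using parquet::schema::PrimitiveNode;

   // A UUID/GUID is stored as a 16-byte FIXED_LEN_BYTE_ARRAY.
   constexpr int GUID_LEN = 16;

   // Hypothetical helper: builds a one-column schema whose "ActivityId" column
   // carries the UUID logical type (ConvertedType has no UUID value).
   std::shared_ptr<GroupNode> MakeUuidSchema() {
     NodeVector fields;
     fields.push_back(PrimitiveNode::Make(
         "ActivityId", Repetition::REQUIRED,
         LogicalType::UUID(),          // logical type instead of ConvertedType
         Type::FIXED_LEN_BYTE_ARRAY,   // physical type comes after it here
         GUID_LEN));
     return std::static_pointer_cast<GroupNode>(
         GroupNode::Make("schema", Repetition::REQUIRED, fields));
   }
   ```

   Note that the logical type comes before the physical type in this overload, unlike the `ConvertedType` variant used in the example code.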


[GitHub] [arrow] emkornfield commented on issue #14748: [C++][Parquet] column_reader.HasNext() throws an exception "Access violation executing location"

Posted by GitBox <gi...@apache.org>.
emkornfield commented on issue #14748:
URL: https://github.com/apache/arrow/issues/14748#issuecomment-1333332838

   Looking into it, it appears the problem is likely this line:
   ```
   static_cast<parquet::FixedLenByteArrayReader*>(rg_reader->Column(10).get());
   ```
   The docs on [`Column()`](https://github.com/apache/arrow/blob/75ae2cca38ea36c521c9b9c2dc30f8e12762d409/cpp/src/parquet/file_reader.h#L56) appear to be incorrect: it looks like a fresh reader is [returned each time](https://github.com/apache/arrow/blame/75ae2cca38ea36c521c9b9c2dc30f8e12762d409/cpp/src/parquet/file_reader.cc#L75).
   
   If you assign the column reader to a variable first and then do the cast, does it fix your issue? If it does, would you like to contribute a PR updating the docs?
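   To illustrate why the one-liner fails, here is a standard-library-only sketch. `FakeRowGroupReader` and `FakeColumnReader` are hypothetical stand-ins, not Parquet classes; they only mimic the behavior described above, where `Column()` returns a brand-new `std::shared_ptr` on every call. Calling `.get()` on that temporary yields a raw pointer whose owner is destroyed at the end of the statement, so later calls such as `HasNext()` would go through a dangling pointer. Holding the `shared_ptr` in a named variable keeps the reader alive.

   ```cpp
   #include <memory>

   // Hypothetical stand-in for a column reader that counts live instances.
   struct FakeColumnReader {
     static inline int live = 0;  // C++17 inline static member
     FakeColumnReader() { ++live; }
     ~FakeColumnReader() { --live; }
     bool HasNext() const { return true; }
   };

   // Hypothetical stand-in for a row group reader whose Column() hands out
   // a fresh reader on each call instead of a cached one.
   struct FakeRowGroupReader {
     std::shared_ptr<FakeColumnReader> Column(int /*i*/) {
       return std::make_shared<FakeColumnReader>();
     }
   };

   // Buggy pattern: the temporary shared_ptr is destroyed at the end of the
   // full expression. Only the live count is inspected; raw is never used.
   int LiveAfterTemporaryGet(FakeRowGroupReader& rg) {
     FakeColumnReader* raw = rg.Column(10).get();
     (void)raw;  // dereferencing raw here would be undefined behavior
     return FakeColumnReader::live;
   }

   // Fixed pattern: keep the shared_ptr in a named variable, then use it.
   int LiveWhileHoldingSharedPtr(FakeRowGroupReader& rg) {
     std::shared_ptr<FakeColumnReader> col = rg.Column(10);
     FakeColumnReader* raw = col.get();  // valid while `col` is in scope
     return raw->HasNext() ? FakeColumnReader::live : -1;
   }
   ```

   With the buggy pattern the live count is already 0 right after the statement; with the fixed pattern it stays at 1 until `col` goes out of scope.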


[GitHub] [arrow] zzzzwj commented on issue #14748: [C++][Parquet] column_reader.HasNext() throws an exception "Access violation executing location"

Posted by GitBox <gi...@apache.org>.
zzzzwj commented on issue #14748:
URL: https://github.com/apache/arrow/issues/14748#issuecomment-1333291500

   Yeah, the exception type varies from run to run: sometimes it's an access violation, other times a ParquetStatusException. I'm also confused about that.
   
   ```cpp
   #include <cstdint>
   #include <iostream>
   #include <parquet/api/reader.h>

   int main() {
       auto reader = parquet::ParquetFileReader::OpenFile("D:\\sample.parquet", false);
       std::cout << reader->metadata()->schema()->ToString() << std::endl;
       auto rg_reader = reader->RowGroup(0);
       for (int i = 0; i < rg_reader->metadata()->num_columns(); i++) {
           std::cout << "Column " << i << std::endl << rg_reader->Column(i)->descr()->ToString() << std::endl << std::endl;
       }
       auto column_reader = static_cast<parquet::FixedLenByteArrayReader*>(rg_reader->Column(10).get());
       int64_t bytes_read = 0;
       int16_t def_levels, rep_levels;
       while (column_reader->HasNext()) {
           parquet::FixedLenByteArray fl_byte_array;
           column_reader->ReadBatch(1, &def_levels, &rep_levels, &fl_byte_array, &bytes_read);
           break;
       }
       return 0;
   }
   ```
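   A corrected sketch of the snippet above, applying the fix suggested in this thread (assign the column reader to a variable before casting). `ReadFirstFixedLenValue` is a hypothetical helper name, and it assumes the target column's physical type is FIXED_LEN_BYTE_ARRAY, like the UUID column at index 10. It keeps the `ParquetFileReader`, `RowGroupReader` and `ColumnReader` in named variables so each stays alive while the next is used, and it copies the bytes out because `FixedLenByteArray::ptr` may point into the reader's internal buffers.

   ```cpp
   #include <cstddef>
   #include <cstdint>
   #include <memory>
   #include <string>

   #include <parquet/api/reader.h>

   // Hypothetical helper: returns the raw bytes of the first value in a
   // FIXED_LEN_BYTE_ARRAY column (empty if there is no value or it is null).
   // Throws parquet::ParquetException if the file cannot be opened.
   std::string ReadFirstFixedLenValue(const std::string& path, int column_index) {
     // The file reader must outlive the row group and column readers.
     std::unique_ptr<parquet::ParquetFileReader> reader =
         parquet::ParquetFileReader::OpenFile(path, /*memory_map=*/false);
     if (reader->metadata()->num_row_groups() == 0) return {};
     std::shared_ptr<parquet::RowGroupReader> rg_reader = reader->RowGroup(0);

     // Column() returns a new reader on every call: hold the shared_ptr
     // first, then cast it (assumes the column is FIXED_LEN_BYTE_ARRAY).
     std::shared_ptr<parquet::ColumnReader> column = rg_reader->Column(column_index);
     auto flba_reader =
         std::static_pointer_cast<parquet::FixedLenByteArrayReader>(column);

     std::string value;
     if (flba_reader->HasNext()) {
       int16_t def_level = 0;
       int16_t rep_level = 0;
       parquet::FixedLenByteArray flba;
       int64_t values_read = 0;
       flba_reader->ReadBatch(1, &def_level, &rep_level, &flba, &values_read);
       if (values_read == 1) {
         // Copy the bytes while the reader (and its buffers) are still alive.
         value.assign(reinterpret_cast<const char*>(flba.ptr),
                      static_cast<std::size_t>(flba_reader->descr()->type_length()));
       }
     }
     return value;
   }
   ```

   For example, `ReadFirstFixedLenValue("D:\\sample.parquet", 10)` should return the 16 raw bytes of the first UUID, assuming that column is a 16-byte FIXED_LEN_BYTE_ARRAY.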


[GitHub] [arrow] zzzzwj closed issue #14748: [C++][Parquet] column_reader.HasNext() throws an exception "Access violation executing location"

Posted by GitBox <gi...@apache.org>.
zzzzwj closed issue #14748: [C++][Parquet] column_reader.HasNext() throws an exception "Access violation executing location"
URL: https://github.com/apache/arrow/issues/14748


[GitHub] [arrow] zzzzwj commented on issue #14748: [C++][Parquet] column_reader.HasNext() throws an exception "Access violation executing location"

Posted by GitBox <gi...@apache.org>.
zzzzwj commented on issue #14748:
URL: https://github.com/apache/arrow/issues/14748#issuecomment-1333405054

   One more question: how do I assign "UUID" as the logical type of a column? When I set up the schema following the [example code](https://github.com/apache/arrow/blob/75ae2cca38ea36c521c9b9c2dc30f8e12762d409/cpp/examples/parquet/low_level_api/reader_writer.h#L62), there is no UUID-like value in parquet::ConvertedType.
   ```cpp
   fields.push_back(
       PrimitiveNode::Make(
           "ActivityId",
           Repetition::REQUIRED,
           Type::FIXED_LEN_BYTE_ARRAY,
           ConvertedType::NONE,              // Is there any other way to assign "UUID" to the logical type?
           GUID_LEN
       )
   );
   ```


[GitHub] [arrow] emkornfield commented on issue #14748: [C++][Parquet] column_reader.HasNext() throws an exception "Access violation executing location"

Posted by GitBox <gi...@apache.org>.
emkornfield commented on issue #14748:
URL: https://github.com/apache/arrow/issues/14748#issuecomment-1333164082

   On the surface, the reading code looks correct to me (at least the part not blocked by the exception):
   1. Could you share the file that is generating this error?
   2. What are the details of the exception being thrown?
   3. You mention that a file written by C++ code without the logical type can be read; is the same true for the C# code?


[GitHub] [arrow] emkornfield commented on issue #14748: [C++][Parquet] column_reader.HasNext() throws an exception "Access violation executing location"

Posted by GitBox <gi...@apache.org>.
emkornfield commented on issue #14748:
URL: https://github.com/apache/arrow/issues/14748#issuecomment-1333336429

   In fact, I think we may want to review most of the ownership comments in the file. I think all of them may be trying to say that when a shared_ptr is returned from one of these objects, the object must outlive the shared_ptr, because they share referenced state.


[GitHub] [arrow] zzzzwj commented on issue #14748: [C++][Parquet] column_reader.HasNext() throws an exception "Access violation executing location"

Posted by GitBox <gi...@apache.org>.
zzzzwj commented on issue #14748:
URL: https://github.com/apache/arrow/issues/14748#issuecomment-1333384123

   Aha, it works! Thanks for your help. It'll be my pleasure to contribute a PR to this repo. So, we need to update all the docs or comments that mention "ownership" of a returned shared_ptr?


[GitHub] [arrow] emkornfield commented on issue #14748: [C++][Parquet] column_reader.HasNext() throws an exception "Access violation executing location"

Posted by GitBox <gi...@apache.org>.
emkornfield commented on issue #14748:
URL: https://github.com/apache/arrow/issues/14748#issuecomment-1333167323

   Also, what versions of the libraries are you using?


[GitHub] [arrow] zzzzwj commented on issue #14748: [C++][Parquet] column_reader.HasNext() throws an exception "Access violation executing location"

Posted by GitBox <gi...@apache.org>.
zzzzwj commented on issue #14748:
URL: https://github.com/apache/arrow/issues/14748#issuecomment-1333310892

   And these are my build parameters:
   ```bash
   cmake .. -G "Visual Studio 17 2022" -A x64 -DARROW_PARQUET=ON -DPARQUET_REQUIRE_ENCRYPTION=ON -DARROW_WITH_BZ2=ON -DARROW_WITH_LZ4=ON -DARROW_WITH_SNAPPY=ON -DARROW_WITH_ZLIB=ON -DARROW_WITH_ZSTD=ON -DCMAKE_BUILD_TYPE=Release
   cmake --build . --config Release
   ```
   Hope this helps.


[GitHub] [arrow] emkornfield commented on issue #14748: [C++][Parquet] column_reader.HasNext() throws an exception "Access violation executing location"

Posted by GitBox <gi...@apache.org>.
emkornfield commented on issue #14748:
URL: https://github.com/apache/arrow/issues/14748#issuecomment-1333273256

   > Sorry, I don't know what you mean. Do you mean the details of the access violation?
   
   The exceptions shown in the first screenshot are different from the one in the second screenshot. The first one shows a ParquetStatusException, which I would expect to include more details; the second one is an access violation.
   
   Could you upload the full code snippet? Parts of it are hidden, and while I assume it is being called correctly, I would like to verify that and also try it on my machine. Opening the sample file with a precompiled pyarrow (6.0.1) seems to work fine for me; we should also try later versions, since there might have been a regression.
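   A rough C++ counterpart of that pyarrow check, sketched under the assumption of the Arrow 10 API: it reads the whole file through the Arrow-level `parquet::arrow::FileReader` instead of the low-level column readers. `ReadWholeFile` is a hypothetical helper name.

   ```cpp
   #include <memory>
   #include <string>

   #include <arrow/io/file.h>
   #include <arrow/memory_pool.h>
   #include <arrow/result.h>
   #include <arrow/status.h>
   #include <arrow/table.h>
   #include <parquet/arrow/reader.h>

   // Hypothetical helper: reads every column of a Parquet file into an
   // arrow::Table using the Arrow-level reader.
   arrow::Result<std::shared_ptr<arrow::Table>> ReadWholeFile(const std::string& path) {
     // Open the file for random access; yields an error Status if it is missing.
     ARROW_ASSIGN_OR_RAISE(std::shared_ptr<arrow::io::ReadableFile> infile,
                           arrow::io::ReadableFile::Open(path));

     // Wrap the file in a parquet::arrow::FileReader.
     std::unique_ptr<parquet::arrow::FileReader> reader;
     ARROW_RETURN_NOT_OK(
         parquet::arrow::OpenFile(infile, arrow::default_memory_pool(), &reader));

     // Decode all row groups and columns into a single table.
     std::shared_ptr<arrow::Table> table;
     ARROW_RETURN_NOT_OK(reader->ReadTable(&table));
     return table;
   }
   ```

   If `ReadWholeFile("D:\\sample.parquet")` returns an OK result while the low-level snippet crashes, that would suggest the problem lies in the calling code rather than in the file.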

