Posted to issues@arrow.apache.org by "agranot (via GitHub)" <gi...@apache.org> on 2023/06/03 10:50:15 UTC

[GitHub] [arrow] agranot opened a new issue, #35897: reader_writer2.cc seems to produce bad parquet file.

agranot opened a new issue, #35897:
URL: https://github.com/apache/arrow/issues/35897

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   Has anyone encountered the following error message when building and running the "reader_writer2.cc" C++ example on Arrow 9?
   
   `Parquet read error: IOError: Corrupt snappy compressed data`
   
   Important Notes:
   1. I commented out `assert(row_group_reader->metadata()->total_byte_size() < ROW_GROUP_SIZE);` because the total bytes written appeared to exceed `ROW_GROUP_SIZE`. I still wanted to see whether the sample code could read the Parquet file it produced.
   
   2. I am forced to use g++ 5.4.0 and C++14. I built both Arrow 9 (make target parquet-all) and the example with this compiler, with no compiler or linker errors.
   Here is the cmake command I used before building Arrow 9:
   
   cmake -B=<path to cmake build>
    -DBOOST_ROOT=<path to a boost build>
    -DARROW_PARQUET="ON"
    -DARROW_WITH_SNAPPY="ON"
    -DARROW_BUILD_STATIC="OFF"
    -DARROW_WITH_RE2="OFF"
    -DARROW_WITH_UTF8PROC="OFF"
    -DCMAKE_BUILD_TYPE="Release"
    -DCMAKE_INSTALL_PREFIX:PATH=<install destination>
    -DCMAKE_CXX_STANDARD="14"
    -DCMAKE_CXX_COMPILER=<path to g++>
    -DCMAKE_C_COMPILER=<path to gcc>
   
   ### Component(s)
   
   C++, Parquet


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on issue #35897: reader_writer2.cc seems to produce bad parquet file.

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #35897:
URL: https://github.com/apache/arrow/issues/35897#issuecomment-1585140042

   I was able to run it with the latest code (I can't easily build 9.0.0 as I would need to recreate a development environment for it).
   
   Did you modify the example at all beyond removing that one assert?
   
   Arrow 9 is the last version you will be able to build with C++14 (we moved to C++17 in 10.0.0).




[GitHub] [arrow] agranot commented on issue #35897: reader_writer2.cc seems to produce bad parquet file.

Posted by "agranot (via GitHub)" <gi...@apache.org>.
agranot commented on issue #35897:
URL: https://github.com/apache/arrow/issues/35897#issuecomment-1585145681

   After debugging, I found that the problem was in the snappy library. Building with an older version of the snappy library resolved the issue.
   There was actually a comment about this in one of the CMake dependencies.

