Posted to issues@arrow.apache.org by "ExpandingMan (via GitHub)" <gi...@apache.org> on 2023/05/26 19:56:32 UTC

[GitHub] [arrow] ExpandingMan opened a new issue, #35797: help debugging thrift issue?

ExpandingMan opened a new issue, #35797:
URL: https://github.com/apache/arrow/issues/35797

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   Hi, I maintain [Parquet2.jl](https://gitlab.com/ExpandingMan/Parquet2.jl) and [Thrift2.jl](https://gitlab.com/ExpandingMan/Thrift2.jl).
   
   I recently re-implemented the Thrift protocol in Julia (Thrift2.jl) because the older implementation, Thrift.jl, was extremely slow.  Output from Thrift2.jl is read correctly by Thrift.jl, Thrift2.jl and fastparquet (all of which have completely separate read implementations), but `pyarrow` fails with the following error:
   ```
   ERROR: Python: OSError: Could not open Parquet input source '<Buffer>': Couldn't deserialize thrift: TProtocolException: Invalid data
   
   Python stacktrace:
    [1] pyarrow.lib.check_status
      @ pyarrow/error.pxi:115
    [2] pyarrow.lib.pyarrow_internal_check_status
      @ pyarrow/error.pxi:144
    [3] pyarrow._dataset.Fragment.physical_schema.__get__
      @ pyarrow/_dataset.pyx:1345
   ```
   
   Unfortunately this error is *extremely* opaque, so it's very hard for me to figure out what's going on.  I was wondering if anyone could offer any suggestions on how to debug it.  Thanks.
   
   ### Component(s)
   
   Parquet


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ExpandingMan commented on issue #35797: help debugging thrift issue?

Posted by "ExpandingMan (via GitHub)" <gi...@apache.org>.
ExpandingMan commented on issue #35797:
URL: https://github.com/apache/arrow/issues/35797#issuecomment-1565085895

   Ok, so I have found that the best way to debug this issue was to work with the C++ implementation of thrift directly.
    
   It turns out that I was varint and zigzag encoding `Int8` values, even though the Thrift spec explicitly states that they are special-cased.  Of course I now find myself wishing that this special case was mentioned in the spec in gigantic red letters, but I can't exactly blame the spec authors for that, can I?
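
For anyone hitting the same problem, here is a minimal Python sketch of the difference, assuming the Thrift compact protocol (the one Parquet metadata uses): as far as I understand the spec, `i16`/`i32`/`i64` values are zigzag-mapped and then varint-encoded, while an `i8` is written as one raw byte. The function names are made up for illustration and are not taken from Thrift2.jl or Apache Thrift.

```python
import struct


def zigzag(n: int, bits: int) -> int:
    """Map a signed integer to an unsigned one so small magnitudes stay small."""
    return ((n << 1) ^ (n >> (bits - 1))) & ((1 << bits) - 1)


def varint(u: int) -> bytes:
    """Encode an unsigned integer 7 bits at a time, least significant group first."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_i32(n: int) -> bytes:
    """i32 in the compact protocol: zigzag, then varint."""
    return varint(zigzag(n, 32))


def encode_i8(n: int) -> bytes:
    """i8 in the compact protocol: a single raw signed byte, no zigzag, no varint."""
    return struct.pack("b", n)


if __name__ == "__main__":
    for value in (1, -1, 64):
        wrong = varint(zigzag(value, 8))  # treating an i8 like the wider integers
        right = encode_i8(value)
        print(f"{value:>3}: wrong={wrong.hex()} right={right.hex()}")
```

Even a value as small as 1 comes out as a different byte when it is zigzag encoded, and a value like 64 turns into two bytes instead of one, which shifts everything after it. That would be enough for a strict reader to reject the metadata.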
   
   Anyway, my only comment regarding `pyarrow` is that it would be nice if metadata failures were not so incredibly opaque.


[GitHub] [arrow] ExpandingMan closed issue #35797: help debugging thrift issue?

Posted by "ExpandingMan (via GitHub)" <gi...@apache.org>.
ExpandingMan closed issue #35797: help debugging thrift issue?
URL: https://github.com/apache/arrow/issues/35797

