Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/06 08:28:26 UTC

[GitHub] [arrow-rs] tustvold commented on issue #3017: Converted type is None according to Parquet Tools then utilizing logical types

tustvold commented on issue #3017:
URL: https://github.com/apache/arrow-rs/issues/3017#issuecomment-1304746170

   I've narrowed this down to pyarrow not being able to read the converted type correctly.
   
   ```
   >>> import pyarrow.parquet as pq
   >>> pq.ParquetFile('tmp.par').schema.column(0).converted_type
   'NONE'
   ```
   
   However, fastparquet is able to read the converted type, which shows that it is correctly encoded in the thrift definition:
   
   ```
   >>> from fastparquet import ParquetFile
   >>> ParquetFile('tmp.par').schema.schema_element('col1').converted_type
   10
   ```
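
For reference, the integer `10` is the thrift value of `TIMESTAMP_MICROS` in the Parquet `ConvertedType` enum. The sketch below makes that mapping explicit. The enum values are transcribed from `parquet.thrift`; the helper `converted_type_name` is a hypothetical name used only for illustration and is not part of fastparquet or pyarrow, and returning `'NONE'` for an unset field is an assumption that mirrors the pyarrow output above.

```python
from enum import IntEnum


class ConvertedType(IntEnum):
    """ConvertedType values as transcribed from parquet.thrift."""
    UTF8 = 0
    MAP = 1
    MAP_KEY_VALUE = 2
    LIST = 3
    ENUM = 4
    DECIMAL = 5
    DATE = 6
    TIME_MILLIS = 7
    TIME_MICROS = 8
    TIMESTAMP_MILLIS = 9
    TIMESTAMP_MICROS = 10
    UINT_8 = 11
    UINT_16 = 12
    UINT_32 = 13
    UINT_64 = 14
    INT_8 = 15
    INT_16 = 16
    INT_32 = 17
    INT_64 = 18
    JSON = 19
    BSON = 20
    INTERVAL = 21


def converted_type_name(value):
    """Hypothetical helper: map a raw thrift integer to its enum name.

    An unset field (None) is reported as 'NONE' (assumption, see lead-in).
    """
    if value is None:
        return "NONE"
    return ConvertedType(value).name


# The raw value reported by fastparquet above.
print(converted_type_name(10))
```

Under these assumptions, the value fastparquet reports agrees with the `convertedtype=TIMESTAMP_MICROS` shown by parquet-go's parquet-tools.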
   
   I also tried https://github.com/xitongsys/parquet-go/tree/master/tool/parquet-tools, which resulted in
   
   ```
   ./parquet-tools -cmd schema -file /home/raphael/repos/external/arrow-rs/parquet/tmp.par 
   {
     "Tag": "name=Schema, repetitiontype=REQUIRED",
     "Fields": [
       {
         "Tag": "name=Col1, type=INT64, convertedtype=TIMESTAMP_MICROS, repetitiontype=REQUIRED"
       }
     ]
   }
   ```
   
   This leads me to think that this is actually a bug in pyarrow, and therefore in the C++ Arrow implementation. Perhaps you might like to raise a bug there?
   
   

