Posted to github@arrow.apache.org by "siumingdev (via GitHub)" <gi...@apache.org> on 2023/04/27 12:51:45 UTC

[GitHub] [arrow-datafusion-python] siumingdev opened a new issue, #349: Support reading Avro files in zstd codec

siumingdev opened a new issue, #349:
URL: https://github.com/apache/arrow-datafusion-python/issues/349

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   I would like to read Avro files compressed with the zstd codec, like this:
   ```
   from datafusion import SessionContext
   
   ctx = SessionContext()
   ctx.read_avro("/path/to/my/avro/in/zstd/codec")
   ```
   But currently it gives the following error:
   ```
   ---------------------------------------------------------------------------
   Exception                                 Traceback (most recent call last)
   Cell In[4], line 4
         1 from datafusion import SessionContext
         3 ctx = SessionContext()
   ----> 4 ctx.read_avro("/path/to/my/avro/in/zstd/codec")
   
   Exception: DataFusion error: AvroError(CodecNotSupported("zstandard"))
   ```
   
   **Describe the solution you'd like**
   I'm not sure. Is this codec even supported in the original Rust implementation?
   
   **Describe alternatives you've considered**
   Read the file using another Avro library and convert the result into a DataFusion DataFrame.
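
A minimal sketch of that alternative, assuming `fastavro` and `pyarrow` are installed and that `fastavro` can decode the `zstandard` codec in your environment (it typically needs an extra compression package for that). The helper name `read_avro_via_fastavro` is hypothetical, and the use of `SessionContext.from_arrow_table` assumes a datafusion-python release that provides it. The whole file is loaded into memory and the Arrow schema is inferred from the decoded records, so this suits small to medium files only.

```python
def read_avro_via_fastavro(ctx, path):
    """Decode an Avro file with fastavro and register the rows with DataFusion.

    `ctx` is a datafusion.SessionContext; `path` points to a local Avro file.
    Hypothetical helper: it bypasses ctx.read_avro entirely.
    """
    # Imported lazily so the optional dependencies are only required on use.
    import fastavro
    import pyarrow as pa

    with open(path, "rb") as fo:
        # fastavro.reader yields one dict per record and handles the codec.
        records = list(fastavro.reader(fo))

    # Assumption: the schema inferred by pyarrow is acceptable for the data.
    table = pa.Table.from_pylist(records)
    return ctx.from_arrow_table(table)
```

Usage would look like `df = read_avro_via_fastavro(SessionContext(), "/path/to/my/avro/in/zstd/codec")`, after which `df` can be queried like any other DataFusion DataFrame.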
   
   **Additional context**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion-python] mesejo commented on issue #349: Support reading Avro files in zstd codec

Posted by "mesejo (via GitHub)" <gi...@apache.org>.
mesejo commented on issue #349:
URL: https://github.com/apache/arrow-datafusion-python/issues/349#issuecomment-1687722026

   As you mentioned, this is not supported by the Rust implementation (see the [reading options](https://docs.rs/datafusion/latest/datafusion/datasource/file_format/options/struct.AvroReadOptions.html)). It would be better to move the ticket to the appropriate [repo](https://github.com/apache/arrow-datafusion). What are your thoughts @alamb @andygrove?




[GitHub] [arrow-datafusion-python] alamb commented on issue #349: Support reading Avro files in zstd codec

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #349:
URL: https://github.com/apache/arrow-datafusion-python/issues/349#issuecomment-1688172443

   I agree -- this is a problem lower down in the stack (either in DataFusion or avro_rs)
   
   I took a brief look at avro_rs and didn't see any mention of zstd 🤔 https://docs.rs/avro-rs/latest/avro_rs/#using-codecs-to-compress-data
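
To check which codec a given file declares before chasing this through the lower layers, the Avro object container header can be read directly. The sketch below uses only the standard library and follows the container layout from the Avro specification: the magic bytes `Obj\x01`, then a metadata map whose optional `avro.codec` entry names the codec. The function name `avro_codec` and its helpers are hypothetical and are not part of DataFusion or avro_rs.

```python
import io
from typing import BinaryIO

AVRO_MAGIC = b"Obj\x01"


def _read_long(stream: BinaryIO) -> int:
    """Decode one Avro long (zigzag-encoded, variable-length integer)."""
    raw, shift = 0, 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise ValueError("unexpected end of Avro header")
        raw |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:  # high bit clear marks the last byte
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)  # undo zigzag encoding


def _read_bytes(stream: BinaryIO) -> bytes:
    """Read a length-prefixed Avro bytes/string value."""
    length = _read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of Avro header")
    return data


def avro_codec(stream: BinaryIO) -> str:
    """Return the codec named in an Avro container header ("null" if absent)."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    metadata = {}
    while True:
        count = _read_long(stream)
        if count == 0:  # a zero count terminates the map
            break
        if count < 0:  # negative count: a block byte size follows
            count = -count
            _read_long(stream)
        for _ in range(count):
            key = _read_bytes(stream).decode("utf-8")
            metadata[key] = _read_bytes(stream)
    return metadata.get("avro.codec", b"null").decode("utf-8")
```

For a file on disk, `with open(path, "rb") as f: print(avro_codec(f))` should print `zstandard` for files like the one in this report, matching the codec name in the `CodecNotSupported` error.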

