Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/18 10:38:43 UTC

[GitHub] [arrow-datafusion] Igosuki opened a new issue #903: Avro table provider

Igosuki opened a new issue #903:
URL: https://github.com/apache/arrow-datafusion/issues/903


   On a platform I work on, I decided to write Avro log files so that I could easily close and append binary files to S3. I didn't want to bother converting them to another format with Spark, which is the very thing I wanted to drop in the first place, so I started writing what's required to read Avro as a data source in DataFusion.
   
   Here is the branch on my fork (I merged the nested field PR into it, but that can be removed):
   https://github.com/Igosuki/arrow-datafusion/tree/avro2_m
   
   I converted all the Parquet test files to Avro and plan to add a test case for each of them.
   
   My question is: is Avro support desirable for DataFusion, or should I just make a sidecar crate on my own?
   
   **Describe alternatives you've considered**
   Converting the data to JSON or Parquet to reuse the existing code.
   
   **Additional context**
   I'm not yet familiar with the new Arrow data types, and it has been a challenge to work out what to do with Avro union types that only express a nullable field (a union of null and one other type). In the end I decided to map them to nullable fields and drop the union, but I had to add special cases here and there because of that.
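
To illustrate the union handling described above, here is a minimal sketch in plain Rust. It is not the code from the branch: `AvroSchema`, `Field`, and `to_field` are hypothetical stand-ins invented for this example, and they do not use the real Avro or Arrow crates. The sketch assumes that a two-branch union containing `null` collapses into a nullable field of the other branch, and that every other union is left unsupported.

```rust
/// A deliberately tiny model of an Avro schema (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum AvroSchema {
    Null,
    Long,
    String,
    Union(Vec<AvroSchema>),
}

/// A simplified stand-in for an Arrow field: a name, a type name, and a nullability flag.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
}

/// Maps a schema from the toy model to a simplified field.
/// Returns `None` for shapes this sketch does not handle.
fn to_field(name: &str, schema: &AvroSchema) -> Option<Field> {
    match schema {
        // A two-branch union that contains null: drop the union and
        // mark the remaining branch as nullable.
        AvroSchema::Union(branches)
            if branches.len() == 2 && branches.contains(&AvroSchema::Null) =>
        {
            let inner = branches.iter().find(|b| **b != AvroSchema::Null)?;
            let mut field = to_field(name, inner)?;
            field.nullable = true;
            Some(field)
        }
        AvroSchema::Long => Some(Field {
            name: name.to_string(),
            data_type: "Int64".to_string(),
            nullable: false,
        }),
        AvroSchema::String => Some(Field {
            name: name.to_string(),
            data_type: "Utf8".to_string(),
            nullable: false,
        }),
        // A bare null and general unions are out of scope for this sketch.
        AvroSchema::Null | AvroSchema::Union(_) => None,
    }
}
```

With this model, calling `to_field` on a union of `Null` and `Long` yields an `Int64` field with `nullable` set to true, while a union of `Long` and `String` yields `None`. A real reader would also have to decide how to represent such general unions, which is presumably where the special cases mentioned above come from.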
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on issue #903: Avro table provider

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #903:
URL: https://github.com/apache/arrow-datafusion/issues/903#issuecomment-901092796


   I think Avro files as a source would be great to have in DataFusion 👍





[GitHub] [arrow-datafusion] alamb closed issue #903: Avro table provider

Posted by GitBox <gi...@apache.org>.
alamb closed issue #903:
URL: https://github.com/apache/arrow-datafusion/issues/903


   





[GitHub] [arrow-datafusion] houqp commented on issue #903: Avro table provider

Posted by GitBox <gi...@apache.org>.
houqp commented on issue #903:
URL: https://github.com/apache/arrow-datafusion/issues/903#issuecomment-901651789


   :+1: from me as long as you can commit to keep maintaining the code :)





[GitHub] [arrow-datafusion] Igosuki commented on issue #903: Avro table provider

Posted by GitBox <gi...@apache.org>.
Igosuki commented on issue #903:
URL: https://github.com/apache/arrow-datafusion/issues/903#issuecomment-901066904


   Just tested my code on real Avro files I own and got 200 MB processed in 0.3 s (over a window function). DataFusion is the real deal!





[GitHub] [arrow-datafusion] Igosuki commented on issue #903: Avro table provider

Posted by GitBox <gi...@apache.org>.
Igosuki commented on issue #903:
URL: https://github.com/apache/arrow-datafusion/issues/903#issuecomment-902842995


   https://github.com/apache/arrow-datafusion/pull/910

