Posted to user@arrow.apache.org by "John E. Conlon" <jc...@apache.org> on 2020/12/29 06:33:05 UTC

[Java AvroToArrow] Creating Arrow Files from Avro

I am building a data engineering pipeline that will transform binary Avro objects in S3 buckets into Arrow objects and Parquet objects, also stored in S3.

I see that the Arrow Java libraries don't support Parquet at this time, so I plan to first use the Arrow Java libraries for the Avro->Arrow transform and then use the Arrow Python library for the Arrow->Parquet transform.

On the Java side I plan to download my Avro objects to files, create the Arrow files from them, and then upload those.

I have seen AvroToArrow.avroToArrowIterator(schema, decoder, config) and the tests that use AvroToArrow, but even after reading the limited documentation I am not sure how to use it to read the Avro files and write an output Arrow file.

Can someone provide me with an example? 





Re: [Java AvroToArrow] Creating Arrow Files from Avro

Posted by Micah Kornfield <em...@gmail.com>.
Hi John,
The overview of the Java IPC API might help here [1].  I also wrote up some
notes on Avro->Arrow conversion for a different user question [2].
ARROW-9613 [3] is tracking the impedance mismatch I mentioned in that e-mail.

Hope this helps.

-Micah

[1]
https://arrow.apache.org/docs/java/ipc.html#writing-and-reading-random-access-files
[2]
https://lists.apache.org/thread.html/rfa51f801b752faa881d318cff7394ee5b43161c100a707810c6c92fd%40%3Cuser.arrow.apache.org%3E
[3] https://issues.apache.org/jira/browse/ARROW-9613
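
For readers looking for the example requested above, here is a minimal sketch of how
AvroToArrow.avroToArrowIterator(schema, decoder, config) might be combined with the
file writer described in [1]. It is not taken from the thread. The class name
AvroToArrowFile, the method convert, and the batch size of 1024 are hypothetical. The
sketch assumes the adapter classes live in the org.apache.arrow package (newer Arrow
releases may use a different package), and that the input holds raw binary-encoded
Avro records rather than an Avro object container file, since a plain BinaryDecoder
does not skip the container header and sync markers. Copying each batch into the
writer's root with VectorUnloader/VectorLoader is one possible approach, chosen here
because the iterator returns a new VectorSchemaRoot per batch.

```java
import java.io.FileOutputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Assumption: adapter package as of the Arrow release current for this thread.
import org.apache.arrow.AvroToArrow;
import org.apache.arrow.AvroToArrowConfig;
import org.apache.arrow.AvroToArrowConfigBuilder;
import org.apache.arrow.AvroToArrowVectorIterator;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorLoader;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.VectorUnloader;
import org.apache.arrow.vector.ipc.ArrowFileWriter;
import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
import org.apache.avro.Schema;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

/** Hypothetical helper: converts raw binary Avro records into an Arrow file. */
public class AvroToArrowFile {

  /** Returns the number of record batches written to arrowOut. */
  public static int convert(Schema schema, Path avroIn, Path arrowOut) throws Exception {
    int batches = 0;
    try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
         InputStream in = Files.newInputStream(avroIn)) {
      // Assumption: the input is raw binary-encoded records, not a container file.
      BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(in, null);
      AvroToArrowConfig config = new AvroToArrowConfigBuilder(allocator)
          .setTargetBatchSize(1024) // rows per record batch (arbitrary choice)
          .build();

      try (AvroToArrowVectorIterator batchesIn =
               AvroToArrow.avroToArrowIterator(schema, decoder, config)) {
        if (!batchesIn.hasNext()) {
          return 0; // no records, so no file is written in this sketch
        }
        // The first batch doubles as the root that the file writer is bound to.
        try (VectorSchemaRoot root = batchesIn.next();
             FileOutputStream out = new FileOutputStream(arrowOut.toFile());
             ArrowFileWriter writer =
                 new ArrowFileWriter(root, config.getProvider(), out.getChannel())) {
          writer.start();
          writer.writeBatch();
          batches++;

          // Move each later batch into the writer's root, then write it.
          VectorLoader loader = new VectorLoader(root);
          while (batchesIn.hasNext()) {
            try (VectorSchemaRoot next = batchesIn.next();
                 ArrowRecordBatch batch = new VectorUnloader(next).getRecordBatch()) {
              loader.load(batch);
              writer.writeBatch();
              batches++;
            }
          }
          writer.end();
        }
      }
    }
    return batches;
  }
}
```

A call such as AvroToArrowFile.convert(schema, Paths.get("in.bin"), Paths.get("out.arrow"))
would then produce a file in the Arrow random-access file format, which the Python
Arrow library can open for the Arrow->Parquet step.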
