Posted to dev@iceberg.apache.org by Ayush <ay...@gmail.com> on 2021/05/08 07:55:23 UTC

Queries regarding fetching of old iceberg schema and how to use iceberg-arrow apis

Hi,

I was trying to use Apache Iceberg to evolve my table schema and
write the new data to a Parquet file accordingly. When I then rolled
back to a previous snapshot, I got the old data from the Parquet files listed
in that snapshot, but I was not able to retrieve the old schema. I was able to
read the schema from the Parquet file and convert it to an Iceberg schema. Is
that the only way to fetch the old schema?

Also, are there any test cases or examples showing how to use the iceberg-arrow
APIs to read data from a Parquet file into Arrow in batches?



Please do let me know regarding these queries.



Thanks,

Ayush Bhardwaj




RE: Queries regarding fetching of old iceberg schema and how to use iceberg-arrow apis

Posted by Mayur Srivastava <Ma...@twosigma.com>.
Hi Ayush,

The iceberg-arrow changes that Ryan mentioned (https://github.com/apache/iceberg/pull/2286) were merged recently, but they are not feature complete and require a bit more work. Maybe you could contribute to make them better! Hope this helps.

Thanks,
Mayur

From: Ryan Blue <bl...@apache.org>
Sent: Sunday, May 9, 2021 5:54 PM
To: dev@iceberg.apache.org
Subject: Re: Queries regarding fetching of old iceberg schema and how to use iceberg-arrow apis

Ayush, we're currently adding the ability to track the schema that was current when a snapshot was written. Until then, you'll get the table's current schema instead.

I think there's also a PR that adds some ability to use iceberg-arrow like Iceberg generics. I've not had a chance to review it yet, but that should help you see what you'd need to do.



--
Ryan Blue

Re: Queries regarding fetching of old iceberg schema and how to use iceberg-arrow apis

Posted by Ryan Blue <bl...@apache.org>.
Ayush, we're currently adding the ability to track the schema that was
current when a snapshot was written. Until then, you'll get the table's
current schema instead.
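
To make the idea concrete, here is a minimal sketch of what tracking a schema per snapshot could look like. It is a toy model in plain Python, not Iceberg's implementation or API: the names Schema, Snapshot, TableMetadata, add_schema, commit and schema_for are all hypothetical and exist only for illustration. A snapshot written without a recorded schema falls back to the table's current schema, which matches the behavior described above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Schema:
    schema_id: int
    columns: tuple  # column names, in order


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    # None models a snapshot written before schema tracking existed.
    schema_id: Optional[int] = None


@dataclass
class TableMetadata:
    """Hypothetical table metadata that keeps every schema version."""
    schemas: Dict[int, Schema] = field(default_factory=dict)
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    current_schema_id: int = 0

    def add_schema(self, columns) -> Schema:
        # Evolving the schema registers a new version and makes it current.
        schema = Schema(len(self.schemas), tuple(columns))
        self.schemas[schema.schema_id] = schema
        self.current_schema_id = schema.schema_id
        return schema

    def commit(self, snapshot_id: int, track_schema: bool = True) -> Snapshot:
        # Record which schema was current when the snapshot was written.
        schema_id = self.current_schema_id if track_schema else None
        snapshot = Snapshot(snapshot_id, schema_id)
        self.snapshots[snapshot_id] = snapshot
        return snapshot

    def schema_for(self, snapshot_id: int) -> Schema:
        snapshot = self.snapshots[snapshot_id]
        if snapshot.schema_id is None:
            # No schema recorded: fall back to the table's current schema.
            return self.schemas[self.current_schema_id]
        return self.schemas[snapshot.schema_id]


meta = TableMetadata()
meta.add_schema(["id", "name"])
meta.commit(1)                       # written with the first schema
meta.commit(2, track_schema=False)   # written without schema tracking
meta.add_schema(["id", "name", "email"])
meta.commit(3)

old_columns = meta.schema_for(1).columns
untracked_columns = meta.schema_for(2).columns
new_columns = meta.schema_for(3).columns
```

In this sketch, snapshot 1 resolves to the two-column schema it was written with, while the untracked snapshot 2 resolves to whatever schema is current at lookup time.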

I think there's also a PR that adds some ability to use iceberg-arrow like
Iceberg generics. I've not had a chance to review it yet, but that should
help you see what you'd need to do.
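
As a rough illustration of what reading "in batches" means, the sketch below groups rows into fixed-size batches and pivots each batch into columns, which is the general shape of a vectorized read. It uses only the Python standard library and does not use or mirror the iceberg-arrow API; read_in_batches and the sample rows are hypothetical.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def read_in_batches(rows: Iterable[dict], batch_size: int) -> Iterator[Dict[str, List]]:
    """Yield column-oriented batches holding at most batch_size rows each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        # Pivot the row dicts into one list per column name.
        # Assumes every row in the chunk has the same keys as the first row.
        yield {name: [row[name] for row in chunk] for name in chunk[0]}


rows = [{"id": i, "name": f"user{i}"} for i in range(5)]
batches = list(read_in_batches(rows, batch_size=2))
```

Five rows with a batch size of 2 give three batches, the last one holding the single remaining row.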

On Sat, May 8, 2021 at 2:38 PM Ayush <ay...@gmail.com> wrote:

> Hi,
>
> I was trying to use apache iceberg to evolve my table schema and
> correspondingly add the new data to a parquet file. Now when I tried to
> revert back to a previous snapshot, I got the old data from the listed
> parquet files in that snapshot, but I was not able to retrieve the old
> schema. I was able to fetch the schema from the parquet file and convert it
> to iceberg schema. Is that the only way to fetch the old schema?
>
> Also are there any test cases or examples on how to use iceberg-arrow apis
> to read data from parquet file to arrow in batches?
>
>
>
> Please do let me know regarding these queries.
>
>
>
> Thanks,
>
> Ayush Bhardwaj
>
>
>


-- 
Ryan Blue