Posted to dev@iceberg.apache.org by "Agrawal, Sanket" <sa...@deloitte.com.INVALID> on 2022/09/14 06:57:58 UTC

Reading Incremental data using SQL

Hi,

We are developing an application where the data is kept in Iceberg format. We are looking for a way to read the incremental data using SQL. I know we can do this in Spark by providing the start and end snapshots, but can we do the same using SQL?
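
For reference, the non-SQL approach mentioned above can be sketched roughly as follows in PySpark. This is only a sketch: it assumes an existing SparkSession with the Iceberg runtime configured, the function name read_increment and the table name are hypothetical, and start-snapshot-id / end-snapshot-id are the Iceberg Spark read options for incremental scans (start exclusive, end inclusive). As far as I know, this kind of read only returns data from append snapshots.

```python
def read_increment(spark, table, start_snapshot_id, end_snapshot_id):
    """Return the rows appended after start_snapshot_id, up to and
    including end_snapshot_id.

    `spark` is assumed to be a SparkSession with Iceberg configured;
    `table` is a table identifier such as "db.events" (hypothetical).
    """
    return (
        spark.read.format("iceberg")
        # Start of the incremental scan (exclusive).
        .option("start-snapshot-id", str(start_snapshot_id))
        # End of the incremental scan (inclusive).
        .option("end-snapshot-id", str(end_snapshot_id))
        .load(table)
    )
```

Typical use would be something like read_increment(spark, "db.events", 123, 456).show(), where the two IDs come from the table's snapshot history.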

Thanks,
S. Agrawal

This message (including any attachments) contains confidential information intended for a specific individual and purpose, and is protected by law. If you are not the intended recipient, you should delete this message and any disclosure, copying, or distribution of this message, or the taking of any action based on it, by you is strictly prohibited.

Deloitte refers to a Deloitte member firm, one of its related entities, or Deloitte Touche Tohmatsu Limited ("DTTL"). Each Deloitte member firm is a separate legal entity and a member of DTTL. DTTL does not provide services to clients. Please see www.deloitte.com/about to learn more.

v.E.1

Re: Reading Incremental data using SQL

Posted by Igor Calabria <ig...@gmail.com>.
Hi, it's relatively easy to add this functionality yourself if you're able
to add Spark extensions to the cluster. I wrote about it here:
https://github.com/apache/iceberg/issues/5590
I have had no responses on the issue so far, but if there's enough interest
I could provide a PR adding the functionality proposed there.

Re: Reading Incremental data using SQL

Posted by Gabor Kaszab <ga...@cloudera.com.INVALID>.
Hi Agrawal,
I checked the page for Spark SQL
(https://iceberg.apache.org/docs/latest/spark-queries/), but apparently
there is no existing SQL syntax to get the increment between two particular
snapshots. One idea that comes to mind is to write a query that selects from
both snapshots and does an anti-join on some ID column. I'm not sure this
helps, but I'm throwing it out here as an idea.
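
A rough sketch of what such a query could look like, written as a small Python helper that builds the SQL string. Everything here is an assumption rather than a tested recipe: the helper name increment_sql, the table name, and the ID column are hypothetical, and the query relies on the VERSION AS OF time-travel syntax being available in the Spark and Iceberg versions in use. Note that an anti-join on an ID only finds rows whose ID is new in the later snapshot; updated or deleted rows would not show up.

```python
def increment_sql(table, id_column, start_snapshot_id, end_snapshot_id):
    """Build a Spark SQL query returning rows present in the end snapshot
    whose id_column value does not appear in the start snapshot."""
    # Coerce to int so only numeric snapshot IDs end up in the query text.
    start_id = int(start_snapshot_id)
    end_id = int(end_snapshot_id)
    return (
        f"SELECT new_rows.* "
        f"FROM (SELECT * FROM {table} VERSION AS OF {end_id}) AS new_rows "
        f"LEFT ANTI JOIN "
        f"(SELECT * FROM {table} VERSION AS OF {start_id}) AS old_rows "
        f"ON new_rows.{id_column} = old_rows.{id_column}"
    )
```

The resulting string could then be passed to spark.sql(...) or run from any SQL client connected to the Spark cluster.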

Gabor

