Posted to dev@iceberg.apache.org by Ryan Blue <bl...@apache.org> on 2021/05/02 18:10:19 UTC

Re: Iceberg Transactions via spark

Vivek,

Currently, Spark doesn't support any of the BEGIN/COMMIT statements for
transactions, so I don't think that it is possible right now. What are you
trying to do? It may be that some of the newer commands, like MERGE INTO,
would work for you instead.

On Thu, Apr 29, 2021 at 5:49 PM vivek B <vi...@gmail.com> wrote:

> Hey All,
> Is there a way to run multiple SQL operations via Spark as a single
> transaction?
>
> Thanks,
> vivek
>


-- 
Ryan Blue

Re: Iceberg Transactions via spark

Posted by Ryan Blue <bl...@apache.org>.
Vivek,

You might want to try MERGE INTO again. You should be able to make it more
efficient by adding predicates to the ON clause. Those will get pushed down
to the target table to avoid a big scan.
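
As a rough illustration of the ON-clause suggestion above, here is a minimal Python sketch that only builds the statement text. The table and column names (`db.target`, `updates`, `id`, `event_date`, `value`, `op`) and the helper `build_merge_sql` are hypothetical, and the string would still have to be passed to `spark.sql(...)` in a session configured for Iceberg, which is not shown here.

```python
def build_merge_sql(target: str, source: str, key: str, target_predicate: str) -> str:
    """Build a MERGE INTO statement whose ON clause also filters the target table.

    All identifiers are placeholders chosen for illustration only.
    """
    return (
        f"MERGE INTO {target} t\n"
        f"USING {source} s\n"
        # The extra predicate on t is what can be pushed down to the target scan.
        f"ON t.{key} = s.{key} AND {target_predicate}\n"
        "WHEN MATCHED AND s.op = 'delete' THEN DELETE\n"
        "WHEN MATCHED THEN UPDATE SET t.value = s.value\n"
        "WHEN NOT MATCHED THEN INSERT (id, event_date, value) "
        "VALUES (s.id, s.event_date, s.value)"
    )


merge_sql = build_merge_sql(
    target="db.target",
    source="updates",
    key="id",
    target_predicate="t.event_date >= '2021-05-01'",
)
print(merge_sql)
```

Target rows that fail the extra predicate count as not matched, so the filter should only exclude rows that could never match the source. Otherwise a source row may be inserted when an update was intended.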

Iceberg supports transactions to do what you want, but it doesn't use table
locking. Instead, it re-applies the transaction's operations when it retries
the transaction commit, so it still uses optimistic concurrency for
coordination. If the commit fails, then nothing in your table would be
changed by operations in the transaction. The difficulty is that Spark
doesn't support transactions with SQL statements.
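
To make the optimistic-concurrency idea above concrete, here is a toy Python model. It is not Iceberg's implementation or API: `ToyTable`, `swap`, `commit_transaction`, and the `before_swap` hook are all hypothetical names, and the sketch only assumes that a commit is an atomic "publish if the current snapshot is still the one I started from" step that gets retried on conflict.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Rows = Tuple[str, ...]
Operation = Callable[[Rows], Rows]


class CommitConflict(Exception):
    """Raised when another writer committed first."""


@dataclass
class ToyTable:
    # Each snapshot is an immutable tuple of rows; index = snapshot id.
    snapshots: List[Rows] = field(default_factory=lambda: [()])

    @property
    def current_id(self) -> int:
        return len(self.snapshots) - 1

    def swap(self, expected_id: int, new_rows: Rows) -> None:
        """Publish new_rows only if the current snapshot is still expected_id."""
        if expected_id != self.current_id:
            raise CommitConflict(f"expected {expected_id}, found {self.current_id}")
        self.snapshots.append(new_rows)


def commit_transaction(
    table: ToyTable,
    operations: List[Operation],
    retries: int = 3,
    before_swap: Optional[Callable[[int], None]] = None,
) -> int:
    """Apply all operations on the latest snapshot and publish them as one commit."""
    for attempt in range(retries + 1):
        base_id = table.current_id
        rows = table.snapshots[base_id]
        for op in operations:  # every retry re-applies all operations
            rows = op(rows)
        if before_swap is not None:
            before_swap(attempt)  # hook used below to simulate a concurrent writer
        try:
            table.swap(base_id, rows)
            return table.current_id
        except CommitConflict:
            continue  # nothing was published; try again on the new snapshot
    raise CommitConflict("gave up; the table is unchanged by this transaction")


table = ToyTable()
table.swap(0, ("a", "b"))  # snapshot 1 holds rows a and b


def delete_a(rows: Rows) -> Rows:
    return tuple(r for r in rows if r != "a")


def insert_c(rows: Rows) -> Rows:
    return rows + ("c",)


def concurrent_writer(attempt: int) -> None:
    # Another writer sneaks in a commit during the first attempt only.
    if attempt == 0:
        table.swap(table.current_id, table.snapshots[-1] + ("z",))


new_id = commit_transaction(table, [delete_a, insert_c], before_swap=concurrent_writer)
print(new_id, table.snapshots[new_id])  # 3 ('b', 'z', 'c')
```

No lock is held at any point: the first attempt loses the race, publishes nothing, and the retry applies the delete and the insert together on top of the other writer's snapshot.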

On Mon, May 3, 2021 at 9:54 AM vivek B <vi...@gmail.com> wrote:

> I was using the MERGE INTO command to do inserts, updates, and deletes in
> one SQL query, but found it to be slow. I am guessing this may be because
> MERGE INTO reads the whole Iceberg table into Spark and does a join.
>
> So I wanted to run the delete, update, and insert explicitly on the
> Iceberg table. I was asking whether there is a way to hold a lock on the
> Iceberg table (so that nobody else can write to it and increment the
> snapshot id), apply some SQL operations, and, if anything goes wrong, roll
> back to the snapshot id that was there at the beginning of my SQL
> operations.
>
> Thanks,
> vivek
>
>
>

-- 
Ryan Blue

Re: Iceberg Transactions via spark

Posted by vivek B <vi...@gmail.com>.

On 2021/05/02 18:10:19, Ryan Blue <bl...@apache.org> wrote: 
> Vivek,
> 
> Currently, Spark doesn't support any of the BEGIN/COMMIT statements for
> transactions, so I don't think that it is possible right now. What are you
> trying to do? It may be that some of the newer commands, like MERGE INTO,
> would work for you instead.
> 
> On Thu, Apr 29, 2021 at 5:49 PM vivek B <vi...@gmail.com> wrote:
> 
> > Hey All,
> > Is there a way to run multiple SQL operations via Spark as a single
> > transaction?
> >
> > Thanks,
> > vivek
> >
> 
> 
> -- 
> Ryan Blue

I was using the MERGE INTO command to do inserts, updates, and deletes in one SQL query,
but found it to be slow. I am guessing this may be because MERGE INTO reads the whole Iceberg table into Spark and does a join.

So I wanted to run the delete, update, and insert explicitly on the Iceberg table.
I was asking whether there is a way to hold a lock on the Iceberg table (so that nobody else can write to it and increment the snapshot id),
apply some SQL operations, and, if anything goes wrong, roll back to the snapshot id that was there at the beginning of my SQL operations.

Thanks,
vivek