Posted to user@spark.apache.org by Siavash Namvar <sn...@gmail.com> on 2020/08/12 13:18:41 UTC

How can I use pyspark to upsert one row without replacing entire table

Hi,

I have a use case where I read data from a db table and need to update a few
rows based on the primary key without replacing the entire table.

For instance, if I have the following 3 rows:

-------------------
id | fname
-------------------
 1 | john
-------------------
 2 | Steve
-------------------
 3 | Jack
-------------------

And I would like to update the row with id=2 from Steve to Michael without
replacing the entire table, so the output looks like:

-------------------
id | fname
-------------------
 1 | john
-------------------
 2 | Michael
-------------------
 3 | Jack
-------------------

Keep in mind that the actual db table is huge and the database is old, so I
cannot read and replace the entire table.

Thanks

Re: How can I use pyspark to upsert one row without replacing entire table

Posted by Nicholas Gustafson <nj...@gmail.com>.
The Delta docs have examples of upserting:

https://docs.delta.io/0.4.0/delta-update.html#upsert-into-a-table-using-merge
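
For reference, a minimal PySpark sketch of that merge pattern. It assumes the
data has already been written out as a Delta table at a hypothetical path
/data/people (a Delta copy, not the Sybase table itself) and that the delta
package is available:

from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Hypothetical path; the data must already exist there as a Delta table.
target = DeltaTable.forPath(spark, "/data/people")

# One-row DataFrame holding the new value for id=2.
updates = spark.createDataFrame([(2, "Michael")], ["id", "fname"])

# Merge on the primary key: update the matching row, insert if it is missing.
(target.alias("t")
    .merge(updates.alias("u"), "t.id = u.id")
    .whenMatchedUpdate(set={"fname": "u.fname"})
    .whenNotMatchedInsert(values={"id": "u.id", "fname": "u.fname"})
    .execute())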

> On Aug 12, 2020, at 08:31, Siavash Namvar <sn...@gmail.com> wrote:
> 
> 
> Thanks Sean,
> 
> Do you have any URL or reference to show me how to upsert in Spark? I need to update a Sybase db.
> 
>> On Wed, Aug 12, 2020 at 11:06 AM Sean Owen <sr...@gmail.com> wrote:
>> It's not so much Spark but the data format, whether it supports
>> upserts. Parquet, CSV, JSON, etc would not.
>> That is what Delta, Hudi et al are for, and yes you can upsert them in Spark.
>> 
>> On Wed, Aug 12, 2020 at 9:57 AM Siavash Namvar <sn...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > I have a use case where I read data from a db table and need to update a few rows based on the primary key without replacing the entire table.
>> >
>> > For instance, if I have the following 3 rows:
>> >
>> > -------------------
>> > id | fname
>> > -------------------
>> >  1 | john
>> > -------------------
>> >  2 | Steve
>> > -------------------
>> >  3 | Jack
>> > -------------------
>> >
>> > And I would like to update the row with id=2 from Steve to Michael without replacing the entire table, so the output looks like:
>> >
>> > -------------------
>> > id | fname
>> > -------------------
>> >  1 | john
>> > -------------------
>> >  2 | Michael
>> > -------------------
>> >  3 | Jack
>> > -------------------
>> >
>> > Keep in mind that the actual db table is huge and the database is old, so I cannot read and replace the entire table.
>> >
>> > Thanks

Re: How can I use pyspark to upsert one row without replacing entire table

Posted by Siavash Namvar <sn...@gmail.com>.
That's the kind of solution I'm looking for, Ed. Can you elaborate on how I
can do this on the Spark side? Or do I need to update the table configuration in the DB?

Siavash

On Wed, Aug 12, 2020 at 5:55 PM ed elliott <ed...@outlook.com> wrote:

> You’ll need to do an insert and use a trigger on the table to change it
> into an upsert; also make sure your mode is append rather than overwrite.
>
> Ed
>
> ------------------------------
> *From:* Siavash Namvar <sn...@gmail.com>
> *Sent:* Wednesday, August 12, 2020 4:09:07 PM
> *To:* Sean Owen <sr...@gmail.com>
> *Cc:* User <us...@spark.apache.org>
> *Subject:* Re: How can I use pyspark to upsert one row without replacing
> entire table
>
> Thanks Sean,
>
> Do you have any URL or reference to show me how to upsert in Spark? I need
> to update a Sybase db.
>
> On Wed, Aug 12, 2020 at 11:06 AM Sean Owen <sr...@gmail.com> wrote:
>
> It's not so much Spark but the data format, whether it supports
> upserts. Parquet, CSV, JSON, etc would not.
> That is what Delta, Hudi et al are for, and yes you can upsert them in
> Spark.
>
> On Wed, Aug 12, 2020 at 9:57 AM Siavash Namvar <sn...@gmail.com> wrote:
> >
> > Hi,
> >
> > I have a use case where I read data from a db table and need to update a
> few rows based on the primary key without replacing the entire table.
> >
> > For instance, if I have the following 3 rows:
> >
> > -------------------
> > id | fname
> > -------------------
> >  1 | john
> > -------------------
> >  2 | Steve
> > -------------------
> >  3 | Jack
> > -------------------
> >
> > And I would like to update the row with id=2 from Steve to Michael
> without replacing the entire table, so the output looks like:
> >
> > -------------------
> > id | fname
> > -------------------
> >  1 | john
> > -------------------
> >  2 | Michael
> > -------------------
> >  3 | Jack
> > -------------------
> >
> > Keep in mind that the actual db table is huge and the database is old, so
> I cannot read and replace the entire table.
> >
> > Thanks
>
>

Re: How can I use pyspark to upsert one row without replacing entire table

Posted by ed elliott <ed...@outlook.com>.
You’ll need to do an insert and use a trigger on the table to change it into an upsert; also make sure your mode is append rather than overwrite.

Ed
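
On the Spark side that only requires a plain JDBC write in append mode; the
upsert logic lives in the database trigger (or in a staging table plus a merge
job). A rough sketch, with a placeholder Sybase jConnect URL, table name, and
credentials rather than anything tested:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# DataFrame containing only the rows to upsert.
updates = spark.createDataFrame([(2, "Michael")], ["id", "fname"])

# Plain INSERTs in append mode; a trigger on the target (or staging) table
# turns them into upserts on the database side.
(updates.write
    .format("jdbc")
    .option("url", "jdbc:sybase:Tds:dbhost:5000/mydb")    # placeholder URL
    .option("dbtable", "people_staging")                  # placeholder table
    .option("user", "spark")                              # placeholder user
    .option("password", "***")
    .option("driver", "com.sybase.jdbc4.jdbc.SybDriver")  # jConnect driver
    .mode("append")
    .save())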

________________________________
From: Siavash Namvar <sn...@gmail.com>
Sent: Wednesday, August 12, 2020 4:09:07 PM
To: Sean Owen <sr...@gmail.com>
Cc: User <us...@spark.apache.org>
Subject: Re: How can I use pyspark to upsert one row without replacing entire table

Thanks Sean,

Do you have any URL or reference to show me how to upsert in Spark? I need to update a Sybase db.

On Wed, Aug 12, 2020 at 11:06 AM Sean Owen <sr...@gmail.com> wrote:
It's not so much Spark but the data format, whether it supports
upserts. Parquet, CSV, JSON, etc would not.
That is what Delta, Hudi et al are for, and yes you can upsert them in Spark.

On Wed, Aug 12, 2020 at 9:57 AM Siavash Namvar <sn...@gmail.com> wrote:
>
> Hi,
>
> I have a use case where I read data from a db table and need to update a few rows based on the primary key without replacing the entire table.
>
> For instance, if I have the following 3 rows:
>
> -------------------
> id | fname
> -------------------
>  1 | john
> -------------------
>  2 | Steve
> -------------------
>  3 | Jack
> -------------------
>
> And I would like to update the row with id=2 from Steve to Michael without replacing the entire table, so the output looks like:
>
> -------------------
> id | fname
> -------------------
>  1 | john
> -------------------
>  2 | Michael
> -------------------
>  3 | Jack
> -------------------
>
> Keep in mind that the actual db table is huge and the database is old, so I cannot read and replace the entire table.
>
> Thanks

Re: How can I use pyspark to upsert one row without replacing entire table

Posted by Siavash Namvar <sn...@gmail.com>.
Thanks Sean,

Do you have any URL or reference to show me how to upsert in Spark? I need
to update a Sybase db.

On Wed, Aug 12, 2020 at 11:06 AM Sean Owen <sr...@gmail.com> wrote:

> It's not so much Spark but the data format, whether it supports
> upserts. Parquet, CSV, JSON, etc would not.
> That is what Delta, Hudi et al are for, and yes you can upsert them in
> Spark.
>
> On Wed, Aug 12, 2020 at 9:57 AM Siavash Namvar <sn...@gmail.com> wrote:
> >
> > Hi,
> >
> > I have a use case where I read data from a db table and need to update a
> few rows based on the primary key without replacing the entire table.
> >
> > For instance, if I have the following 3 rows:
> >
> > -------------------
> > id | fname
> > -------------------
> >  1 | john
> > -------------------
> >  2 | Steve
> > -------------------
> >  3 | Jack
> > -------------------
> >
> > And I would like to update the row with id=2 from Steve to Michael
> without replacing the entire table, so the output looks like:
> >
> > -------------------
> > id | fname
> > -------------------
> >  1 | john
> > -------------------
> >  2 | Michael
> > -------------------
> >  3 | Jack
> > -------------------
> >
> > Keep in mind that the actual db table is huge and the database is old, so
> I cannot read and replace the entire table.
> >
> > Thanks
>

Re: How can I use pyspark to upsert one row without replacing entire table

Posted by Sean Owen <sr...@gmail.com>.
It's not so much Spark but the data format, whether it supports
upserts. Parquet, CSV, JSON, etc would not.
That is what Delta, Hudi et al are for, and yes you can upsert them in Spark.
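
To illustrate the Hudi route mentioned above, its DataFrame writer exposes an
upsert operation keyed on a record key field. This is only a sketch under
assumptions: the path, table name, and the made-up "updated_at" pre-combine
column are illustrative, and the hudi-spark bundle must be on the classpath.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Rows to upsert; "updated_at" is a made-up column Hudi uses to pick the
# latest record when the same key arrives more than once.
updates = spark.createDataFrame([(2, "Michael", 1597240000)],
                                ["id", "fname", "updated_at"])

# Upsert into a Hudi table at an illustrative path.
(updates.write
    .format("hudi")
    .option("hoodie.table.name", "people")
    .option("hoodie.datasource.write.recordkey.field", "id")
    .option("hoodie.datasource.write.precombine.field", "updated_at")
    .option("hoodie.datasource.write.operation", "upsert")
    .mode("append")
    .save("/data/people_hudi"))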

On Wed, Aug 12, 2020 at 9:57 AM Siavash Namvar <sn...@gmail.com> wrote:
>
> Hi,
>
> I have a use case where I read data from a db table and need to update a few rows based on the primary key without replacing the entire table.
>
> For instance, if I have the following 3 rows:
>
> -------------------
> id | fname
> -------------------
>  1 | john
> -------------------
>  2 | Steve
> -------------------
>  3 | Jack
> -------------------
>
> And I would like to update the row with id=2 from Steve to Michael without replacing the entire table, so the output looks like:
>
> -------------------
> id | fname
> -------------------
>  1 | john
> -------------------
>  2 | Michael
> -------------------
>  3 | Jack
> -------------------
>
> Keep in mind that the actual db table is huge and the database is old, so I cannot read and replace the entire table.
>
> Thanks

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org