Posted to user@spark.apache.org by Siavash Namvar <sn...@gmail.com> on 2020/08/12 13:18:41 UTC
How can I use pyspark to upsert one row without replacing entire table
Hi,
I have a use case where I read data from a db table and need to update a few
rows, based on the primary key, without replacing the entire table.
For instance, if I have the 3 following rows:
-------------------
id | fname
-------------------
1 | john
-------------------
2 | Steve
-------------------
3 | Jack
-------------------
And I would like to update the row with id=2 from Steve to Michael without
replacing the entire table, so that the output looks like:
-------------------
id | fname
-------------------
1 | john
-------------------
2 | Michael
-------------------
3 | Jack
-------------------
Keep in mind that the actual db table is very large and the database is old, so
reading and replacing the entire table is not feasible.
Thanks
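For the record, the behaviour wanted is a single-row upsert on the database side. A minimal sketch, using Python's stdlib sqlite3 as a hypothetical stand-in for the real database (Sybase's own upsert syntax differs):

```python
import sqlite3

# sqlite3 stands in for the real (Sybase) database here -- an assumption
# made purely so the sketch is self-contained and runnable.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, fname TEXT)")
cur.executemany("INSERT INTO people VALUES (?, ?)",
                [(1, "john"), (2, "Steve"), (3, "Jack")])

# Touches only the id=2 row; the rest of the table is left alone.
cur.execute("""
    INSERT INTO people (id, fname) VALUES (2, 'Michael')
    ON CONFLICT(id) DO UPDATE SET fname = excluded.fname
""")
print(cur.execute("SELECT id, fname FROM people ORDER BY id").fetchall())
# [(1, 'john'), (2, 'Michael'), (3, 'Jack')]
```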
Re: How can I use pyspark to upsert one row without replacing entire table
Posted by Nicholas Gustafson <nj...@gmail.com>.
The delta docs have examples of upserting:
https://docs.delta.io/0.4.0/delta-update.html#upsert-into-a-table-using-merge
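For anyone skimming: Delta's MERGE matches source rows to target rows on a key, updates the matches, and inserts the rest. A pure-Python sketch of those semantics follows (this is not the delta-spark API itself; see the linked docs for the real DeltaTable merge calls):

```python
# Pure-Python sketch of MERGE-style upsert semantics, keyed on "id".
def merge_upsert(target, updates, key="id"):
    merged = {row[key]: dict(row) for row in target}   # existing rows by key
    for row in updates:
        merged[row[key]] = dict(row)  # matched -> update, unmatched -> insert
    return sorted(merged.values(), key=lambda r: r[key])

target = [{"id": 1, "fname": "john"},
          {"id": 2, "fname": "Steve"},
          {"id": 3, "fname": "Jack"}]
updates = [{"id": 2, "fname": "Michael"}]
print(merge_upsert(target, updates))
# [{'id': 1, 'fname': 'john'}, {'id': 2, 'fname': 'Michael'}, {'id': 3, 'fname': 'Jack'}]
```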
Re: How can I use pyspark to upsert one row without replacing entire table
Posted by Siavash Namvar <sn...@gmail.com>.
That's the kind of solution I'm after, Ed. Can you elaborate on how to do this
on the Spark side? Or do I need to update the table configuration in the DB?
Siavash
On Wed, Aug 12, 2020 at 5:55 PM ed elliott <ed...@outlook.com> wrote:
> You’ll need to do an insert and use a trigger on the table to change it
> into an upsert; also make sure your mode is append rather than overwrite.
>
> Ed
Re: How can I use pyspark to upsert one row without replacing entire table
Posted by ed elliott <ed...@outlook.com>.
You’ll need to do an insert and use a trigger on the table to change it into an upsert; also make sure your mode is append rather than overwrite.
Ed
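To illustrate the idea, here is a minimal sketch using stdlib sqlite3 as a stand-in for the real database (an assumption; Sybase trigger syntax differs): a BEFORE INSERT trigger deletes any existing row with the same key, so plain inserts, which are all an append-mode write produces, behave like upserts.

```python
import sqlite3

# sqlite3 stands in for the real database so the sketch is runnable.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE people (id INTEGER, fname TEXT)")
cur.executemany("INSERT INTO people VALUES (?, ?)",
                [(1, "john"), (2, "Steve"), (3, "Jack")])

# Before each insert, drop any existing row with the same id,
# turning the insert into an upsert.
cur.execute("""
    CREATE TRIGGER people_upsert BEFORE INSERT ON people
    BEGIN
        DELETE FROM people WHERE id = NEW.id;
    END
""")

cur.execute("INSERT INTO people VALUES (2, 'Michael')")  # the appended row
print(cur.execute("SELECT id, fname FROM people ORDER BY id").fetchall())
# [(1, 'john'), (2, 'Michael'), (3, 'Jack')]
```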
Re: How can I use pyspark to upsert one row without replacing entire table
Posted by Siavash Namvar <sn...@gmail.com>.
Thanks Sean,
Do you have any URL or reference showing how to upsert in Spark? I need
to update a Sybase db.
On Wed, Aug 12, 2020 at 11:06 AM Sean Owen <sr...@gmail.com> wrote:
> It's not so much Spark but the data format, whether it supports
> upserts. Parquet, CSV, JSON, etc would not.
> That is what Delta, Hudi et al are for, and yes you can upsert them in
> Spark.
Re: How can I use pyspark to upsert one row without replacing entire table
Posted by Sean Owen <sr...@gmail.com>.
It's not so much Spark as the data format, whether it supports
upserts. Parquet, CSV, JSON, etc. would not.
That is what Delta, Hudi et al. are for, and yes, you can upsert them in Spark.
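To see why, here is what "updating" one row in a plain format like CSV actually requires, sketched in pure Python: there is no in-place single-row update, so the whole file has to be read and rewritten.

```python
import csv
import io

# The 3-row example table as an in-memory CSV "file".
original = "id,fname\n1,john\n2,Steve\n3,Jack\n"

# To change one row we must parse every row...
rows = list(csv.DictReader(io.StringIO(original)))
for row in rows:
    if row["id"] == "2":
        row["fname"] = "Michael"

# ...and write every row back out, not just the one that changed.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["id", "fname"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
# id,fname
# 1,john
# 2,Michael
# 3,Jack
```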