Posted to user@spark.apache.org by SRK <sw...@gmail.com> on 2016/02/17 03:45:46 UTC

How to update data saved as parquet in hdfs using Dataframes

Hi,

How do I update data saved as Parquet in HDFS using DataFrames? If I use
SaveMode.Append, it only appends the new data and does not update a record
that already exists. Do I have to modify the data myself, using either the
DataFrames API or SQL through sqlContext?


Thanks,
Swetha



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-update-data-saved-as-parquet-in-hdfs-using-Dataframes-tp26245.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: How to update data saved as parquet in hdfs using Dataframes

Posted by Arkadiusz Bicz <ar...@gmail.com>.
Hi,

HDFS is append-only, so you cannot modify the files in place. You need to
read the data, modify it, and write the result to another location.



Re: How to update data saved as parquet in hdfs using Dataframes

Posted by Takeshi Yamamuro <li...@gmail.com>.
Hi,

Even if you update only a few rows, you need to read the whole dataset from
Parquet, update it, and then save all the data as new files.
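
The read, update, and rewrite steps above could be sketched roughly as
follows. This is only a minimal sketch, not a definitive implementation. It
assumes a newer Spark release (SparkSession, the "left_anti" join type, and
unionByName, none of which are in Spark 1.x), that both datasets have the
same columns, and that a single column identifies a record. The paths, the
"id" key column, and the names ParquetUpsertSketch and upsert are all
hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object ParquetUpsertSketch {

  // Keep every existing row whose key does not appear in `updates`,
  // then add all rows from `updates`. Assumes both DataFrames have the
  // same column names and that `key` is unique within `updates`.
  def upsert(existing: DataFrame, updates: DataFrame, key: String): DataFrame = {
    val untouched = existing.join(updates.select(key), Seq(key), "left_anti")
    untouched.unionByName(updates)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-upsert-sketch")
      .getOrCreate()

    // Hypothetical locations; replace with your own.
    val currentPath = "hdfs:///data/records/current"
    val updatesPath = "hdfs:///data/records/updates"
    val outputPath  = "hdfs:///data/records/next"

    val existing = spark.read.parquet(currentPath)
    val updates  = spark.read.parquet(updatesPath)

    // Write the merged result to a different directory than the one
    // being read, since the input files cannot be modified in place.
    upsert(existing, updates, "id")
      .write
      .mode(SaveMode.Overwrite)
      .parquet(outputPath)

    spark.stop()
  }
}
```

Once the write succeeds, readers can be pointed at the new directory. If
the updates can contain several rows for the same key, they would need to
be deduplicated first, because this sketch keeps all of them.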


-- 
---
Takeshi Yamamuro