Posted to user@spark.apache.org by SRK <sw...@gmail.com> on 2016/02/17 03:45:46 UTC
How to update data saved as parquet in hdfs using Dataframes
Hi,
How do I update data saved as Parquet in HDFS using DataFrames? If I use
SaveMode.Append, it just appends the data and does not update a record
that already exists. Do I have to modify it using the DataFrames API, or
with SQL through sqlContext?
Thanks,
Swetha
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-update-data-saved-as-parquet-in-hdfs-using-Dataframes-tp26245.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: How to update data saved as parquet in hdfs using Dataframes
Posted by Arkadiusz Bicz <ar...@gmail.com>.
Hi,
HDFS is append-only, so you need to read the data, modify it, and write
the result to another location.
Re: How to update data saved as parquet in hdfs using Dataframes
Posted by Takeshi Yamamuro <li...@gmail.com>.
Hi,
Even if you update only a few rows, you need to read the whole dataset
from Parquet, update it, and then save all the data as new files.
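
The read, update, and rewrite cycle described in this thread could look
roughly like the PySpark sketch below. It is only an illustration under
stated assumptions, not something posted by the participants: the function
name upsert_parquet, its parameters, and the paths are hypothetical, the
updates are assumed to arrive as a DataFrame with the same columns as the
stored data, and the "left_anti" join type and unionByName need a newer
Spark than the one current when this thread was written (Spark 2.3 or
later).

```python
def upsert_parquet(spark, existing_path, updates_df, key_cols, output_path):
    """Rewrite a Parquet dataset so rows in updates_df replace rows with the same key.

    Hypothetical helper: `spark` is a SparkSession, `key_cols` is a list of
    column names that identify a record, and `updates_df` is assumed to have
    the same columns as the data stored at `existing_path`.
    """
    if output_path == existing_path:
        # Spark reads lazily, so overwriting the input while it is still
        # being read is unsafe. Write somewhere else, as advised in the thread.
        raise ValueError("output_path must differ from existing_path")

    # Read the whole existing dataset.
    existing = spark.read.parquet(existing_path)

    # Keep only the existing rows whose key does not appear in the updates.
    unchanged = existing.join(
        updates_df.select(*key_cols), on=key_cols, how="left_anti"
    )

    # Add the updated (and any brand-new) rows, matching columns by name.
    merged = unchanged.unionByName(updates_df)

    # Save all the data as new files in the other location.
    merged.write.mode("overwrite").parquet(output_path)
    return output_path
```

A call might look like upsert_parquet(spark, "/data/events", updates,
["id"], "/data/events_v2"), where both paths are made up for the example.
Readers can be pointed at the new directory once the write has finished.
Note that duplicate keys inside updates_df are not collapsed by this
sketch.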
--
---
Takeshi Yamamuro