Posted to user@spark.apache.org by Lian Jiang <ji...@gmail.com> on 2018/03/14 18:36:09 UTC

retention policy for spark structured streaming dataset

I have a Spark Structured Streaming job that dumps data into a Parquet
file. To keep the Parquet file from growing indefinitely, I want to discard
data older than 3 months. Does Spark Streaming support this? Or do I need to
stop the streaming job, trim the Parquet file, and restart the streaming job?
Thanks for any hints.

Re: retention policy for spark structured streaming dataset

Posted by Lian Jiang <ji...@gmail.com>.
It is already partitioned by timestamp. But is the right retention process
to stop the streaming job, trim the Parquet file, and restart the
streaming job? Thanks.

On Wed, Mar 14, 2018 at 12:51 PM, Sunil Parmar <su...@gmail.com>
wrote:

> Can you use partitioning (by day)? That will make it easier to drop
> data older than x days outside the streaming job.
>
> Sunil Parmar

Re: retention policy for spark structured streaming dataset

Posted by Sunil Parmar <su...@gmail.com>.
Can you use partitioning (by day)? That will make it easier to drop
data older than x days outside the streaming job.

Sunil Parmar
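
For illustration only (this is not from the thread): a minimal sketch of what
"dropping data older than x days outside the streaming job" could look like
when the output is partitioned by day. It assumes a hypothetical layout of
`<root>/date=YYYY-MM-DD/` directories on a local filesystem; the partition
column name `date`, the function name `drop_old_partitions`, and the 90-day
window are all assumptions. On HDFS or an object store the same idea would
need that store's own delete API.

```python
import re
import shutil
from datetime import date, timedelta
from pathlib import Path

# Assumed (hypothetical) layout: <root>/date=YYYY-MM-DD/part-*.parquet
PARTITION_RE = re.compile(r"^date=(\d{4})-(\d{2})-(\d{2})$")


def drop_old_partitions(root, keep_days, today=None):
    """Delete day partitions under `root` older than `keep_days` days.

    Returns the names of the removed partition directories.
    """
    today = today or date.today()
    cutoff = today - timedelta(days=keep_days)
    removed = []
    for child in sorted(Path(root).iterdir()):
        match = PARTITION_RE.match(child.name)
        if not (child.is_dir() and match):
            # Leave anything that is not a day partition untouched,
            # e.g. a metadata directory written by the streaming sink.
            continue
        try:
            day = date(*map(int, match.groups()))
        except ValueError:
            continue  # name looks like a date but is not a valid one
        if day < cutoff:
            shutil.rmtree(child)
            removed.append(child.name)
    return removed
```

A scheduled task could call `drop_old_partitions("/data/events", keep_days=90)`
(path hypothetical) once a day while the streaming job keeps running. One
thing worth testing first: the file sink keeps its own log of written files in
the output directory, so readers that rely on that log may complain about
files removed this way.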
