Posted to user@flink.apache.org by Averell <lv...@gmail.com> on 2018/10/04 07:28:58 UTC

Difference between BucketingSink and StreamingFileSink

Hi everyone,

I am trying to persist my stream into Parquet files. In the documentation, I
can see two different file sinks: BucketingSink (Rolling File Sink) and
StreamingFileSink. I could not find any information about the differences
between the two.
Which one should I choose for writing Parquet? Is it possible to
partition my output based on event time?

Thanks and best regards,
Averell



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Difference between BucketingSink and StreamingFileSink

Posted by Aljoscha Krettek <al...@apache.org>.
No worries! :-) It's nice that you also posted the solution.

> On 4. Oct 2018, at 13:23, Averell <lv...@gmail.com> wrote:
> 
> Hi,
> 
> Sorry for wasting your time. I found the solution to my question about
> event time: a class that extends BucketAssigner does the job:
> 
> class SdcTimeBucketAssigner[T <: MyClass](prefix: String, formatString: String) extends BucketAssigner[T, String] {
> 	@transient
> 	var dateFormatter = new SimpleDateFormat(formatString)
> 
> 	override def getBucketId(in: T, context: BucketAssigner.Context): String = {
> 		if (dateFormatter == null) dateFormatter = new SimpleDateFormat(formatString)
> 		s"$prefix${dateFormatter.format(new java.util.Date(in.getTimestamp))}"
> 	}
> 
> 	override def getSerializer = SimpleVersionedStringSerializer.INSTANCE
> }
> 
> Thanks and best regards,
> Averell


Re: Difference between BucketingSink and StreamingFileSink

Posted by Averell <lv...@gmail.com>.
Hi,

Sorry for wasting your time. I found the solution to my question about
event time: a class that extends BucketAssigner does the job:

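import java.text.SimpleDateFormat
import java.util.Date

import org.apache.flink.core.io.SimpleVersionedSerializer
import org.apache.flink.streaming.api.functions.sink.filesystem.BucketAssigner
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.SimpleVersionedStringSerializer

// Buckets each record by its own timestamp (event time) rather than by processing time.
class SdcTimeBucketAssigner[T <: MyClass](prefix: String, formatString: String)
		extends BucketAssigner[T, String] {

	// Excluded from serialization; rebuilt on first use after the assigner is shipped to a task.
	@transient private lazy val dateFormatter = new SimpleDateFormat(formatString)

	override def getBucketId(in: T, context: BucketAssigner.Context): String =
		s"$prefix${dateFormatter.format(new Date(in.getTimestamp))}"

	override def getSerializer: SimpleVersionedSerializer[String] =
		SimpleVersionedStringSerializer.INSTANCE
}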
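
To make the bucket naming concrete, here is a minimal, Flink-free sketch of
the same idea: turning an event timestamp into a bucket id. The object and
method names (EventTimeBucketId, bucketId) are hypothetical, the timestamp is
assumed to be epoch milliseconds, and java.time with a fixed UTC zone is
swapped in for SimpleDateFormat, which formats in the JVM's default time zone.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

object EventTimeBucketId {
  // Hypothetical helper: maps an epoch-millisecond event timestamp to a bucket id
  // by formatting it in UTC and prepending the given prefix.
  def bucketId(prefix: String, pattern: String, epochMillis: Long): String = {
    val formatter = DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC)
    prefix + formatter.format(Instant.ofEpochMilli(epochMillis))
  }
}
```

With the prefix "date=" and the pattern "yyyy-MM-dd", every record whose
timestamp falls on the same UTC day maps to the same bucket id, so late
records should still land in the directory for the day they belong to.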

Thanks and best regards,
Averell





Re: Difference between BucketingSink and StreamingFileSink

Posted by Averell <lv...@gmail.com>.
Hi,

As per https://issues.apache.org/jira/browse/FLINK-9749, StreamingFileSink is
the newer option and is a better fit than BucketingSink for Parquet.
I would love to see an example that uses it.
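
For readers looking for such an example, a rough sketch of the wiring might
look like the following. It is not taken from the ticket or the Flink
documentation. It assumes the flink-parquet module (with its Avro
dependencies) is on the classpath, that MyClass and SdcTimeBucketAssigner are
the classes named elsewhere in this thread, and that outputPath and the
"date=" / "yyyy-MM-dd" arguments are placeholders. Builder signatures have
changed between Flink releases, so check them against the version in use.

```scala
import org.apache.flink.core.fs.Path
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink

object ParquetSinkSketch {
  // MyClass and SdcTimeBucketAssigner come from this thread and are not defined here;
  // outputPath, "date=" and "yyyy-MM-dd" are illustrative placeholders.
  def build(outputPath: String): StreamingFileSink[MyClass] =
    StreamingFileSink
      // Bulk format: records are written as Parquet via Avro reflection on MyClass.
      .forBulkFormat(new Path(outputPath), ParquetAvroWriters.forReflectRecord(classOf[MyClass]))
      // Route each record to a bucket derived from its own timestamp.
      .withBucketAssigner(new SdcTimeBucketAssigner[MyClass]("date=", "yyyy-MM-dd"))
      .build()
}
```

The sink would then be attached with stream.addSink(ParquetSinkSketch.build(...)).
Bulk formats such as Parquet roll their part files on checkpoints, so
checkpointing needs to be enabled for files to be finalized.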

Thanks and best regards,
Averell


