Posted to user@flink.apache.org by Averell <lv...@gmail.com> on 2018/10/04 07:28:58 UTC
Difference between BucketingSink and StreamingFileSink
Hi everyone,
I am trying to persist my stream into Parquet files. In the documentation, I can
see two different file sinks: BucketingSink (Rolling File Sink) and
StreamingFileSink. I could not find any information about the differences
between the two.
Which one should I choose for writing to Parquet? Is it possible to
partition my output based on event time?
Thanks and best regards,
Averell
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Re: Difference between BucketingSink and StreamingFileSink
Posted by Aljoscha Krettek <al...@apache.org>.
No worries! :-) It's nice that you also posted the solution.
> On 4. Oct 2018, at 13:23, Averell <lv...@gmail.com> wrote:
> [...]
Re: Difference between BucketingSink and StreamingFileSink
Posted by Averell <lv...@gmail.com>.
Hi,
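Sorry for wasting your time. I found the solution to my question about
event time: a class that extends BucketAssigner does the job:

// Package names as of Flink 1.7.
import java.text.SimpleDateFormat
import org.apache.flink.core.io.SimpleVersionedSerializer
import org.apache.flink.streaming.api.functions.sink.filesystem.BucketAssigner
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.SimpleVersionedStringSerializer

// MyClass is my own record type; getTimestamp must return epoch milliseconds (java.util.Date expects that).
class SdcTimeBucketAssigner[T <: MyClass](prefix: String, formatString: String)
    extends BucketAssigner[T, String] {
  // Not shipped with the serialized assigner: null after deserialization, rebuilt on first use.
  @transient
  var dateFormatter = new SimpleDateFormat(formatString)

  // Bucket ID = prefix + the element's own timestamp, formatted with formatString.
  override def getBucketId(in: T, context: BucketAssigner.Context): String = {
    if (dateFormatter == null) dateFormatter = new SimpleDateFormat(formatString)
    s"$prefix${dateFormatter.format(new java.util.Date(in.getTimestamp))}"
  }

  override def getSerializer: SimpleVersionedSerializer[String] =
    SimpleVersionedStringSerializer.INSTANCE
}
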
Thanks and best regards,
Averell
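One detail worth keeping in mind with the assigner above: SimpleDateFormat formats in the JVM's default time zone, so the bucket name for a given event can differ between machines configured with different zones. The following is a minimal sketch of the same "prefix + formatted event timestamp" idea with the zone pinned explicitly. It is plain Scala with no Flink dependency, and the names EventTimeBucketer and bucketId are hypothetical, chosen only for this illustration.

```scala
import java.time.{Instant, ZoneId, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical helper: turns an epoch-millisecond event timestamp into a bucket ID.
// The time zone is an explicit parameter (UTC by default) instead of the JVM default.
final class EventTimeBucketer(prefix: String, pattern: String, zone: ZoneId = ZoneOffset.UTC) {
  // DateTimeFormatter is immutable and thread-safe, so one instance can be reused.
  private val formatter: DateTimeFormatter =
    DateTimeFormatter.ofPattern(pattern).withZone(zone)

  def bucketId(epochMillis: Long): String =
    prefix + formatter.format(Instant.ofEpochMilli(epochMillis))
}

object EventTimeBucketerDemo {
  def main(args: Array[String]): Unit = {
    val bucketer = new EventTimeBucketer("dt=", "yyyy-MM-dd/HH")
    // 1538638138000L is 2018-10-04 07:28:58 UTC
    println(bucketer.bucketId(1538638138000L)) // prints dt=2018-10-04/07
  }
}
```

If this logic were moved into a BucketAssigner, the formatter would still need to be rebuilt after deserialization (for example with a @transient lazy val), because DateTimeFormatter is not Serializable.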
Re: Difference between BucketingSink and StreamingFileSink
Posted by Averell <lv...@gmail.com>.
Hi,
According to https://issues.apache.org/jira/browse/FLINK-9749,
StreamingFileSink is the newer option and is a better fit than BucketingSink
for Parquet.
I would love to see an example that uses it.
Thanks and best regards,
Averell
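For readers looking for the kind of example requested above, here is a rough sketch of building a StreamingFileSink that writes Parquet through Avro reflection. It assumes the flink-parquet module and its Avro/Parquet dependencies are on the classpath and is written against the Flink 1.7-era API. The Event case class, the ParquetSinkSketch object, and the output directory are hypothetical. How well the builder's generics are inferred from Scala has varied between Flink versions, and whether Avro reflection suits a particular record type should be verified, so treat this as a starting point rather than a reference implementation.

```scala
import org.apache.flink.core.fs.Path
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink

// Hypothetical record type, used only for this illustration.
case class Event(id: String, timestamp: Long)

object ParquetSinkSketch {
  // Builds a sink that writes Event records as Parquet files under outputDir.
  // The Avro schema is derived from the Event class by reflection.
  def buildParquetSink(outputDir: String): StreamingFileSink[Event] =
    StreamingFileSink
      .forBulkFormat(new Path(outputDir), ParquetAvroWriters.forReflectRecord(classOf[Event]))
      .build()
}
```

The sink would be attached with stream.addSink(ParquetSinkSketch.buildParquetSink(...)). Bulk formats such as Parquet roll their part files on checkpoints, so checkpointing has to be enabled on the job for files to be finalized. Without a custom assigner the sink buckets by processing time; an event-time assigner like the one posted earlier in this thread can be supplied with withBucketAssigner(...) before build().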