Posted to user@flink.apache.org by "Raja.Aravapalli" <Ra...@target.com> on 2017/08/07 03:46:02 UTC

Re: [EXTERNAL] Re: Help required - "BucketingSink" usage to write HDFS Files

Hi Vinay,

Thanks for the response.

I have NOT enabled any checkpointing.

Files roll over correctly every 2 MB, but they remain in the pending state, as shown below:

-rw-r--r--   3 2097424 2017-08-06 21:10 /xxxx/xxxx/xxxx/Test/part-0-0.pending
-rw-r--r--   3 1431430 2017-08-06 21:12 /xxxx/xxxx/xxxx/Test/part-0-1.pending


Regards,
Raja.

From: vinay patil <vi...@gmail.com>
Date: Sunday, August 6, 2017 at 10:40 PM
To: "user@flink.apache.org" <us...@flink.apache.org>
Subject: [EXTERNAL] Re: Help required - "BucketingSink" usage to write HDFS Files

Hi Raja,

Have you enabled checkpointing?
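A part file is rolled from the in-progress state to the pending state when the batch size is reached (in your case 2 MB) or when the bucket has been inactive for a certain amount of time. Pending files are moved to the finished state only after a successful checkpoint.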
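
To illustrate the state transitions described above, here is a minimal toy model in plain Java. It is not Flink code: the class PartFileLifecycle and its methods are hypothetical names made up for this sketch, and it ignores details such as inactivity timeouts and restoring from state. It only shows why files stay pending when no checkpoint ever completes.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of part-file states (hypothetical, not part of Flink). */
public class PartFileLifecycle {
    enum State { IN_PROGRESS, PENDING, FINISHED }

    static final class PartFile {
        final String name;
        long bytes = 0;
        State state = State.IN_PROGRESS;

        PartFile(String name) { this.name = name; }
    }

    private final long batchSize;
    private final List<PartFile> files = new ArrayList<>();
    private PartFile current; // the file currently being written, if any

    PartFileLifecycle(long batchSize) { this.batchSize = batchSize; }

    /** Appends bytes; once the batch size is reached the file becomes PENDING. */
    void write(long numBytes) {
        if (current == null) {
            current = new PartFile("part-0-" + files.size());
            files.add(current);
        }
        current.bytes += numBytes;
        if (current.bytes >= batchSize) {
            current.state = State.PENDING;
            current = null;
        }
    }

    /** Only a completed checkpoint promotes PENDING files to FINISHED. */
    void onCheckpointComplete() {
        for (PartFile f : files) {
            if (f.state == State.PENDING) {
                f.state = State.FINISHED;
            }
        }
    }

    long count(State s) {
        return files.stream().filter(f -> f.state == s).count();
    }

    public static void main(String[] args) {
        PartFileLifecycle sink = new PartFileLifecycle(2L * 1024 * 1024);
        sink.write(2L * 1024 * 1024); // fills the first part file
        sink.write(1L * 1024 * 1024); // starts a second part file
        System.out.println("pending before checkpoint: " + sink.count(State.PENDING));
        sink.onCheckpointComplete();
        System.out.println("finished after checkpoint: " + sink.count(State.FINISHED));
    }
}
```

In this model, as in the situation reported in this thread, a file that has reached the batch size never leaves the pending state unless a checkpoint completes.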

Regards,
Vinay Patil

On Mon, Aug 7, 2017 at 7:53 AM, Raja.Aravapalli [via Apache Flink User Mailing List archive.] <[hidden email]> wrote:

Hi,

I am working on a PoC that writes files to HDFS using the BucketingSink class. The data is being written to HDFS, but the files remain there with a “.pending” suffix.


Below is the code I am using. Can someone please help me identify and fix the issue?


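import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

BucketingSink<String> hdfsSink = new BucketingSink<String>("hdfs://xxxx/xxxx/xxxx/Test/");
hdfsSink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HHmm")); // one bucket per minute
hdfsSink.setBatchSize(1024 * 1024 * 2); // roll part files at 2 MB
hdfsSink.setInactiveBucketCheckInterval(10000L); // check for inactive buckets every 10 s
hdfsSink.setInactiveBucketThreshold(10000L); // close buckets idle for 10 s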


Thanks a lot.


Regards,
Raja.



Re: [EXTERNAL] Re: Help required - "BucketingSink" usage to write HDFS Files

Posted by "Raja.Aravapalli" <Ra...@target.com>.
Thanks very much for the pointers Vinay. That helps ☺


-Raja.

From: vinay patil <vi...@gmail.com>
Date: Monday, August 7, 2017 at 1:56 AM
To: "user@flink.apache.org" <us...@flink.apache.org>
Subject: Re: [EXTERNAL] Re: Help required - "BucketingSink" usage to write HDFS Files

Hi Raja,

That is why they are in the pending state: pending files are moved to the finished state only when a checkpoint completes. You can enable checkpointing by calling env.enableCheckpointing(<interval in milliseconds>).

After doing this, the files will not remain in the pending state.

See the BucketingSink Javadoc: https://ci.apache.org/projects/flink/flink-docs-release-1.3/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html

Regards,
Vinay Patil



Re: [EXTERNAL] Re: Help required - "BucketingSink" usage to write HDFS Files

Posted by vinay patil <vi...@gmail.com>.
Hi Raja,

That is why they are in the pending state: pending files are moved to the
finished state only when a checkpoint completes. You can enable checkpointing
by calling env.enableCheckpointing(<interval in milliseconds>).

After doing this, the files will not remain in the pending state.

See the BucketingSink Javadoc:
https://ci.apache.org/projects/flink/flink-docs-release-1.3/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html

Regards,
Vinay Patil
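
For reference, the advice above could look roughly like the following when combined with the sink configuration from the original question. This is only a sketch against the Flink 1.3 streaming API: the class name HdfsSinkJob, the 60-second checkpoint interval, and the socket source on localhost:9999 are assumptions made for illustration and do not come from this thread. The masked HDFS path is kept exactly as posted.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

public class HdfsSinkJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Assumption: a 60-second interval (in milliseconds); tune it for the job.
        // Pending part files are finalized when a checkpoint completes.
        env.enableCheckpointing(60_000L);

        // Assumption: a socket source stands in for the real input stream.
        DataStream<String> lines = env.socketTextStream("localhost", 9999);

        // Sink settings taken from the original question.
        BucketingSink<String> hdfsSink =
                new BucketingSink<String>("hdfs://xxxx/xxxx/xxxx/Test/");
        hdfsSink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HHmm"));
        hdfsSink.setBatchSize(1024 * 1024 * 2); // roll part files at 2 MB
        hdfsSink.setInactiveBucketCheckInterval(10000L);
        hdfsSink.setInactiveBucketThreshold(10000L);

        lines.addSink(hdfsSink);
        env.execute("BucketingSink with checkpointing");
    }
}
```

With a setup like this, a part file may still be listed as pending for up to one checkpoint interval after it is rolled, so a shorter interval makes finished files appear sooner at the cost of more frequent checkpoints.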
