Posted to user@storm.apache.org by 马哲超 <ma...@gmail.com> on 2016/03/15 07:55:11 UTC

Re: HDFS Bolts -- partitioning output

I'm also looking forward to this partitioning function. The issue title
has been changed to STORM-1464.

2016-01-26 1:38 GMT+08:00 Aaron.Dossett <Aa...@target.com>:

> Erik, it turned out that we did need this in production after all.  I updated
> STORM-1494 to include partitioning and I will have an initial PR soon for
> review.
>
> From: Erik Weathers <ew...@groupon.com>
> Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
> Date: Monday, January 11, 2016 at 6:00 PM
> To: "user@storm.apache.org" <us...@storm.apache.org>
> Cc: "dev@storm.apache.org" <de...@storm.apache.org>
> Subject: Re: HDFS Bolts -- partitioning output
>
> Awesome Aaron, I can send you what we have done offline!
>
> - Erik
>
> On Thu, Jan 7, 2016 at 11:12 AM, Aaron.Dossett <Aa...@target.com>
> wrote:
>
>> Thanks, Erik.  Your “Partitioner” is exactly what I had in mind and even
>> what I named my stubbed out interface :-)  Since Target has decided against
>> this approach for other reasons, it will have to be a side project for me
>> for now.
>>
>> Best, Aaron
>>
>> From: Erik Weathers <ew...@groupon.com>
>> Reply-To: "user@storm.apache.org" <us...@storm.apache.org>
>> Date: Wednesday, January 6, 2016 at 5:48 PM
>> To: "user@storm.apache.org" <us...@storm.apache.org>
>> Cc: "dev@storm.apache.org" <de...@storm.apache.org>
>> Subject: Re: HDFS Bolts -- partitioning output
>>
>> hey Aaron,
>>
>> We've also written a similar bolt at Groupon, though we aren't super
>> satisfied with the implementation. :)  We are begrudgingly using it because
>> there is no partitioning support in the OSS storm-hdfs bolt.
>>
>> Though one thing I do like about our implementation is having the ability
>> to define your own "Partitioner" in each topology to do various types of
>> partitioning (date-based, message ID-based, topic-based, whatever).  It
>> would be great if your implementation had such logic too.  That is, when
>> deciding the HDFS path for a tuple's data, the Partitioner is called to
>> determine the HDFS path.  For example, it can take the Tuple object and an
>> opaque key/value Configuration hash that can pass items like a Kafka topic
>> name to be included in the HDFS path.
>>
>> - Erik
>>
>> On Tue, Dec 29, 2015 at 7:12 AM, Aaron.Dossett <Aa...@target.com>
>> wrote:
>>
>>> Hi,
>>>
>>> My team was exploring changes to the HDFS bolts that would allow for
>>> partitioning the output, for example into directories corresponding to
>>> day.  This is different from the existing functionality to rotate files
>>> based on a set length of time.  For unrelated reasons, we are probably not
>>> going to pursue this further.  However, I have some code changes that
>>> implement most of this functionality for at least some partitioning use
>>> cases.  If there is interest from the user or developer community for this
>>> feature, I could get it in shape for a PR to get feedback about our
>>> implementation approach.
>>>
>>> Any feedback on this idea is welcome.  Thanks! -Aaron
>>>
>>
>>
>
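
A minimal sketch of the pluggable "Partitioner" idea Erik describes in the
quoted message above, written in Java. It is not the Groupon implementation
or the storm-hdfs API: the names PathPartitioner, TopicDayPartitioner, and
PartitionerDemo, the "timestamp" field, and the "topic" config key are all
hypothetical, and a plain Map stands in for Storm's Tuple so the example
compiles without Storm on the classpath.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Map;

// Hypothetical interface (an assumption, not the storm-hdfs API):
// maps one tuple to the HDFS directory its data should be written under.
interface PathPartitioner {
    // tupleFields stands in for Storm's Tuple (field name -> value);
    // config is the opaque key/value hash mentioned in the thread.
    String partitionPath(Map<String, Object> tupleFields, Map<String, String> config);
}

// Date-based partitioning, with the topic name as the leading path component.
class TopicDayPartitioner implements PathPartitioner {
    // One directory per UTC day, e.g. 2016/01/06.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

    @Override
    public String partitionPath(Map<String, Object> tupleFields, Map<String, String> config) {
        // Assumed config key "topic"; fall back to a fixed name if it is absent.
        String topic = config.getOrDefault("topic", "unknown");
        // Assumed tuple field "timestamp": epoch milliseconds of the event.
        long millis = ((Number) tupleFields.get("timestamp")).longValue();
        return "/" + topic + "/" + DAY.format(Instant.ofEpochMilli(millis));
    }
}

class PartitionerDemo {
    public static void main(String[] args) {
        PathPartitioner partitioner = new TopicDayPartitioner();
        Map<String, Object> tuple = Map.of("timestamp", 0L, "body", "hello");
        Map<String, String> conf = Map.of("topic", "clicks");
        // The bolt would append its file name to this directory.
        System.out.println(partitioner.partitionPath(tuple, conf));
    }
}
```

In this sketch the directory comes from each tuple's own timestamp rather
than from when the output file was opened, which is what separates
partitioning from time-based file rotation. A topology could swap in a
message ID-based or topic-only implementation of the same interface.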

Re: HDFS Bolts -- partitioning output

Posted by "Aaron.Dossett" <Aa...@target.com>.
The PR is being actively reviewed right now.  Have a look and let me know what you think :-)

https://github.com/apache/storm/pull/1044

Re: HDFS Bolts -- partitioning output

Posted by Rajasekhar <ra...@gmail.com>.
Hi Aaron/Erik,

We need this approach as well. Could you please share the implementation or
its design?


-- 
Thanks & Regards
Rajasekhar