Posted to user@helix.apache.org by Varun Sharma <va...@pinterest.com> on 2014/06/17 00:55:43 UTC

Using Helix for HDFS serving

Hi folks,

We are looking at Helix for building a serving solution on top of HDFS for
data generated from MapReduce jobs. The files will be smaller than the HDFS
block size, so each file will live on 3 replicas, with each replica holding
the file in its entirety. A set of files output by MR would be the resource,
and each file (or group of X files) would be a partition.
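
For concreteness, a minimal sketch of registering one MR output as a Helix
resource with one partition per file. The cluster name, resource name,
ZooKeeper address, partition count, and state model here are all illustrative
placeholders, not something fixed by this thread:

    import org.apache.helix.HelixAdmin;
    import org.apache.helix.manager.zk.ZKHelixAdmin;
    import org.apache.helix.model.IdealState.RebalanceMode;

    public class AddServingResource {
      public static void main(String[] args) {
        // Hypothetical names; one Helix partition per file (or group of X files).
        HelixAdmin admin = new ZKHelixAdmin("zk-host:2181");
        int numPartitions = 12; // number of files (or file groups) in this MR output
        admin.addResource("serving-cluster", "mr-output-2014-06-16", numPartitions,
            "OnlineOffline", RebalanceMode.CUSTOMIZED.toString());
      }
    }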

We can assume that there is a container which can serve these immutable
files for lookups. Since we have 3 replicas, we were wondering whether we
could use Helix to serve these files with 3 logically equivalent replicas.
We need a few things:

a) In the steady state, when HDFS blocks are all triplicated, the logical
assignment of the 3 replicas should respect block affinity.

b) When a node crashes, some blocks become under-replicated both physically
and logically (from Helix's point of view). In such a case, we don't want to
carry out any transitions. Over time (~20 minutes), HDFS will re-replicate
blocks so that the physical replication factor of 3 is restored. Once this
happens, we want the logical replication to catch up to 3 and again respect
HDFS block placement.

So there are two aspects. The first is to retain block locality by doing
logical assignment such that each logical partition comes up on the same
nodes hosting the physical replicas. The second is that we want the logical
placement to trail the physical placement (as determined by HDFS), so the
cluster could be in a non-ideal state for a long period of time, say 20-30
minutes.

Please let us know whether these are feasible with Helix and, if so, what
the recommended practices would be.

Thanks
Varun

Re: Using Helix for HDFS serving

Posted by Varun Sharma <va...@pinterest.com>.
We would not be using YARN, but rather a custom-built standalone server
process that can serve multiple partitions and open/close files on HDFS.
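
A rough sketch of what the partition-hosting side of such a process could
look like, assuming an OnlineOffline state model whose transitions open and
close files on HDFS. The FileServer handle is hypothetical; only the two RPC
names come from this thread:

    import org.apache.helix.NotificationContext;
    import org.apache.helix.model.Message;
    import org.apache.helix.participant.statemachine.StateModel;
    import org.apache.helix.participant.statemachine.StateModelInfo;
    import org.apache.helix.participant.statemachine.Transition;

    // Hypothetical handle to the custom server process.
    interface FileServer {
      void openPartition(String pathOnHDFS);
      void closePartition(String pathOnHDFS);
    }

    @StateModelInfo(initialState = "OFFLINE", states = {"ONLINE", "OFFLINE"})
    public class FileServingStateModel extends StateModel {
      private final FileServer server;

      public FileServingStateModel(FileServer server) {
        this.server = server;
      }

      @Transition(from = "OFFLINE", to = "ONLINE")
      public void onBecomeOnlineFromOffline(Message message, NotificationContext context) {
        // The partition name is assumed to encode the file's path on HDFS.
        server.openPartition(message.getPartitionName());
      }

      @Transition(from = "ONLINE", to = "OFFLINE")
      public void onBecomeOfflineFromOnline(Message message, NotificationContext context) {
        server.closePartition(message.getPartitionName());
      }
    }

A factory for this model would be registered with a HelixManager created as a
PARTICIPANT, after which Helix drives the open/close calls via state
transitions.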

Thanks
Varun


On Tue, Jun 17, 2014 at 2:07 PM, kishore g <g....@gmail.com> wrote:

> That is correct. Do you plan to use YARN to launch the application server
> or start the servers manually?
>
> thanks
> Kishore G

Re: Using Helix for HDFS serving

Posted by kishore g <g....@gmail.com>.
That is correct. Do you plan to use YARN to launch the application server
or start the servers manually?

thanks
Kishore G


On Tue, Jun 17, 2014 at 1:23 PM, Varun Sharma <va...@pinterest.com> wrote:

> Affinity can be best effort, and in the case of machine failure we are
> fine with loss of affinity for, say, 20-30 minutes. Getting the logical
> assignment to precisely mimic HDFS block placement would be hard, so some
> skew for short time periods is acceptable.
>
> For the serving side, we would use a standalone application for serving
> the data. The standalone application will expose the following APIs over
> RPC:
> openPartition(String pathOnHDFS);
> closePartition(String pathOnHDFS);
>
> So, I guess one option would be to have the "controller" keep a
> reasonably up-to-date view of HDFS (refreshed every half an hour) and
> have that become the ideal state of the system. The two APIs above,
> openPartition and closePartition, can then be used to move partitions
> around the containers.
>
> Thanks!

Re: Using Helix for HDFS serving

Posted by Varun Sharma <va...@pinterest.com>.
Affinity can be best effort, and in the case of machine failure we are fine
with loss of affinity for, say, 20-30 minutes. Getting the logical
assignment to precisely mimic HDFS block placement would be hard, so some
skew for short time periods is acceptable.

For the serving side, we would use a standalone application for serving the
data. The standalone application will expose the following APIs over RPC:
openPartition(String pathOnHDFS);
closePartition(String pathOnHDFS);

So, I guess one option would be to have the "controller" keep a reasonably
up-to-date view of HDFS (refreshed every half an hour) and have that become
the ideal state of the system. The two APIs above, openPartition and
closePartition, can then be used to move partitions around the containers.
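
A sketch of that periodic refresh, assuming one partition per HDFS file,
files smaller than one block, and Helix participant ids derived from the
DataNode hostnames. The cluster/resource names, port suffix, and the
partition-to-path map are illustrative assumptions:

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.helix.HelixAdmin;
    import org.apache.helix.model.IdealState;

    public class HdfsPlacementRefresher {
      // Recompute the CUSTOMIZED ideal state from current HDFS block placement.
      // Run periodically (e.g. every 30 minutes) so logical placement trails physical.
      public static void refresh(HelixAdmin admin, FileSystem fs,
          Map<String, Path> partitionToPath) throws Exception {
        IdealState idealState =
            admin.getResourceIdealState("serving-cluster", "mr-output-2014-06-16");
        for (Map.Entry<String, Path> entry : partitionToPath.entrySet()) {
          FileStatus status = fs.getFileStatus(entry.getValue());
          // Files are smaller than one HDFS block, so the first block's hosts
          // are exactly the nodes holding a full copy of the file.
          BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
          Map<String, String> instanceToState = new HashMap<String, String>();
          for (String host : blocks[0].getHosts()) {
            // Assumes participant ids follow the usual host_port convention.
            instanceToState.put(host + "_12000", "ONLINE");
          }
          idealState.getRecord().setMapField(entry.getKey(), instanceToState);
        }
        admin.setResourceIdealState("serving-cluster", "mr-output-2014-06-16", idealState);
      }
    }

Since the ideal state only changes when this refresh runs, a node crash would
not trigger any transitions in between, which matches the "trail the physical
placement" requirement.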

Thanks!


On Tue, Jun 17, 2014 at 6:49 AM, kishore g <g....@gmail.com> wrote:

> Hi Varun,
>
> As Kanak mentioned, we will need to write a rebalancer that monitors HDFS
> changes.
>
> I have a few questions on the design:
>
> - Are you planning to use YARN or a standalone Helix application?
> - Is affinity a must, or is it best effort?
> - If affinity is a must:
> --- then in the worst case we will need as many servers as there are
> partitions.
> --- another problem would be uneven distribution of partitions to servers.
> - Apart from failures, what is the expected behavior when a rebalance is
> done on HDFS? Do you want the serving assignments to move as well?
>
> Once we get more info, we can quickly write a simple recipe to demonstrate
> this (excluding the serving part).
>
> thanks,
> Kishore G

Re: Using Helix for HDFS serving

Posted by kishore g <g....@gmail.com>.
Hi Varun,

As Kanak mentioned, we will need to write a rebalancer that monitors HDFS
changes.

I have a few questions on the design:

- Are you planning to use YARN or a standalone Helix application?
- Is affinity a must, or is it best effort?
- If affinity is a must:
--- then in the worst case we will need as many servers as there are
partitions.
--- another problem would be uneven distribution of partitions to servers.
- Apart from failures, what is the expected behavior when a rebalance is
done on HDFS? Do you want the serving assignments to move as well?

Once we get more info, we can quickly write a simple recipe to demonstrate
this (excluding the serving part).

thanks,
Kishore G

On Mon, Jun 16, 2014 at 4:29 PM, Kanak Biscuitwala <ka...@hotmail.com>
wrote:

> Hi Varun,
>
> If you would like to do this using Helix today, there are ways to support
> this, but you will have to write your own rebalancer. Essentially, what
> you need is to put your resource's ideal state in CUSTOMIZED rebalance
> mode, and then have some code that listens for HDFS changes, computes the
> logical partition assignment, and writes the ideal state based on what
> HDFS looks like at that moment.
>
> Eventually we want to define affinity-based assignments in our FULL_AUTO
> mode, but the challenge here is being able to represent that 30-minute
> delay and changing that affinity through some configuration.
>
> Does that make sense? Perhaps others on this list have more ideas.
>
> Kanak

RE: Using Helix for HDFS serving

Posted by Kanak Biscuitwala <ka...@hotmail.com>.
Hi Varun,

If you would like to do this using Helix today, there are ways to support this, but you will have to write your own rebalancer. Essentially, what you need is to put your resource's ideal state in CUSTOMIZED rebalance mode, and then have some code that listens for HDFS changes, computes the logical partition assignment, and writes the ideal state based on what HDFS looks like at that moment.
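
For reference, putting a resource's ideal state into CUSTOMIZED rebalance
mode is a small update via HelixAdmin; a sketch with placeholder
cluster/resource names and ZooKeeper address:

    import org.apache.helix.HelixAdmin;
    import org.apache.helix.manager.zk.ZKHelixAdmin;
    import org.apache.helix.model.IdealState;
    import org.apache.helix.model.IdealState.RebalanceMode;

    public class EnableCustomizedMode {
      public static void main(String[] args) {
        HelixAdmin admin = new ZKHelixAdmin("zk-host:2181");
        IdealState idealState =
            admin.getResourceIdealState("serving-cluster", "mr-output-2014-06-16");
        // In CUSTOMIZED mode the controller applies exactly the
        // partition-to-instance map fields written into the ideal state;
        // Helix does not compute any assignment on its own.
        idealState.setRebalanceMode(RebalanceMode.CUSTOMIZED);
        admin.setResourceIdealState("serving-cluster", "mr-output-2014-06-16", idealState);
      }
    }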

Eventually we want to define affinity-based assignments in our FULL_AUTO mode, but the challenge here is being able to represent that 30-minute delay and changing that affinity through some configuration.

Does that make sense? Perhaps others on this list have more ideas.

Kanak