Posted to blur-user@incubator.apache.org by Ravikumar Govindarajan <ra...@gmail.com> on 2017/04/21 16:44:22 UTC

Shard Server addition/removal

We have been facing a lot of slowdown in production whenever a shard server
is added or removed...

Shards that were served locally via short-circuit reads suddenly become
fully remote, and at scale this melts down.

The block cache is a reactive cache and takes a lot of time to settle down
(at least for us!!)

I have been thinking about how to handle this locality issue for some time now...

1. For every shard, Blur can map a primary server & a secondary server in
ZooKeeper.
2. File writes can use Hadoop's favored-nodes hint & write to both of these
servers [https://issues.apache.org/jira/browse/HDFS-2576] (see the sketch
after this list).
3. When a machine goes down, instead of randomly assigning its shards to
different shard servers, Blur can allocate them to the designated
secondary servers.
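
To make item 2 concrete, here is a rough sketch of what the favored-nodes
write could look like. The hostnames, the index path & the ZK lookup are made
up for illustration; the DistributedFileSystem.create overload that takes
favoredNodes is the API HDFS-2576 added:

    import java.net.InetSocketAddress;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class FavoredNodeWriteSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Favored nodes need HDFS, i.e. fs.defaultFS pointing at a namenode.
        DistributedFileSystem dfs =
            (DistributedFileSystem) FileSystem.get(conf);

        // Hypothetical primary & secondary datanodes for this shard, as
        // looked up from the Blur -> ZooKeeper mapping of step 1.
        InetSocketAddress[] favoredNodes = new InetSocketAddress[] {
            new InetSocketAddress("shard-primary.example.com", 50010),
            new InetSocketAddress("shard-secondary.example.com", 50010) };

        Path indexFile = new Path("/blur/tables/table1/shard-00000/_0.cfs");
        FSDataOutputStream out = dfs.create(indexFile,
            FsPermission.getFileDefault(), true /* overwrite */,
            conf.getInt("io.file.buffer.size", 4096), (short) 3,
            dfs.getDefaultBlockSize(indexFile), null /* progress */,
            favoredNodes);
        try {
          out.write(new byte[0]); // Lucene index bytes would go here
        } finally {
          out.close();
        }
      }
    }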

Adding a new machine is another problem: it will immediately start serving
shards from remote machines, whereas it really needs local copies of all the
primary shards it is supposed to serve.

Hadoop has something called BlockPlacementPolicy that can be hacked into
[http://hadoopblog.blogspot.in/2009/09/hdfs-block-replica-placement-in-your.html].

When booting a new machine, let's say we increase the replication factor
from 3 to 4 for the shards that the new machine will host (the setrep
command from the hdfs console).

Hadoop will then call our CustomBlockPlacementPolicy class to place the
extra replica, and that is where we can sneak in the new machine's IP.
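
Here is a rough sketch of that replication bump done through the Java API
rather than the console. The shard path is hypothetical; setReplication &
listFiles are the standard FileSystem calls, and since replication is a
per-file property the sketch walks the shard directory:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocatedFileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.RemoteIterator;

    public class BumpShardReplication {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path shardDir = new Path("/blur/tables/table1/shard-00000");
        short newReplication = 4;

        // Bump every index file under the shard directory from 3 to 4.
        RemoteIterator<LocatedFileStatus> it = fs.listFiles(shardDir, true);
        while (it.hasNext()) {
          LocatedFileStatus file = it.next();
          fs.setReplication(file.getPath(), newReplication);
        }
        // The NameNode then asks the configured BlockPlacementPolicy where
        // the 4th replica should live -- that is the hook for the new IP.
      }
    }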

Once all the shards to be hosted by the new machine are replicated, we can
close those shards, update the mappings in ZK & open them again. Data will
then be served locally.
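
For the ZK update step, I have something like the below in mind. The znode
path & the mapping format are purely hypothetical; only the plain ZooKeeper
client calls are real API:

    import java.nio.charset.StandardCharsets;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class UpdateShardMappingSketch {
      public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk1.example.com:2181", 30000,
            new Watcher() {
              public void process(WatchedEvent event) { /* ignore */ }
            });

        // Hypothetical znode holding the primary/secondary mapping for one
        // shard; parent znodes are assumed to exist already.
        String path = "/blur/shard-assignments/table1/shard-00000";
        byte[] mapping =
            "primary=new-node.example.com,secondary=old-node.example.com"
                .getBytes(StandardCharsets.UTF_8);

        if (zk.exists(path, false) == null) {
          zk.create(path, mapping, ZooDefs.Ids.OPEN_ACL_UNSAFE,
              CreateMode.PERSISTENT);
        } else {
          zk.setData(path, mapping, -1); // -1 = any version
        }
        zk.close();
      }
    }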

Similarly, when restoring the replication factor from 4 to 3, our
CustomBlockPlacementPolicy class can consult ZK to find out which node to
delete the replica from & proceed...

Do let me know your thoughts on this...

Re: Shard Server addition/removal

Posted by Ravikumar Govindarajan <ra...@gmail.com>.
Wow, this was already thought about 2 years back!! Amazing.

I checked the code. One nit: the favored node must be skipped if it is
already marked as an excluded node.

The use-case I stated above is a little more complex than this. In addition
to the primary node (the local machine serving the shard), we will now need
to instruct Hadoop to favor the secondary too during re-replication.

Also, we have a mixed storage implementation: the 1st replica goes on the
SSD of the local node, replicas 2 & 3 on the HDDs of random nodes. So we
need to handle that accordingly.
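
For what it's worth, the stock HDFS storage-policy feature can express that
same layout. A sketch, assuming the datanode volumes are tagged [SSD]/[DISK]
in dfs.datanode.data.dir; this is only an illustration, not necessarily how
our current mixed-storage code works, and the path is hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class OneSsdPolicySketch {
      public static void main(String[] args) throws Exception {
        DistributedFileSystem dfs =
            (DistributedFileSystem) FileSystem.get(new Configuration());
        // ONE_SSD = first replica on SSD storage, remaining replicas on DISK.
        dfs.setStoragePolicy(new Path("/blur/tables/table1/shard-00000"),
            "ONE_SSD");
      }
    }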

But this is a great implementation to build on. Many thanks for this

--
Ravi

Re: Shard Server addition/removal

Posted by Tim Williams <wi...@gmail.com>.
Have you looked in /contrib for the block placement stuff?  Maybe it
provides some ideas?

https://git1-us-west.apache.org/repos/asf?p=incubator-blur.git;a=tree;f=contrib/blur-block-placement-policy;h=743a50d6431f4f8cecbb0f55d75baf187da7f755;hb=HEAD

Thanks,
--tim


Re: Shard Server addition/removal

Posted by Ravikumar Govindarajan <ra...@gmail.com>.
> In case of HDFS or MapR-FS, can we dynamically assign shards to shard
> servers based on data locality (using block locations)?


I was exploring the reverse option: Blur will suggest the set of Hadoop
datanodes to replicate to while writing index files.

Blur will also explicitly control bootstrapping a new datanode &
load-balancing it, as well as removing a datanode from the cluster...

Such fine-grained control is possible by customizing the
BlockPlacementPolicy API...

I have started exploring it. The changes look big. I will keep the group
posted on progress.
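
In case it helps anyone following along, the custom policy is plugged into
the NameNode via the dfs.block.replicator.classname property. A tiny sketch;
the policy class name is hypothetical, and in practice the property lives in
the NameNode's hdfs-site.xml rather than being set in client code:

    import org.apache.hadoop.conf.Configuration;

    public class PlacementPolicyWiringSketch {
      public static void main(String[] args) {
        // Shown through the Configuration API only to illustrate the key.
        Configuration conf = new Configuration();
        conf.set("dfs.block.replicator.classname",
            "org.apache.blur.hdfs.CustomBlockPlacementPolicy");
        System.out.println(conf.get("dfs.block.replicator.classname"));
      }
    }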


Re: Shard Server addition/removal

Posted by rahul challapalli <ch...@gmail.com>.
It's been a while since I looked at the code, but I believe a shard server
has a list of shards it can serve. Maintaining this static mapping (or
tight coupling) between shard servers and shards is a design decision that
makes complete sense for clusters where the nodes do not share a
distributed file system. In the case of HDFS or MapR-FS, can we dynamically
assign shards to shard servers based on data locality (using block
locations)? Obviously this hasn't been well thought out, as a lot of
components would be affected. Just dumping a few thoughts from my brain.
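
As a strawman, the block locations themselves are easy to get at. A sketch
of computing per-host locality for one shard directory, which a controller
could then use to decide where to open the shard; the path is hypothetical,
while the FileSystem & BlockLocation calls are standard Hadoop API:

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocatedFileStatus;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.RemoteIterator;

    public class ShardLocalitySketch {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path shardDir = new Path("/blur/tables/table1/shard-00000");

        // Sum up, per datanode host, how many bytes of this shard's index
        // files it holds locally.
        Map<String, Long> bytesPerHost = new HashMap<String, Long>();
        RemoteIterator<LocatedFileStatus> files = fs.listFiles(shardDir, true);
        while (files.hasNext()) {
          LocatedFileStatus file = files.next();
          for (BlockLocation block : file.getBlockLocations()) {
            for (String host : block.getHosts()) {
              Long soFar = bytesPerHost.get(host);
              bytesPerHost.put(host,
                  (soFar == null ? 0L : soFar) + block.getLength());
            }
          }
        }
        // The host with the most local bytes is the natural shard server.
        System.out.println(bytesPerHost);
      }
    }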

- Rahul
