Posted to common-dev@hadoop.apache.org by Karthiek C <ka...@gmail.com> on 2013/02/22 19:13:34 UTC

APIs to move data blocks within HDFS

Hi,

Are there any APIs to move data blocks in HDFS from one node to another
*after* they have been added to HDFS? Also, can we write some sort of
pluggable module (like the scheduler) that controls how data gets placed in
a Hadoop cluster? I am working with Hadoop 1.0.3, and I couldn't find
any filesystem APIs that do this.

PS: I am working on a research project where we want to investigate how to
optimally place data in Hadoop.

Thanks,
Karthiek

Re: APIs to move data blocks within HDFS

Posted by Karthiek C <ka...@gmail.com>.
Thank you Harsh and Chris. This really helps!

-Karthiek

On Fri, Feb 22, 2013 at 2:46 PM, Chris Nauroth <cn...@hortonworks.com> wrote:

> Regarding your question about a pluggable module to control placement of
> data, try taking a look at the abstract class BlockPlacementPolicy and
> BlockPlacementPolicyDefault, which is its default implementation.
>
> On branch-1, you can find these classes
> at src/hdfs/org/apache/hadoop/hdfs/server/namenode.  On trunk, the package
> structure is different, and these classes are at
> hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement.
>
> Best of luck with your research!
>
> --Chris

Re: APIs to move data blocks within HDFS

Posted by Chris Nauroth <cn...@hortonworks.com>.
Regarding your question about a pluggable module to control placement of
data, try taking a look at the abstract class BlockPlacementPolicy and
BlockPlacementPolicyDefault, which is its default implementation.

On branch-1, you can find these classes
at src/hdfs/org/apache/hadoop/hdfs/server/namenode.  On trunk, the package
structure is different, and these classes are
at hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement.

Best of luck with your research!

--Chris


On Fri, Feb 22, 2013 at 11:17 AM, Harsh J <ha...@cloudera.com> wrote:

> There are no filesystem-level (i.e., client-level) APIs to do this, but the
> HDFS Balancer tool does exactly this. Reading its source should show you
> what kind of calls you need to make to reuse the balancer protocol and
> achieve what you need.
>
> In trunk, the balancer is at
> hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
>
> HTH, and feel free to ask any relevant follow-up questions.
>
> --
> Harsh J
>
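To make the BlockPlacementPolicy pointer more concrete, here is a minimal, hypothetical sketch of what a pluggable placement hook does. It is a standalone Java model, not Hadoop's API: PlacementPolicy, RackAwarePolicy, and Node are illustrative names, and the real BlockPlacementPolicy method signatures differ between branch-1 and trunk. RackAwarePolicy models the rule the HDFS documentation describes for the default policy with three replicas: the first replica on the writer's node, the second on a node in a different rack, and the third on a different node in that same remote rack. (Assumption to verify for your release: the NameNode selects the implementation class through the dfs.block.replicator.classname property.)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical standalone model of rack-aware replica placement.
// None of these types are Hadoop classes; the real BlockPlacementPolicy
// has different (and version-specific) method signatures.
public class PlacementSketch {

    // A DataNode, identified by host name and rack path such as "/rack1".
    record Node(String host, String rack) {}

    // Hypothetical pluggable hook, loosely analogous to BlockPlacementPolicy.
    interface PlacementPolicy {
        List<Node> chooseTargets(Node writer, int replicas, List<Node> cluster);
    }

    // Models the documented default rule for three replicas:
    // 1st on the writer's node, 2nd on a node in another rack,
    // 3rd on a different node in the same rack as the 2nd.
    // Any further replicas go to random unused nodes.
    static class RackAwarePolicy implements PlacementPolicy {
        private final Random random;

        RackAwarePolicy(long seed) {
            this.random = new Random(seed);
        }

        @Override
        public List<Node> chooseTargets(Node writer, int replicas, List<Node> cluster) {
            List<Node> chosen = new ArrayList<>();
            if (replicas <= 0) {
                return chosen;
            }
            chosen.add(writer);
            if (chosen.size() < replicas) {
                addRandom(chosen, cluster, n -> !n.rack().equals(writer.rack()));
            }
            if (chosen.size() < replicas && chosen.size() >= 2) {
                String remoteRack = chosen.get(1).rack();
                addRandom(chosen, cluster, n -> n.rack().equals(remoteRack));
            }
            while (chosen.size() < replicas) {
                if (!addRandom(chosen, cluster, n -> true)) {
                    break; // fewer distinct nodes than requested replicas
                }
            }
            return chosen;
        }

        // Adds a random not-yet-chosen node that satisfies the predicate.
        private boolean addRandom(List<Node> chosen, List<Node> cluster, Predicate<Node> ok) {
            List<Node> candidates = new ArrayList<>();
            for (Node n : cluster) {
                if (!chosen.contains(n) && ok.test(n)) {
                    candidates.add(n);
                }
            }
            if (candidates.isEmpty()) {
                return false;
            }
            chosen.add(candidates.get(random.nextInt(candidates.size())));
            return true;
        }
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(
                new Node("dn1", "/rack1"), new Node("dn2", "/rack1"),
                new Node("dn3", "/rack2"), new Node("dn4", "/rack2"));
        PlacementPolicy policy = new RackAwarePolicy(42L);
        System.out.println(policy.chooseTargets(cluster.get(0), 3, cluster));
    }
}
```

Running main prints three targets: dn1 first, then dn3 and dn4 in an order that depends on the seed. A research experiment would swap in a different PlacementPolicy (for example, one that prefers lightly loaded nodes) while keeping the same hook.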

Re: APIs to move data blocks within HDFS

Posted by Harsh J <ha...@cloudera.com>.
There are no filesystem-level (i.e., client-level) APIs to do this, but the
HDFS Balancer tool does exactly this. Reading its source should show you
what kind of calls you need to make to reuse the balancer protocol and
achieve what you need.

In trunk, the balancer is at
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java

HTH, and feel free to ask any relevant follow-up questions.

On Fri, Feb 22, 2013 at 11:43 PM, Karthiek C <ka...@gmail.com> wrote:
> Hi,
>
> Are there any APIs to move data blocks in HDFS from one node to another
> *after* they have been added to HDFS? Also, can we write some sort of
> pluggable module (like the scheduler) that controls how data gets placed in
> a Hadoop cluster? I am working with Hadoop 1.0.3, and I couldn't find
> any filesystem APIs that do this.
>
> PS: I am working on a research project where we want to investigate how to
> optimally place data in Hadoop.
>
> Thanks,
> Karthiek



--
Harsh J
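The balancing criterion behind Balancer.java can be sketched as follows. The Balancer's documentation defines a cluster as balanced when every DataNode's utilization (used space / capacity) differs from the cluster-wide utilization (total used / total capacity) by no more than a threshold, 10% by default. This hypothetical standalone model (BalancerSketch, DataNode, Move, and plan are illustrative names, not Hadoop classes) classifies nodes against that criterion and greedily pairs over-utilized sources with under-utilized targets. It deliberately simplifies: the real tool moves whole block replicas, chooses them subject to rack-placement constraints, and talks to DataNodes over the balancer protocol; none of that is modeled here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone model of the Balancer's balance criterion.
// It only plans byte counts; it does not contact a NameNode or DataNode.
public class BalancerSketch {

    // A DataNode's name, capacity, and used space, in bytes.
    record DataNode(String name, long capacity, long used) {
        double utilization() {
            return (double) used / capacity;
        }
    }

    // One planned transfer of `bytes` from an over- to an under-utilized node.
    record Move(String source, String target, long bytes) {}

    // Cluster-wide utilization: total used space / total capacity.
    static double average(List<DataNode> nodes) {
        long used = 0;
        long capacity = 0;
        for (DataNode n : nodes) {
            used += n.used();
            capacity += n.capacity();
        }
        return (double) used / capacity;
    }

    // threshold is a fraction (0.10 models the 10% default). Only nodes outside
    // [avg - threshold, avg + threshold] take part, and each is moved back
    // toward the cluster average (a simplification of the real tool).
    static List<Move> plan(List<DataNode> nodes, double threshold) {
        double avg = average(nodes);
        List<DataNode> sources = new ArrayList<>();
        List<DataNode> targets = new ArrayList<>();
        for (DataNode n : nodes) {
            if (n.utilization() > avg + threshold) {
                sources.add(n);
            } else if (n.utilization() < avg - threshold) {
                targets.add(n);
            }
        }
        // Space each source should shed and each target can absorb.
        long[] excess = new long[sources.size()];
        for (int i = 0; i < excess.length; i++) {
            DataNode n = sources.get(i);
            excess[i] = n.used() - Math.round(avg * n.capacity());
        }
        long[] deficit = new long[targets.size()];
        for (int j = 0; j < deficit.length; j++) {
            DataNode n = targets.get(j);
            deficit[j] = Math.round(avg * n.capacity()) - n.used();
        }
        // Greedy two-pointer pairing; each step exhausts a source or a target.
        List<Move> moves = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < excess.length && j < deficit.length) {
            long amount = Math.min(excess[i], deficit[j]);
            if (amount > 0) {
                moves.add(new Move(sources.get(i).name(), targets.get(j).name(), amount));
            }
            excess[i] -= amount;
            deficit[j] -= amount;
            if (excess[i] <= 0) {
                i++;
            }
            if (deficit[j] <= 0) {
                j++;
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        List<DataNode> nodes = List.of(
                new DataNode("dn1", 100, 90),
                new DataNode("dn2", 100, 10),
                new DataNode("dn3", 100, 50));
        // prints [Move[source=dn1, target=dn2, bytes=40]]
        System.out.println(plan(nodes, 0.10));
    }
}
```

With the three nodes in main (90%, 10%, and 50% full, so the cluster average is 50%), dn1 and dn2 fall outside the 10% band, and the plan is a single 40-byte move from dn1 to dn2; dn3 is left alone.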