Posted to mapreduce-user@hadoop.apache.org by Alex Baranau <al...@gmail.com> on 2012/07/18 17:45:49 UTC

Bulk Import & Data Locality

Hello,

As far as I understand, the Bulk Import functionality does not take
Data Locality into account. The MR job will create the same number of
reducer tasks as there are regions to write into, but it will not
"advise" on which nodes to run these tasks. In that case a Reducer task
which writes the HFiles of some region may not be physically located on
the same node as the RS that serves that region. The way HDFS writes
data, there will (likely) be one full replica of the blocks of this
Region's HFiles on the node where the Reducer task ran, while the other
replicas (if replication >1) will be distributed randomly over the
cluster. Thus the RS, while serving data of that region, will (most
likely) not read local data (data will be transferred from other
DataNodes). I.e. data locality will be broken.

Is this correct?

If yes, I guess that if we could tell the MR framework where (on which
nodes) to launch certain Reducer tasks, this would help us. I believe
this is not possible with MR1, please correct me if I'm wrong. Perhaps
this is possible with MR2?

I assume there's also no way to provide a "hint" to the NameNode about
where to place the blocks of a new file, right?
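
For reference, the kind of job setup I mean looks roughly like this.
This is only a sketch: the mapper class, paths, and table name below
are made-up placeholders, not from a real job.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bulk-import");
    job.setMapperClass(MyImportMapper.class);       // placeholder mapper
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);
    FileInputFormat.addInputPath(job, new Path("/input"));          // example
    FileOutputFormat.setOutputPath(job, new Path("/bulk-output"));  // example
    // Sets up TotalOrderPartitioner with one reducer per region of the
    // table -- but neither this call nor (as far as I can tell) anything
    // in MR1 lets us hint WHERE those reducer tasks should run.
    HFileOutputFormat.configureIncrementalLoad(job, new HTable(conf, "mytable"));
    job.waitForCompletion(true);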

Thank you,
-- 
Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr

Re: Bulk Import & Data Locality

Posted by Alex Baranau <al...@gmail.com>.
Thanks a lot for the replies.

To me it is clear when data locality gets broken, though (and it is not
only on failure of the RS; there are other cases). I was hoping more for
suggestions around this particular use-case: assuming that nodes/RSs are
stable, how to achieve data locality when doing a bulk import (writing
HFiles directly from the MR job). Running a major compaction helps here
(as new files are created in place of the old ones *on the DataNode
local to the RS where the region is being compacted*), but I'd really
prefer not to do it: it is quite a resource-intensive and thus expensive
process...
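
To be concrete, by "running a major compaction" I just mean something
like the following sketch (the table name is an example):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    // Asynchronously asks the RSs to rewrite each region's HFiles. The
    // rewritten files get their first replica on the DataNode local to
    // the RS, which is what restores locality -- at a high I/O cost.
    admin.majorCompact("mytable");  // example table name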

I was also hoping that folks from the HDFS/MapReduce teams would comment
on my latter questions.

I heard that there is some work in the HBase community on "asking" HDFS
to place the replicas of a file's blocks together (so that there are
full replicas on other nodes, which helps, as Lars noted). I also heard
from an HDFS guy that there are ideas around better replication logic.

A little off-topic:

>> >>>> Also is it correct to say that if I set smaller data block size
>> >>>> data locality gets worse, and if data block size gets bigger data
>> >>>> locality gets better.

*Theoretically*, if your region data is stored in one HFile (say, one
flush occurred, or a major compaction caused that, given that there's
one CF) and this HFile is smaller than the configured HDFS block size,
then we can say that the 3 (or whatever the replication factor is)
replicas of this file (and hence of this region) are "full" replicas,
which makes it easier to preserve data locality if the RS fails (or when
anything else causes the region to be re-assigned). But since the region
size is usually much bigger (at least 10-20 times bigger), this fact
doesn't buy you much.
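
To put rough (made-up) numbers on it:

    HDFS block size:       64 MB
    region in one HFile:   5 GB  ->  ~80 HDFS blocks

Each block's replicas are placed independently (apart from the
writer-local first replica), so when the region is re-assigned to
another RS, that RS will typically hold only a small fraction of those
~80 blocks locally.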

Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr

On Wed, Jul 18, 2012 at 9:43 PM, Ben Kim <be...@gmail.com> wrote:

> I added some Q&A I went through with Lars. Hope this is somewhat
> related to your data locality questions.
>
> >> >>> On Jun 15, 2012, at 6:56 AM, Ben Kim wrote:
> >> >>>
> >> >>>> Hi,
> >> >>>>
> >> >>>> I've been posting questions in the mailing-list quite often
> >> >>>> lately, and here goes another one about data locality.
> >> >>>> I read the excellent blog post about data locality that Lars
> >> >>>> George wrote at
> >> >>>> http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html
> >> >>>>
> >> >>>> I understand data locality in HBase as locating a region on a
> >> >>>> region-server where most of its data blocks reside.
> >> >>>
> >> >>> The opposite is happening, i.e. the region server process
> >> >>> triggers for all data it writes to be located on the same
> >> >>> physical machine.
> >> >>>
> >> >>>> So that way fast data access is guaranteed when running an MR
> >> >>>> job because each map/reduce task is run for each region on the
> >> >>>> tasktracker where the region co-locates.
> >> >>>
> >> >>> Correct.
> >> >>>
> >> >>>> But what if the data blocks of the region are evenly spread
> >> >>>> over multiple region-servers?
> >> >>>
> >> >>> This will not happen, unless the original server fails. Then the
> >> >>> region is moved to another that now needs to do a lot of remote
> >> >>> reads over the network. This is why there is work being done to
> >> >>> allow for custom placement policies in HDFS. That way you can
> >> >>> store the entire region and all copies as complete units on
> >> >>> three data nodes. In case of a failure you can then move the
> >> >>> region to one of the two copies. This is not available yet
> >> >>> though, but it is being worked on (so I heard).
> >> >>>
> >> >>>> Does an MR task have to remotely access the data blocks from
> >> >>>> other regionservers?
> >> >>>
> >> >>> For the above failure case, it would be the region server
> >> >>> accessing the remote data, yes.
> >> >>>
> >> >>>> How good is HBase at locating data blocks where a region
> >> >>>> resides?
> >> >>>
> >> >>> That is again the wrong way around. HBase has no clue as to
> >> >>> where blocks reside, nor does it know that the file system in
> >> >>> fact uses separate blocks. HBase stores files, HDFS does the
> >> >>> block magic under the hood, transparently to HBase.
> >> >>>
> >> >>>> Also is it correct to say that if I set smaller data block size
> >> >>>> data locality gets worse, and if data block size gets bigger
> >> >>>> data locality gets better.
> >> >>>
> >> >>> This is not applicable here; I am assuming this stems from the
> >> >>> above confusion about which system is handling the blocks, HBase
> >> >>> or HDFS. See above.
> >> >>>
> >> >>> HTH,
> >> >>> Lars
> >>
> >
>
>
>
> On Thu, Jul 19, 2012 at 6:39 AM, Cristofer Weber <
> cristofer.weber@neogrid.com> wrote:
>
> > Hi Alex,
> >
> > I ran one of our bulk import jobs with partial payload, without
> > proceeding with major compaction, and you are right: some HDFS blocks
> > are on a different DataNode.
> >
> > -----Original Message-----
> > From: Alex Baranau [mailto:alex.baranov.v@gmail.com]
> > Sent: Wednesday, July 18, 2012 12:46
> > To: hbase-user@hadoop.apache.org; mapreduce-user@hadoop.apache.org;
> > hdfs-user@hadoop.apache.org
> > Subject: Bulk Import & Data Locality
> >
> > Hello,
> >
> > As far as I understand, the Bulk Import functionality does not take
> > Data Locality into account. The MR job will create the same number of
> > reducer tasks as there are regions to write into, but it will not
> > "advise" on which nodes to run these tasks. In that case a Reducer
> > task which writes the HFiles of some region may not be physically
> > located on the same node as the RS that serves that region. The way
> > HDFS writes data, there will (likely) be one full replica of the
> > blocks of this Region's HFiles on the node where the Reducer task
> > ran, while the other replicas (if replication >1) will be distributed
> > randomly over the cluster. Thus the RS, while serving data of that
> > region, will (most likely) not read local data (data will be
> > transferred from other DataNodes). I.e. data locality will be broken.
> >
> > Is this correct?
> >
> > If yes, I guess that if we could tell the MR framework where (on
> > which nodes) to launch certain Reducer tasks, this would help us. I
> > believe this is not possible with MR1, please correct me if I'm
> > wrong. Perhaps this is possible with MR2?
> >
> > I assume there's also no way to provide a "hint" to the NameNode
> > about where to place the blocks of a new file, right?
> >
> > Thank you,
> > --
> > Alex Baranau
> > ------
> > Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
> > Solr
> >
>
>
>
> --
>
> *Benjamin Kim*
> *benkimkimben at gmail*
>



-- 
Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr

Re: Bulk Import & Data Locality

Posted by Ben Kim <be...@gmail.com>.
I added some Q&A I went through with Lars. Hope this is somewhat related
to your data locality questions.

>> >>> On Jun 15, 2012, at 6:56 AM, Ben Kim wrote:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> I've been posting questions in the mailing-list quite often
>> >>>> lately, and here goes another one about data locality.
>> >>>> I read the excellent blog post about data locality that Lars
>> >>>> George wrote at
>> >>>> http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html
>> >>>>
>> >>>> I understand data locality in HBase as locating a region on a
>> >>>> region-server where most of its data blocks reside.
>> >>>
>> >>> The opposite is happening, i.e. the region server process triggers
>> >>> for all data it writes to be located on the same physical machine.
>> >>>
>> >>>> So that way fast data access is guaranteed when running an MR job
>> >>>> because each map/reduce task is run for each region on the
>> >>>> tasktracker where the region co-locates.
>> >>>
>> >>> Correct.
>> >>>
>> >>>> But what if the data blocks of the region are evenly spread over
>> >>>> multiple region-servers?
>> >>>
>> >>> This will not happen, unless the original server fails. Then the
>> >>> region is moved to another that now needs to do a lot of remote
>> >>> reads over the network. This is why there is work being done to
>> >>> allow for custom placement policies in HDFS. That way you can
>> >>> store the entire region and all copies as complete units on three
>> >>> data nodes. In case of a failure you can then move the region to
>> >>> one of the two copies. This is not available yet though, but it is
>> >>> being worked on (so I heard).
>> >>>
>> >>>> Does an MR task have to remotely access the data blocks from
>> >>>> other regionservers?
>> >>>
>> >>> For the above failure case, it would be the region server
>> >>> accessing the remote data, yes.
>> >>>
>> >>>> How good is HBase at locating data blocks where a region resides?
>> >>>
>> >>> That is again the wrong way around. HBase has no clue as to where
>> >>> blocks reside, nor does it know that the file system in fact uses
>> >>> separate blocks. HBase stores files, HDFS does the block magic
>> >>> under the hood, transparently to HBase.
>> >>>
>> >>>> Also is it correct to say that if I set smaller data block size
>> >>>> data locality gets worse, and if data block size gets bigger data
>> >>>> locality gets better.
>> >>>
>> >>> This is not applicable here; I am assuming this stems from the
>> >>> above confusion about which system is handling the blocks, HBase
>> >>> or HDFS. See above.
>> >>>
>> >>> HTH,
>> >>> Lars
>>
>



On Thu, Jul 19, 2012 at 6:39 AM, Cristofer Weber <
cristofer.weber@neogrid.com> wrote:

> Hi Alex,
>
> I ran one of our bulk import jobs with partial payload, without
> proceeding with major compaction, and you are right: some HDFS blocks
> are on a different DataNode.
>
> -----Original Message-----
> From: Alex Baranau [mailto:alex.baranov.v@gmail.com]
> Sent: Wednesday, July 18, 2012 12:46
> To: hbase-user@hadoop.apache.org; mapreduce-user@hadoop.apache.org;
> hdfs-user@hadoop.apache.org
> Subject: Bulk Import & Data Locality
>
> Hello,
>
> As far as I understand, the Bulk Import functionality does not take
> Data Locality into account. The MR job will create the same number of
> reducer tasks as there are regions to write into, but it will not
> "advise" on which nodes to run these tasks. In that case a Reducer task
> which writes the HFiles of some region may not be physically located on
> the same node as the RS that serves that region. The way HDFS writes
> data, there will (likely) be one full replica of the blocks of this
> Region's HFiles on the node where the Reducer task ran, while the other
> replicas (if replication >1) will be distributed randomly over the
> cluster. Thus the RS, while serving data of that region, will (most
> likely) not read local data (data will be transferred from other
> DataNodes). I.e. data locality will be broken.
>
> Is this correct?
>
> If yes, I guess that if we could tell the MR framework where (on which
> nodes) to launch certain Reducer tasks, this would help us. I believe
> this is not possible with MR1, please correct me if I'm wrong. Perhaps
> this is possible with MR2?
>
> I assume there's also no way to provide a "hint" to the NameNode about
> where to place the blocks of a new file, right?
>
> Thank you,
> --
> Alex Baranau
> ------
> Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
> Solr
>



-- 

*Benjamin Kim*
*benkimkimben at gmail*

RES: Bulk Import & Data Locality

Posted by Cristofer Weber <cr...@neogrid.com>.
Hi Alex,

I ran one of our bulk import jobs with partial payload, without proceeding with major compaction, and you are right: some HDFS blocks are on a different DataNode.

-----Original Message-----
From: Alex Baranau [mailto:alex.baranov.v@gmail.com]
Sent: Wednesday, July 18, 2012 12:46
To: hbase-user@hadoop.apache.org; mapreduce-user@hadoop.apache.org; hdfs-user@hadoop.apache.org
Subject: Bulk Import & Data Locality

Hello,

As far as I understand, the Bulk Import functionality does not take Data Locality into account. The MR job will create the same number of reducer tasks as there are regions to write into, but it will not "advise" on which nodes to run these tasks. In that case a Reducer task which writes the HFiles of some region may not be physically located on the same node as the RS that serves that region. The way HDFS writes data, there will (likely) be one full replica of the blocks of this Region's HFiles on the node where the Reducer task ran, while the other replicas (if replication >1) will be distributed randomly over the cluster. Thus the RS, while serving data of that region, will (most likely) not read local data (data will be transferred from other DataNodes). I.e. data locality will be broken.

Is this correct?

If yes, I guess that if we could tell the MR framework where (on which nodes) to launch certain Reducer tasks, this would help us. I believe this is not possible with MR1, please correct me if I'm wrong. Perhaps this is possible with MR2?

I assume there's also no way to provide a "hint" to the NameNode about where to place the blocks of a new file, right?

Thank you,
--
Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr

RES: Bulk Import & Data Locality

Posted by Cristofer Weber <cr...@neogrid.com>.
Hi Alex,

Here we work with bulk import by creating the HFiles in an MR job, and we finish the load by calling the doBulkLoad method of the LoadIncrementalHFiles class (probably the same method used by the completebulkload tool); the HFiles generated by the reducer tasks are correctly 'adopted' by each corresponding region server because the files get placed in the correct directories.
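
Roughly this, as a sketch (the output path and table name are examples):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    Configuration conf = HBaseConfiguration.create();
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    // Moves each reducer-generated HFile under the matching region
    // directory. This is a rename within HDFS, so the blocks stay on
    // whatever DataNodes the reducers wrote them to -- the data itself
    // is not moved to the region's RS.
    loader.doBulkLoad(new Path("/bulk-output"), new HTable(conf, "mytable"));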

I never wondered whether doBulkLoad is aware of region locations when copying files, because our major compaction runs right after the bulk load; but what occurs to me right now is that it is possible to check block locations using the NameNode UI, as region names match the region directories inside your table dir in HDFS.
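
For anyone who prefers to script it rather than click through the UI, fsck can report block locations per file (the path is an example; the region name is whatever the NameNode UI shows for your table):

    hadoop fsck /hbase/mytable/<region-name> -files -blocks -locations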

Tried it here, and in fact they match; but we had run a major compaction, so of course the HFiles are collocated with the corresponding RS.

Regards,
Cristofer


-----Original Message-----
From: Alex Baranau [mailto:alex.baranov.v@gmail.com]
Sent: Wednesday, July 18, 2012 12:46
To: hbase-user@hadoop.apache.org; mapreduce-user@hadoop.apache.org; hdfs-user@hadoop.apache.org
Subject: Bulk Import & Data Locality

Hello,

As far as I understand, the Bulk Import functionality does not take Data Locality into account. The MR job will create the same number of reducer tasks as there are regions to write into, but it will not "advise" on which nodes to run these tasks. In that case a Reducer task which writes the HFiles of some region may not be physically located on the same node as the RS that serves that region. The way HDFS writes data, there will (likely) be one full replica of the blocks of this Region's HFiles on the node where the Reducer task ran, while the other replicas (if replication >1) will be distributed randomly over the cluster. Thus the RS, while serving data of that region, will (most likely) not read local data (data will be transferred from other DataNodes). I.e. data locality will be broken.

Is this correct?

If yes, I guess that if we could tell the MR framework where (on which nodes) to launch certain Reducer tasks, this would help us. I believe this is not possible with MR1, please correct me if I'm wrong. Perhaps this is possible with MR2?

I assume there's also no way to provide a "hint" to the NameNode about where to place the blocks of a new file, right?

Thank you,
--
Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr