Posted to user@hbase.apache.org by Rahul Malviya <ma...@gmail.com> on 2015/05/08 17:16:56 UTC

MapReduce on Snapshots

Hi All,

I have been running MapReduce on HBase for a while now, and it has been really successful for me. However, some MR jobs in my pipeline scan every row in HBase. I was wondering whether moving these jobs to run on a snapshot would reduce the load on the RSs, so that direct HBase operations (mostly gets) are not affected by my long-running full-scan MR jobs.

Let me know if my understanding is correct and what the possible drawbacks of doing this are. I am mostly worried that my job will start taking longer than it does when running MR directly on the HBase table. Please provide some details on this.

Thanks,
Rahul
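
For context, the snapshot-based approach asked about above (the feature tracked as HBASE-8369) can be sketched roughly as follows. This is a minimal sketch and not something posted in the thread: the snapshot name "my_table_snapshot", the restore directory "/tmp/snapshot_restore", and the RowCountMapper class are hypothetical placeholders, and it assumes the HBase and Hadoop client libraries are on the classpath and that the snapshot already exists. The job reads the snapshot's files from HDFS instead of scanning through the RegionServers.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class SnapshotScanJob {

  // Hypothetical mapper: counts rows, standing in for the real full-scan logic.
  public static class RowCountMapper extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
        throws IOException, InterruptedException {
      context.getCounter("snapshot-scan", "rows").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "full-scan-on-snapshot");
    job.setJarByClass(SnapshotScanJob.class);

    // A full scan, same as the job would issue against the live table.
    Scan scan = new Scan();

    // Assumed names: replace with a real snapshot and a scratch directory
    // that the submitting user can write to.
    String snapshotName = "my_table_snapshot";
    Path restoreDir = new Path("/tmp/snapshot_restore");

    // Configures the job to read the snapshot's files from HDFS
    // rather than going through the RegionServers.
    TableMapReduceUtil.initTableSnapshotMapperJob(
        snapshotName,
        scan,
        RowCountMapper.class,
        NullWritable.class,   // mapper output key class
        NullWritable.class,   // mapper output value class
        job,
        true,                 // ship HBase dependency jars with the job
        restoreDir);

    // Map-only job with no file output; results are in the job counters.
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Compared with a job set up through initTableMapperJob against the live table, only the setup call changes; the mapper itself stays the same. The user submitting such a job needs HDFS read permission on the data files and snapshot files, which is the point of the warning quoted in the replies below.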

Re: MapReduce on Snapshots

Posted by rahul malviya <ma...@gmail.com>.
Currently we have only one user accessing the whole big data stack
(HDFS, HBase, etc.), so security is not an issue for us. But thanks for
stressing this; I will keep it in mind when we move to a secured
cluster.

Apart from security, are there any other concerns?

Thanks,
Rahul

On Fri, May 8, 2015 at 11:23 AM, Michael Segel <mi...@hotmail.com>
wrote:

>
> > On May 8, 2015, at 11:04 AM, Ted Yu <yu...@gmail.com> wrote:
> >
> > HBASE-8369
>
>
> "
> WARNING: This feature bypasses HBase-level security completely since the
> files are read from the hdfs directly. The user who is running the scan /
> job has to have read permissions to the data files and snapshot files.
> "
>
> I think that says it all.
> Do you really want to open up your HBase snapshots to anyone?
>
>
> The opinions expressed here are mine, while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com

Re: MapReduce on Snapshots

Posted by Michael Segel <mi...@hotmail.com>.
> On May 8, 2015, at 11:04 AM, Ted Yu <yu...@gmail.com> wrote:
> 
> HBASE-8369


"
WARNING: This feature bypasses HBase-level security completely since the files are read from the hdfs directly. The user who is running the scan / job has to have read permissions to the data files and snapshot files. 
"
 
I think that says it all. 
Do you really want to open up your HBase snapshots to anyone? 


The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com






Re: MapReduce on Snapshots

Posted by Rahul Malviya <ma...@gmail.com>.
Thanks Ted.

Seems to answer my question.

Rahul

> On May 8, 2015, at 9:04 AM, Ted Yu <yu...@gmail.com> wrote:
> 
> Please see Release Note of HBASE-8369
> 
> Cheers


Re: MapReduce on Snapshots

Posted by Ted Yu <yu...@gmail.com>.
Please see Release Note of HBASE-8369

Cheers
