Posted to user@hbase.apache.org by "Shi, Shaofeng" <sh...@ebay.com> on 2015/05/22 11:35:18 UTC

Can TableSnapshotInputFormat support multiple snapshots as the MR input?

Hello,

We have a scenario that requires periodically merging multiple HBase tables into one table. To get better performance and minimize the impact on the HBase servers, we are evaluating TableSnapshotInputFormat (http://www.slideshare.net/enissoz/mapreduce-over-snapshots). From the API, however, it appears to accept only one snapshot as input. Is it possible to change it to allow multiple snapshots?

Thanks in advance for any advice.

Shaofeng Shi
Apache Kylin

Re: Can TableSnapshotInputFormat support multiple snapshots as the MR input?

Posted by Andrew Mains <an...@kontagent.com>.
Hi Shaofeng,

Sorry about the delayed response; I was on vacation last week.

We (Upsight) are actually also on 0.98, and I have a version of the 
patch rebased against 0.98.12, which I'll upload to the JIRA ticket. 
We've had success with running just the patched hbase-server jar with 
our mapreduce jobs (deploying it without touching our server 
installations), so I imagine it should work for you as well (in 
particular if you happen to be already building/maintaining an HBase fork).

Let me know if you run into any issues!

Andrew

On 5/22/15 8:11 PM, Shi, Shaofeng wrote:
> Hi Andrew, this is what we need, thank you! In which version will this
> feature be released? Our HBase is v0.98; is it possible to just apply this
> patch to get the feature?


Re: Can TableSnapshotInputFormat support multiple snapshots as the MR input?

Posted by "Shi, Shaofeng" <sh...@ebay.com>.
Hi Andrew, this is what we need, thank you! In which version will this
feature be released? Our HBase is v0.98; is it possible to just apply this
patch to get the feature?

On 5/22/15, 6:06 PM, "Andrew Mains" <an...@kontagent.com> wrote:

>In the latest release, no; however I've filed a ticket here
>https://issues.apache.org/jira/browse/HBASE-13356 for this feature, and
>uploaded a patch for review.
>
>The patch provides a MultiTableSnapshotInputFormat which can run a list
>of scans over multiple snapshots. Jobs can be initialized using:
>
>  public static void initMultiTableSnapshotMapperJob(Map<String, Collection<Scan>> snapshotScans,
>      Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass,
>      Job job, boolean addDependencyJars, Path tmpRestoreDir) throws IOException {
>
>
>Hope this helps!
>
>Andrew


Re: Can TableSnapshotInputFormat support multiple snapshots as the MR input?

Posted by Andrew Mains <an...@kontagent.com>.
In the latest release, no; however I've filed a ticket here 
https://issues.apache.org/jira/browse/HBASE-13356 for this feature, and 
uploaded a patch for review.

The patch provides a MultiTableSnapshotInputFormat which can run a list 
of scans over multiple snapshots. Jobs can be initialized using:

  public static void initMultiTableSnapshotMapperJob(Map<String, Collection<Scan>> snapshotScans,
      Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass,
      Job job, boolean addDependencyJars, Path tmpRestoreDir) throws IOException {
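
For illustration, a job using this method might be set up roughly as below. This is a minimal sketch, not code from the patch. It assumes the method is exposed as a static on TableMapReduceUtil (the thread does not say which class hosts it) and that the HBase and Hadoop 2 jars are on the classpath. The class name MultiSnapshotJobSketch, the PassThroughMapper, the fullScans helper, and the job name are all hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class MultiSnapshotJobSketch {

  /** Hypothetical mapper: emits every row it reads, unchanged. */
  public static class PassThroughMapper
      extends TableMapper<ImmutableBytesWritable, Result> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
        throws IOException, InterruptedException {
      context.write(key, value);
    }
  }

  /** Hypothetical helper: one unrestricted (full) scan per snapshot name. */
  public static Map<String, Collection<Scan>> fullScans(List<String> snapshotNames) {
    Map<String, Collection<Scan>> scans = new HashMap<>();
    for (String name : snapshotNames) {
      Collection<Scan> perSnapshot = new ArrayList<>();
      perSnapshot.add(new Scan());
      scans.put(name, perSnapshot);
    }
    return scans;
  }

  /**
   * Configures a job that reads all given snapshots.
   * Assumption: initMultiTableSnapshotMapperJob lives on TableMapReduceUtil.
   * tmpRestoreDir is the directory passed through to the method for
   * restoring the snapshots.
   */
  public static Job createJob(Configuration conf, List<String> snapshotNames,
      Path tmpRestoreDir) throws IOException {
    Job job = Job.getInstance(conf, "merge-snapshots-sketch");
    job.setJarByClass(MultiSnapshotJobSketch.class);
    TableMapReduceUtil.initMultiTableSnapshotMapperJob(
        fullScans(snapshotNames),
        PassThroughMapper.class,
        ImmutableBytesWritable.class,  // mapper output key class
        Result.class,                  // mapper output value class
        job,
        true,                          // addDependencyJars
        tmpRestoreDir);
    return job;
  }
}
```

Each map entry pairs a snapshot name with the scans to run over it, so a snapshot could also be given several narrower scans (for example, separate row ranges) instead of the single full scan used here.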


Hope this helps!

Andrew

On 5/22/15 2:35 AM, Shi, Shaofeng wrote:
> Hello,
>
> We have a scenario that requires periodically merging multiple HBase tables into one table. To get better performance and minimize the impact on the HBase servers, we are evaluating TableSnapshotInputFormat (http://www.slideshare.net/enissoz/mapreduce-over-snapshots). From the API, however, it appears to accept only one snapshot as input. Is it possible to change it to allow multiple snapshots?
>
> Thanks in advance for any advice.
>
> Shaofeng Shi
> Apache Kylin
>