Posted to user@flink.apache.org by Flavio Pompermaier <po...@okkam.it> on 2015/04/10 11:55:41 UTC

Hadoop compatibility and HBase bulk loading

Hi guys,

I have a question about Hadoop compatibility.
In https://flink.apache.org/news/2014/11/18/hadoop-compatibility.html you
say that existing MapReduce programs can be reused.
Would it also be possible to handle more complex MapReduce programs, such
as the HBase bulk import, which uses for example a custom partitioner
(org.apache.hadoop.mapreduce.Partitioner)?
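
For reference, the linked post shows the pattern along these lines (a
condensed sketch; Tokenizer and Counter stand for the existing, unmodified
Hadoop word-count Mapper and Reducer classes, which are not shown here):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat;
import org.apache.flink.api.java.hadoop.mapred.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.hadoopcompatibility.mapred.HadoopMapFunction;
import org.apache.flink.hadoopcompatibility.mapred.HadoopReduceCombineFunction;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class HadoopCompatWordCount {
  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // Read with the unmodified Hadoop TextInputFormat.
    HadoopInputFormat<LongWritable, Text> input = new HadoopInputFormat<>(
        new TextInputFormat(), LongWritable.class, Text.class, new JobConf());
    TextInputFormat.addInputPath(input.getJobConf(), new Path(args[0]));

    DataSet<Tuple2<Text, LongWritable>> counts = env.createInput(input)
        // Reuse the existing Hadoop Mapper as a Flink FlatMapFunction.
        .flatMap(new HadoopMapFunction<LongWritable, Text, Text, LongWritable>(
            new Tokenizer()))
        .groupBy(0)
        // Reuse the existing Hadoop Reducer as both reducer and combiner.
        .reduceGroup(new HadoopReduceCombineFunction<Text, LongWritable, Text, LongWritable>(
            new Counter(), new Counter()));

    // Write with the unmodified Hadoop TextOutputFormat.
    HadoopOutputFormat<Text, LongWritable> output = new HadoopOutputFormat<>(
        new TextOutputFormat<Text, LongWritable>(), new JobConf());
    TextOutputFormat.setOutputPath(output.getJobConf(), new Path(args[1]));

    counts.output(output);
    env.execute("hadoop-compat word count");
  }
}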

The bulk-import examples call
HFileOutputFormat2.configureIncrementalLoadMap, which sets a series of job
parameters (partitioner, mapper, reducers, etc.) ->
http://pastebin.com/8VXjYAEf.
The full code can be seen at:
https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java
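
For context, a classic MapReduce bulk-load driver looks roughly like this
(a sketch against the HBase 1.x API, using the closely related
configureIncrementalLoad variant; MyBulkLoadMapper is a hypothetical mapper
that emits (ImmutableBytesWritable, Put) pairs):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "hbase-bulk-load");
    job.setJarByClass(BulkLoadDriver.class);
    job.setMapperClass(MyBulkLoadMapper.class);  // hypothetical: emits (rowkey, Put)
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HFile staging dir

    TableName tableName = TableName.valueOf("my_table");
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(tableName);
         RegionLocator locator = conn.getRegionLocator(tableName)) {
      // This one call installs the TotalOrderPartitioner (over the region
      // boundaries), the sorting reducer (PutSortReducer), the output
      // format (HFileOutputFormat2), and the reducer parallelism: exactly
      // the pieces a Flink port would have to reproduce.
      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
    }
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The generated HFiles are then moved into the table with the
completebulkload tool (LoadIncrementalHFiles).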

Do you think there's any chance to make it run in Flink?

Best,
Flavio

Re: Hadoop compatibility and HBase bulk loading

Posted by Fabian Hueske <fh...@gmail.com>.
Looking at my previous mail, which mentions changes to the API, optimizer,
and runtime code of the DataSet API, this would be a major and non-trivial
effort, and it would also require that a committer spends a good amount of
time on it.


2018-01-16 10:07 GMT+01:00 Flavio Pompermaier <po...@okkam.it>:


Re: Hadoop compatibility and HBase bulk loading

Posted by Flavio Pompermaier <po...@okkam.it>.
Do you think it is that complex to support? I think we could try to
implement it if someone could give us some support (at least the big picture).

On Tue, Jan 16, 2018 at 10:02 AM, Fabian Hueske <fh...@gmail.com> wrote:



-- 
Flavio Pompermaier
Development Department

OKKAM S.r.l.
Tel. +(39) 0461 041809

Re: Hadoop compatibility and HBase bulk loading

Posted by Fabian Hueske <fh...@gmail.com>.
No, I'm not aware of anybody working on extending the Hadoop compatibility
support.
I'll also have no time to work on this any time soon :-(

2018-01-13 1:34 GMT+01:00 Flavio Pompermaier <po...@okkam.it>:


Re: Hadoop compatibility and HBase bulk loading

Posted by Flavio Pompermaier <po...@okkam.it>.
Any progress on this, Fabian? HBase bulk loading is a common task for us,
and it's quite annoying to have to run a separate YARN job to
accomplish it...

On 10 Apr 2015 12:26, "Flavio Pompermaier" <po...@okkam.it> wrote:


Re: Hadoop compatibility and HBase bulk loading

Posted by Flavio Pompermaier <po...@okkam.it>.
Great! That will be awesome.
Thank you, Fabian.

On Fri, Apr 10, 2015 at 12:14 PM, Fabian Hueske <fh...@gmail.com> wrote:


Re: Hadoop compatibility and HBase bulk loading

Posted by Fabian Hueske <fh...@gmail.com>.
Hmm, that's a tricky question ;-) I would need to have a closer look. But
getting custom comparators for sorting and grouping into the Combiner is
not that trivial, because it touches API, optimizer, and runtime code.
However, I did that before for the Reducer, and with the recent addition of
groupCombine the Reducer changes might just carry over to the combine step
(see the sketch below).

I'll be gone next week, but if you want, we can have a closer look at
the problem after that.
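
For illustration, the new operator (combineGroup in the DataSet API) is
used roughly like this; words stands for a hypothetical
DataSet<Tuple2<String, Integer>>, and the sketch assumes the default sort
order, not a custom Hadoop comparator:

import org.apache.flink.api.common.functions.GroupCombineFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

DataSet<Tuple2<String, Integer>> partialSums = words
    .groupBy(0)
    // Runs before the shuffle, like a Hadoop Combiner. It may see only a
    // subset of a key's values, so a full reduce must still follow.
    .combineGroup(
        new GroupCombineFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {
          @Override
          public void combine(Iterable<Tuple2<String, Integer>> values,
                              Collector<Tuple2<String, Integer>> out) {
            String key = null;
            int sum = 0;
            for (Tuple2<String, Integer> v : values) {
              key = v.f0;
              sum += v.f1;
            }
            out.collect(new Tuple2<>(key, sum));
          }
        });

Getting a custom Hadoop RawComparator to drive the sorting and grouping
that feeds this combiner is the part that touches optimizer and runtime.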

2015-04-10 12:07 GMT+02:00 Flavio Pompermaier <po...@okkam.it>:


Re: Hadoop compatibility and HBase bulk loading

Posted by Flavio Pompermaier <po...@okkam.it>.
I think I could also take care of it, if somebody can help and guide me a
little bit.
How long do you think such a task would take to complete?

On Fri, Apr 10, 2015 at 12:02 PM, Fabian Hueske <fh...@gmail.com> wrote:


Re: Hadoop compatibility and HBase bulk loading

Posted by Fabian Hueske <fh...@gmail.com>.
We had an effort to execute any Hadoop MR program by simply specifying the
JobConf and executing it (even embedded in regular Flink programs).
We got quite far but did not finish (counters and custom grouping /
sorting functions for Combiners are missing, if I remember correctly).
I don't think anybody is working on that right now, but it would
definitely be a cool feature.
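
To make the remaining gap concrete, writing HFiles directly from a Flink
program would have to look roughly like this (an illustrative sketch only:
regionIndexFor is a hypothetical lookup against the table's region start
keys, hand-rolling what TotalOrderPartitioner and the MR sort phase
provide):

import org.apache.flink.api.common.functions.Partitioner;
import org.apache.flink.api.common.operators.Order;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FlinkHFileSketch {

  static void writeHFiles(DataSet<Tuple2<ImmutableBytesWritable, Cell>> cells,
                          Configuration hbaseConf) throws Exception {
    Job job = Job.getInstance(hbaseConf);
    FileOutputFormat.setOutputPath(job, new Path("/staging/hfiles"));

    cells
        // Stand-in for the TotalOrderPartitioner: route each cell to the
        // partition of the region that owns its row key.
        .partitionCustom(new Partitioner<ImmutableBytesWritable>() {
          @Override
          public int partition(ImmutableBytesWritable key, int numPartitions) {
            return regionIndexFor(key, numPartitions);
          }
        }, 0)
        // HFileOutputFormat2 requires cells in row-key order per partition.
        .sortPartition(0, Order.ASCENDING)
        // Wrap the unmodified HBase output format; env.execute() runs it.
        .output(new HadoopOutputFormat<>(new HFileOutputFormat2(), job));
  }

  // Hypothetical helper: index of the region whose [startKey, endKey)
  // range contains the given row key (e.g. a binary search over the region
  // start keys fetched from the RegionLocator).
  static int regionIndexFor(ImmutableBytesWritable rowKey, int numPartitions) {
    throw new UnsupportedOperationException("not implemented in this sketch");
  }
}

The missing counters and custom comparators mentioned above are what keep
something like this from being a drop-in replacement.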

2015-04-10 11:55 GMT+02:00 Flavio Pompermaier <po...@okkam.it>:
