Posted to user@hadoop.apache.org by 임정택 <ka...@gmail.com> on 2015/02/04 03:44:51 UTC

Can I configure multiple M/Rs and normal processes to one workflow?

Hello all.

We periodically scan HBase tables to aggregate statistical information
and store it in MySQL.

We have 3 kinds of CP (a kind of data source); each has one Channel table
and one Article table.
(Channel : Article is a 1:N relation.)

Each CP's table schema differs slightly, so in order to aggregate we have to
apply different logic while joining Channel and Article.

I've thought about a workflow like this, but I wonder whether it makes sense.

1. Run a single process which initializes MySQL by creating tables, deleting
rows, etc.
2. Run 3 M/R jobs simultaneously to aggregate statistics for each CP and
insert rows per Channel into MySQL.
3. Run a single process which finalizes the whole aggregation - runs an
aggregation query against MySQL to insert new rows, rolls tables, etc.

Definitely, steps 1, 2, and 3 have to run one after another.

Any help is really appreciated!
Thanks.

Regards.
Jungtaek Lim (HeartSaVioR)
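
[Editor's note: for illustration, here is a minimal sketch of how those three phases could be chained from one plain-Java driver using the Hadoop Job API and JDBC, with no workflow engine involved. The MySQL URL, credentials, table names, and the buildAggregationJob() helper are all placeholders, not anything from the original post; phase 2 submits the three per-CP jobs without blocking and only moves on to phase 3 once every one of them has finished.]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.mapreduce.Job;

public class CpAggregationDriver {

    private static final String MYSQL_URL = "jdbc:mysql://mysql-host/stats";  // placeholder

    public static void main(String[] args) throws Exception {
        // Phase 1: single process that initializes MySQL (create/clean staging table).
        try (Connection conn = DriverManager.getConnection(MYSQL_URL, "stats", "secret");
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS channel_stats_staging "
                + "(channel_id VARCHAR(64), cp VARCHAR(16), article_count BIGINT)");
            stmt.executeUpdate("DELETE FROM channel_stats_staging");
        }

        // Phase 2: submit the three per-CP aggregation jobs so they run in parallel.
        List<Job> jobs = new ArrayList<>();
        for (String cp : new String[] {"cpA", "cpB", "cpC"}) {
            Job job = buildAggregationJob(cp);  // placeholder: per-CP HBase scan M/R setup
            job.submit();                       // non-blocking submit
            jobs.add(job);
        }
        // Join point: wait for every job and fail if any of them failed.
        for (Job job : jobs) {
            if (!job.waitForCompletion(true)) {
                throw new IllegalStateException("Aggregation failed: " + job.getJobName());
            }
        }

        // Phase 3: single process that finalizes the aggregation inside MySQL.
        try (Connection conn = DriverManager.getConnection(MYSQL_URL, "stats", "secret");
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("INSERT INTO channel_stats "
                + "SELECT channel_id, SUM(article_count) FROM channel_stats_staging "
                + "GROUP BY channel_id");
        }
    }

    private static Job buildAggregationJob(String cp) throws Exception {
        // Placeholder only: input table scan, mapper/reducer classes, etc. go here.
        Job job = Job.getInstance();
        job.setJobName("aggregate-" + cp);
        return job;
    }
}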

Re: Can I configure multiple M/Rs and normal processes to one workflow?

Posted by Ted Yu <yu...@gmail.com>.
bq. Can Oozie handle this workflow?

I think so.
Better to confirm on the Oozie mailing list.

Cheers

On Wed, Feb 4, 2015 at 2:30 PM, 임정택 <ka...@gmail.com> wrote:

> This cluster is serving OLTP workloads (HBase), so I'm looking for a
> simpler solution that doesn't require modifying the cluster.
>
> Can Oozie handle this workflow?
>
> On Thu, Feb 5, 2015 at 5:03 AM Ted Yu <yu...@gmail.com> wrote:
>
>> Have you considered using Apache Phoenix?
>> That way all your data is stored in one place.
>>
>> See http://phoenix.apache.org/
>>
>> Cheers
>>
>> On Tue, Feb 3, 2015 at 6:44 PM, 임정택 <ka...@gmail.com> wrote:
>>
>>> Hello all.
>>>
>>> We periodically scan HBase tables to aggregate statistical information
>>> and store it in MySQL.
>>>
>>> We have 3 kinds of CP (a kind of data source); each has one Channel table
>>> and one Article table.
>>> (Channel : Article is a 1:N relation.)
>>>
>>> Each CP's table schema differs slightly, so in order to aggregate we have
>>> to apply different logic while joining Channel and Article.
>>>
>>> I've thought about a workflow like this, but I wonder whether it makes sense.
>>>
>>> 1. Run a single process which initializes MySQL by creating tables,
>>> deleting rows, etc.
>>> 2. Run 3 M/R jobs simultaneously to aggregate statistics for each CP and
>>> insert rows per Channel into MySQL.
>>> 3. Run a single process which finalizes the whole aggregation - runs an
>>> aggregation query against MySQL to insert new rows, rolls tables, etc.
>>>
>>> Definitely, steps 1, 2, and 3 have to run one after another.
>>>
>>> Any help is really appreciated!
>>> Thanks.
>>>
>>> Regards.
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>
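
[Editor's note: if the same three phases were expressed as an Oozie workflow (an action for step 1, a fork/join around the three map-reduce actions for step 2, then a final action for step 3), the whole thing could be kicked off and monitored from Java with the Oozie client. This is only a sketch; the Oozie URL, HDFS application path, and cluster properties are placeholders.]

import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class SubmitCpAggregationWorkflow {

    public static void main(String[] args) throws Exception {
        OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");  // placeholder URL

        Properties conf = oozie.createConfiguration();
        // The workflow.xml at this path would define: start -> init action ->
        // fork of the three per-CP map-reduce actions -> join -> finalize action -> end.
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode/user/stats/apps/cp-aggregation");
        conf.setProperty("nameNode", "hdfs://namenode:8020");
        conf.setProperty("jobTracker", "resourcemanager-host:8032");

        String jobId = oozie.run(conf);  // submit and start the workflow

        // Poll until the workflow leaves the RUNNING state.
        while (oozie.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
            Thread.sleep(10000L);
        }
        System.out.println("Workflow " + jobId + " finished with status "
            + oozie.getJobInfo(jobId).getStatus());
    }
}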

Re: Can I configure multiple M/Rs and normal processes to one workflow?

Posted by 임정택 <ka...@gmail.com>.
This cluster is serving OLTP workloads (HBase), so I'm looking for a
simpler solution that doesn't require modifying the cluster.

Can Oozie handle this workflow?
On Thu, Feb 5, 2015 at 5:03 AM Ted Yu <yu...@gmail.com> wrote:

> Have you considered using Apache Phoenix?
> That way all your data is stored in one place.
>
> See http://phoenix.apache.org/
>
> Cheers
>
> On Tue, Feb 3, 2015 at 6:44 PM, 임정택 <ka...@gmail.com> wrote:
>
>> Hello all.
>>
>> We periodically scan HBase tables to aggregate statistical information
>> and store it in MySQL.
>>
>> We have 3 kinds of CP (a kind of data source); each has one Channel table
>> and one Article table.
>> (Channel : Article is a 1:N relation.)
>>
>> Each CP's table schema differs slightly, so in order to aggregate we have
>> to apply different logic while joining Channel and Article.
>>
>> I've thought about a workflow like this, but I wonder whether it makes sense.
>>
>> 1. Run a single process which initializes MySQL by creating tables,
>> deleting rows, etc.
>> 2. Run 3 M/R jobs simultaneously to aggregate statistics for each CP and
>> insert rows per Channel into MySQL.
>> 3. Run a single process which finalizes the whole aggregation - runs an
>> aggregation query against MySQL to insert new rows, rolls tables, etc.
>>
>> Definitely, steps 1, 2, and 3 have to run one after another.
>>
>> Any help is really appreciated!
>> Thanks.
>>
>> Regards.
>> Jungtaek Lim (HeartSaVioR)
>>
>

Re: Can I configure multiple M/Rs and normal processes to one workflow?

Posted by Ted Yu <yu...@gmail.com>.
Have you considered using Apache Phoenix?
That way all your data is stored in one place.

See http://phoenix.apache.org/

Cheers

On Tue, Feb 3, 2015 at 6:44 PM, 임정택 <ka...@gmail.com> wrote:

> Hello all.
>
> We periodically scan HBase tables to aggregate statistical information
> and store it in MySQL.
>
> We have 3 kinds of CP (a kind of data source); each has one Channel table
> and one Article table.
> (Channel : Article is a 1:N relation.)
>
> Each CP's table schema differs slightly, so in order to aggregate we have
> to apply different logic while joining Channel and Article.
>
> I've thought about a workflow like this, but I wonder whether it makes sense.
>
> 1. Run a single process which initializes MySQL by creating tables,
> deleting rows, etc.
> 2. Run 3 M/R jobs simultaneously to aggregate statistics for each CP and
> insert rows per Channel into MySQL.
> 3. Run a single process which finalizes the whole aggregation - runs an
> aggregation query against MySQL to insert new rows, rolls tables, etc.
>
> Definitely, steps 1, 2, and 3 have to run one after another.
>
> Any help is really appreciated!
> Thanks.
>
> Regards.
> Jungtaek Lim (HeartSaVioR)
>
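
[Editor's note: for context, the Phoenix route would look roughly like the sketch below: the per-Channel aggregation becomes a SQL query executed directly over HBase through Phoenix's JDBC driver, so there is no separate M/R output step into MySQL. The ZooKeeper quorum and the CHANNEL/ARTICLE table and column names are invented for illustration.]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PhoenixAggregationSketch {

    public static void main(String[] args) throws Exception {
        // Phoenix connects through the HBase ZooKeeper quorum (placeholder hosts).
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk1,zk2,zk3:2181");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT c.CHANNEL_ID, COUNT(*) AS ARTICLE_COUNT "
                 + "FROM ARTICLE a JOIN CHANNEL c ON a.CHANNEL_ID = c.CHANNEL_ID "
                 + "GROUP BY c.CHANNEL_ID");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                System.out.println(rs.getString("CHANNEL_ID") + " -> " + rs.getLong("ARTICLE_COUNT"));
            }
        }
    }
}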

Re: Can I configure multiple M/Rs and normal processes to one workflow?

Posted by daemeon reiydelle <da...@gmail.com>.
I see this frequently as a long-running output phase to relational DBs, so
your experience is reasonable. Sometimes it is possible to partition the
MySQL table, but if you need aggregates over the whole, you are sort of
stuck.

(Good luck,  may your business case never require you to run a single long
query ;{)
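
[Editor's note: one common way to keep that output phase from dominating the run is to write the per-Channel rows to MySQL in batches from the reducer rather than one round trip per row. A rough sketch follows; the connection URL, credentials, table, and key/value types are all assumptions for illustration, and rewriteBatchedStatements=true lets MySQL Connector/J collapse each batch into multi-row INSERTs.]

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer: key = channel id, values = per-article counts for that channel.
public class ChannelStatsReducer extends Reducer<Text, LongWritable, NullWritable, NullWritable> {

    private Connection conn;
    private PreparedStatement insert;
    private int pending = 0;

    @Override
    protected void setup(Context context) throws IOException {
        try {
            conn = DriverManager.getConnection(
                "jdbc:mysql://mysql-host/stats?rewriteBatchedStatements=true", "stats", "secret");
            insert = conn.prepareStatement(
                "INSERT INTO channel_stats_staging (channel_id, article_count) VALUES (?, ?)");
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    @Override
    protected void reduce(Text channelId, Iterable<LongWritable> counts, Context context)
            throws IOException {
        long total = 0;
        for (LongWritable c : counts) {
            total += c.get();
        }
        try {
            insert.setString(1, channelId.toString());
            insert.setLong(2, total);
            insert.addBatch();
            if (++pending >= 1000) {   // flush every 1000 rows
                insert.executeBatch();
                pending = 0;
            }
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        try {
            insert.executeBatch();     // flush the tail of the batch
            conn.close();
        } catch (Exception e) {
            throw new IOException(e);
        }
    }
}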



*.......*

*“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!” - Hunter Thompson*
Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Wed, Feb 4, 2015 at 2:15 PM, 임정택 <ka...@gmail.com> wrote:

> Yes, it takes more than 10 hours per CP.
> And we don't have enough resources to run all regions concurrently, so it
> needs about one day to complete.
> On Thu, Feb 5, 2015 at 4:51 AM daemeon reiydelle <da...@gmail.com>
> wrote:
>
>> Null map step (at a guess?), 3-step reduce. No problem. I suspect step 3
>> may be rather long-running?
>>
>>
>>
>> *.......*
>>
>> *“Life should not be a journey to the grave with the intention of arriving
>> safely in a pretty and well preserved body, but rather to skid in broadside
>> in a cloud of smoke, thoroughly used up, totally worn out, and loudly
>> proclaiming “Wow! What a Ride!” - Hunter Thompson*
>> Daemeon C.M. Reiydelle
>> USA (+1) 415.501.0198
>> London (+44) (0) 20 8144 9872
>>
>> On Tue, Feb 3, 2015 at 6:44 PM, 임정택 <ka...@gmail.com> wrote:
>>
>>> Hello all.
>>>
>>> We periodically scan HBase tables to aggregate statistical information
>>> and store it in MySQL.
>>>
>>> We have 3 kinds of CP (a kind of data source); each has one Channel table
>>> and one Article table.
>>> (Channel : Article is a 1:N relation.)
>>>
>>> Each CP's table schema differs slightly, so in order to aggregate we have
>>> to apply different logic while joining Channel and Article.
>>>
>>> I've thought about a workflow like this, but I wonder whether it makes sense.
>>>
>>> 1. Run a single process which initializes MySQL by creating tables,
>>> deleting rows, etc.
>>> 2. Run 3 M/R jobs simultaneously to aggregate statistics for each CP and
>>> insert rows per Channel into MySQL.
>>> 3. Run a single process which finalizes the whole aggregation - runs an
>>> aggregation query against MySQL to insert new rows, rolls tables, etc.
>>>
>>> Definitely, steps 1, 2, and 3 have to run one after another.
>>>
>>> Any help is really appreciated!
>>> Thanks.
>>>
>>> Regards.
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>
>>

Re: Can I configure multiple M/Rs and normal processes to one workflow?

Posted by 임정택 <ka...@gmail.com>.
Yes, it takes more than 10 hours per CP.
And we don't have enough resources to run all regions concurrently, so it
needs about one day to complete.
On Thu, Feb 5, 2015 at 4:51 AM daemeon reiydelle <da...@gmail.com> wrote:

> Null map step (at a guess?), 3-step reduce. No problem. I suspect step 3
> may be rather long-running?
>
>
>
> *.......*
>
> *“Life should not be a journey to the grave with the intention of arriving
> safely in a pretty and well preserved body, but rather to skid in broadside
> in a cloud of smoke, thoroughly used up, totally worn out, and loudly
> proclaiming “Wow! What a Ride!” - Hunter Thompson*
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
>
> On Tue, Feb 3, 2015 at 6:44 PM, 임정택 <ka...@gmail.com> wrote:
>
>> Hello all.
>>
>> We periodically scan HBase tables to aggregate statistical information
>> and store it in MySQL.
>>
>> We have 3 kinds of CP (a kind of data source); each has one Channel table
>> and one Article table.
>> (Channel : Article is a 1:N relation.)
>>
>> Each CP's table schema differs slightly, so in order to aggregate we have
>> to apply different logic while joining Channel and Article.
>>
>> I've thought about a workflow like this, but I wonder whether it makes sense.
>>
>> 1. Run a single process which initializes MySQL by creating tables,
>> deleting rows, etc.
>> 2. Run 3 M/R jobs simultaneously to aggregate statistics for each CP and
>> insert rows per Channel into MySQL.
>> 3. Run a single process which finalizes the whole aggregation - runs an
>> aggregation query against MySQL to insert new rows, rolls tables, etc.
>>
>> Definitely, steps 1, 2, and 3 have to run one after another.
>>
>> Any help is really appreciated!
>> Thanks.
>>
>> Regards.
>> Jungtaek Lim (HeartSaVioR)
>>
>
>

Re: Can I configure multiple M/Rs and normal processes to one workflow?

Posted by daemeon reiydelle <da...@gmail.com>.
Null map step (at a guess?), 3-step reduce. No problem. I suspect step 3 may
be rather long-running?



*.......*

*“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!” - Hunter Thompson*
Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Tue, Feb 3, 2015 at 6:44 PM, 임정택 <ka...@gmail.com> wrote:

> Hello all.
>
> We periodically scan HBase tables to aggregate statistical information
> and store it in MySQL.
>
> We have 3 kinds of CP (a kind of data source); each has one Channel table
> and one Article table.
> (Channel : Article is a 1:N relation.)
>
> Each CP's table schema differs slightly, so in order to aggregate we have
> to apply different logic while joining Channel and Article.
>
> I've thought about a workflow like this, but I wonder whether it makes sense.
>
> 1. Run a single process which initializes MySQL by creating tables,
> deleting rows, etc.
> 2. Run 3 M/R jobs simultaneously to aggregate statistics for each CP and
> insert rows per Channel into MySQL.
> 3. Run a single process which finalizes the whole aggregation - runs an
> aggregation query against MySQL to insert new rows, rolls tables, etc.
>
> Definitely, steps 1, 2, and 3 have to run one after another.
>
> Any help is really appreciated!
> Thanks.
>
> Regards.
> Jungtaek Lim (HeartSaVioR)
>
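
[Editor's note: to make the map step concrete, each per-CP job can be a plain TableMapper over a scan of that CP's Article table, emitting (channel id, 1) pairs that a reducer (for example the batched-JDBC ChannelStatsReducer sketched earlier in this thread) sums and writes to MySQL. The column family/qualifier names and scan settings are assumptions for illustration.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class CpArticleScanJob {

    // Map side: one record per Article row, keyed by the article's channel id.
    public static class ArticleMapper extends TableMapper<Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
                throws java.io.IOException, InterruptedException {
            // Hypothetical column holding the channel id of this article.
            byte[] channelId = row.getValue(Bytes.toBytes("meta"), Bytes.toBytes("channel_id"));
            if (channelId != null) {
                context.write(new Text(Bytes.toString(channelId)), ONE);
            }
        }
    }

    public static Job buildJob(String articleTable) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "aggregate-" + articleTable);
        job.setJarByClass(CpArticleScanJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);         // fewer RPC round trips while scanning each region
        scan.setCacheBlocks(false);   // a full scan should not pollute the block cache

        TableMapReduceUtil.initTableMapperJob(
            articleTable, scan, ArticleMapper.class, Text.class, LongWritable.class, job);
        job.setReducerClass(ChannelStatsReducer.class);   // sums counts, writes them to MySQL
        job.setOutputFormatClass(NullOutputFormat.class);  // all output goes to MySQL, not HDFS
        return job;
    }
}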
