Posted to general@hadoop.apache.org by Deepak Halale <de...@gmail.com> on 2009/09/13 01:27:59 UTC

Hadoop Map/Reduce and Hive clarification

Hi,
I am new to Hadoop and need some clarifications.
a) How do I automate executing Map/Reduce jobs and loading data into Hive?
Do I need to create a cron job, or is there a better way?

b) I have 2 tables as the source for the M/R jobs: OrderMaster and OrderDetail.
OrderMaster has the order header columns
(OrderId, CustId, PaymentMethod, DeliveryMethod, etc.).
OrderDetail has each order's item-level information (viz.
OrderId, ItemId, Quantity, SalesPrice, CostPrice, DeliveryAddress,
DeliveryState, DeliveryZip, DeliveryCountry).
The relation between Master and Detail is one-to-many, and OrderId is the key.

If I generate a tab-delimited file from each table, how is Reduce going to
aggregate the data from OrderDetail, for example if I have to sum the
OrderRevenue by order?
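
To make the question concrete, here is a minimal Python sketch of the
aggregation I have in mind. It only simulates the map, group-by-key, and
reduce steps in one process and is not actual Hadoop code. The column order
of the tab-delimited OrderDetail file, the function names, the sample rows,
and the definition of revenue as Quantity * SalesPrice are all my own
assumptions.

```python
from collections import defaultdict

# Assumed column order of a tab-delimited OrderDetail line (hypothetical):
# OrderId, ItemId, Quantity, SalesPrice, CostPrice, ...


def map_detail(line):
    """Map step: emit (OrderId, line revenue) for one OrderDetail row."""
    fields = line.rstrip("\n").split("\t")
    order_id = fields[0]
    quantity = int(fields[2])
    sales_price = float(fields[3])
    # Assumption: revenue for a line item is Quantity * SalesPrice.
    return order_id, quantity * sales_price


def reduce_revenue(order_id, values):
    """Reduce step: sum all line revenues that share one OrderId."""
    return order_id, sum(values)


def run_job(lines):
    """Simulate the framework: group map output by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        key, value = map_detail(line)
        grouped[key].append(value)
    return dict(reduce_revenue(k, v) for k, v in sorted(grouped.items()))


# Made-up sample rows, for illustration only.
sample = [
    "1001\tA1\t2\t10.00\t6.00",
    "1001\tB7\t1\t5.50\t3.00",
    "1002\tA1\t3\t10.00\t6.00",
]
totals = run_job(sample)
print(totals)
```

Is this roughly how the reducer receives all the OrderDetail rows for one
OrderId, or does the grouping have to be done some other way?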


Thanks

Deepak