Posted to dev@nifi.apache.org by shashank_kiett <sh...@yahoo.com.INVALID> on 2016/02/23 16:29:54 UTC

Regarding execution of Map Reduce Jobs with Apache NIFI

Hi,

I want to configure a MapReduce job as a processor in Apache NiFi. The
scenario this job was developed for is as follows:

 

There are two files:

1. User_data, containing tab-separated records of the form:
   userid    username    movieid    rating

2. Movie_data, containing |-separated records of the form:
   movieid|movie_name

 

The requirement is:

To get each movie name and its aggregated rating in one result file.
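To make the requirement concrete, here is a minimal sketch in plain Python
of the join and aggregation I am after. It is not NiFi or Hadoop code, only
an illustration of the map and reduce steps. It assumes that "aggregated
rating" means the average rating per movie; the function names and the
sample records are hypothetical and made up for illustration.

```python
from collections import defaultdict


def parse_movies(lines):
    """Build a movieid -> movie_name lookup from '|'-separated lines."""
    names = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        movie_id, movie_name = line.split("|", 1)
        names[movie_id] = movie_name
    return names


def map_ratings(lines):
    """Map step: emit (movieid, rating) from tab-separated user records."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        _user_id, _user_name, movie_id, rating = line.split("\t")
        yield movie_id, float(rating)


def reduce_ratings(pairs):
    """Reduce step: average the ratings collected for each movieid.

    Averaging is an assumption; swap in sum() if a total is wanted.
    """
    grouped = defaultdict(list)
    for movie_id, rating in pairs:
        grouped[movie_id].append(rating)
    return {movie_id: sum(r) / len(r) for movie_id, r in grouped.items()}


def movie_ratings(user_lines, movie_lines):
    """Join the averaged ratings with movie names, sorted by name."""
    names = parse_movies(movie_lines)
    averages = reduce_ratings(map_ratings(user_lines))
    return sorted((names[m], avg) for m, avg in averages.items() if m in names)


if __name__ == "__main__":
    # Made-up sample records in the two formats described above.
    users = ["1\talice\t10\t4", "2\tbob\t10\t2", "3\tcarol\t20\t5"]
    movies = ["10|Movie A", "20|Movie B"]
    for name, avg in movie_ratings(users, movies):
        print(f"{name}\t{avg:.2f}")
```

Movies that have ratings but no entry in Movie_data are dropped in this
sketch; whether that is the desired behaviour is also an assumption.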

 

Approach used so far, step by step:

1. Used the ExecuteCommandScript processor with a shell script to load
   data into Hive and fetch it back.

2. In the shell script I wrote the SQL queries for loading and fetching
   the data; the output was then written to disk using the PutFile
   processor.

 

Please advise:

Have I chosen the right approach? I think the ExecuteSQL processor should
be used to run SQL queries on Hive, but I do not know what DB connection
string to use for it.

What is the best approach for this?

 

Thanks and regards,

Shashank Tiwari