Posted to dev@nifi.apache.org by Santosh Pawar <sa...@open-insights.com> on 2019/06/11 05:41:45 UTC

Ingestion from Hive to MongoDB

Hi Team,

I am ingesting data from Hive to MongoDB using the flow below:

SelectHiveQL -> ConvertRecord -> SplitJson -> PutMongo

Is there a better way to ingest data from Hive to MongoDB using NiFi? I have
tried two processors for pushing data into MongoDB, PutMongoRecord and
PutMongo, but each has a limitation, as listed below:

1. PutMongoRecord processor: upsert mode is not available

2. PutMongo processor: upsert mode is available, but it pushes a single
object at a time

Question: Is there any way to insert a JSON array directly into MongoDB? In
the current flow this has become a performance bottleneck: the number of
files being processed is now very large, and because an array of JSON objects
cannot be inserted in one go, the SplitJson processor adds the overhead of
splitting every array element out of each file.
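
For illustration, what I am after is the kind of single-call array write
shown in the sketch below, at the MongoDB Java driver level. The connection
string, database, collection and sample data are placeholders, not our actual
setup:

    // Minimal sketch (not NiFi code): writing a whole JSON array to MongoDB
    // in one batch with the Java driver's insertMany.
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    import java.util.List;

    public class JsonArrayInsertSketch {
        @SuppressWarnings("unchecked")
        public static void main(String[] args) {
            // Stand-in for the JSON array a flow file would carry after ConvertRecord.
            String jsonArray = "[{\"id\": 1, \"name\": \"a\"}, {\"id\": 2, \"name\": \"b\"}]";

            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> coll =
                        client.getDatabase("mydb").getCollection("mycoll");

                // Wrap the array in an object so Document.parse can handle it,
                // then pull the elements back out as individual Documents.
                Document wrapper = Document.parse("{\"docs\": " + jsonArray + "}");
                List<Document> docs = (List<Document>) wrapper.get("docs");

                // One round trip for the whole array instead of one PutMongo call per element.
                coll.insertMany(docs);
            }
        }
    }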

Thanks,
Santosh Pawar

Re: Ingestion from Hive to MongoDB

Posted by Mark Payne <ma...@hotmail.com>.
Santosh,

The flow that you've outlined there seems reasonable, but it is certainly better if you don't have to split the data up, both in terms of performance and in terms of keeping the flow easier to design. I would imagine that PutMongoRecord missing the Upsert mode is simply an oversight and can be updated. If you're inclined to take a stab at that, it would likely yield you the best results.
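
Just to illustrate what that Upsert mode would ultimately come down to at the
driver level, here is a rough sketch using the MongoDB Java driver's
bulkWrite with ReplaceOneModel. The connection details and the "id" key are
placeholders only; this is not the processor's actual code:

    // Hedged sketch: insert-or-replace a batch of records in one bulk call,
    // keyed on a hypothetical "id" field.
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.Filters;
    import com.mongodb.client.model.ReplaceOneModel;
    import com.mongodb.client.model.ReplaceOptions;
    import com.mongodb.client.model.WriteModel;
    import org.bson.Document;

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class BulkUpsertSketch {
        public static void main(String[] args) {
            List<Document> records = Arrays.asList(
                    new Document("id", 1).append("name", "a"),
                    new Document("id", 2).append("name", "b"));

            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> coll =
                        client.getDatabase("mydb").getCollection("mycoll");

                // Build one upsert operation per record, matching on the "id" field.
                List<WriteModel<Document>> ops = new ArrayList<>();
                for (Document record : records) {
                    ops.add(new ReplaceOneModel<>(
                            Filters.eq("id", record.get("id")),
                            record,
                            new ReplaceOptions().upsert(true)));
                }

                // The whole batch goes to MongoDB in a single bulk call.
                coll.bulkWrite(ops);
            }
        }
    }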

Barring that, you may also want to try using SelectHiveQL -> SplitRecord -> PutMongo instead of using ConvertRecord + SplitJson. SplitRecord is capable of reading the data in any incoming format and then writing it out in any format, so it basically handles the job of both Convert and Split, but it does so much more efficiently. So that would certainly make your dataflow more efficient, but I don't know if that would give you the performance gain that you need, given that PutMongo still would be putting individual records.

Thanks
-Mark

