Posted to users@nifi.apache.org by Susheel Kumar <su...@gmail.com> on 2016/04/28 16:54:00 UTC

Question on setting up nifi flow

Hi,

After attending the meetup in NYC, I realize NiFi can be used for a data
flow use case I have.  Can someone please share the steps/processors
necessary for the use case below?


   1. Receive JSON on an HTTP REST endpoint
   2. Parse the HTTP headers and validate them. Return error codes and
   messages as JSON in the response in case of validation failures
   3. Parse the request JSON, perform various validations (missing data in
   fields), massage some data, add some data
   4. Check whether the request JSON's unique ID is present in MongoDB and
   compare timestamps to determine whether this is an update or a new request
   5. If it is a new request, make an entry in Mongo and then write JSON files
   to an output folder for another process to pick up and submit to Solr.
   6. If it is an update request, update the Mongo record and write JSON files
   to the output folder


I understand that something like the HandleHttpRequest processor can be used
to receive the HTTP request and that PutSolrContentStream can be used to
write to Solr, but I am not clear on which processors would be used for the
validation and other work in steps 2 thru 5 above. A rough, purely
illustrative example of the kind of request I have in mind is below.
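
For illustration only, a request might look roughly like this (the field
names are placeholders, not the actual schema):

    {
      "id": "doc-12345",
      "timestamp": "2016-04-28T14:30:00Z",
      "title": "some title",
      "body": "some content"
    }

Here "id" would be the unique ID mentioned in step 4 and "timestamp" the
value compared against the existing MongoDB record.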

Appreciate your input.

Thanks,
Susheel

Re: Question on setting up nifi flow

Posted by Susheel Kumar <su...@gmail.com>.
Thanks Pierre, Simon and Bryan.  Let me take a look and come back with a few
more questions.


Re: Question on setting up nifi flow

Posted by Simon Ball <sb...@hortonworks.com>.
GetMongo is an ingest-only processor, so it cannot accept an input flow file. It also has only a success relationship.

A solution to this would be to use NiFi’s own deduplication.

One flow would seed values in the distributed cache by using GetMongo to pull the IDs and PutDistributedMapCache to store them in NiFi’s cache.

The main ingest flow would then use UpdateAttribute to create a hash.value attribute matching the values inserted into the cache -> DetectDuplicate -> flow to PutMongo (use the Upsert property) -success-> PutSolrContentStream.
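
For example, the relevant pieces might be configured roughly like this (a
sketch only; property names and defaults should be double-checked against the
processor documentation, and the "id" attribute is just an assumed name for
wherever the unique ID ends up):

    Seeding flow:
      GetMongo -> EvaluateJsonPath (extract the id to an attribute) ->
      PutDistributedMapCache
        Cache Entry Identifier: ${id}
        Distributed Cache Service: a DistributedMapCacheClientService

    Main flow:
      UpdateAttribute (dynamic property)
        hash.value: ${id}
      DetectDuplicate
        Cache Entry Identifier: ${hash.value}
        Distributed Cache Service: the same cache client service
        (routes to "non-duplicate" for new records, "duplicate" for updates)
      PutMongo
        Mode: update
        Upsert: true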

Simon

Re: Question on setting up nifi flow

Posted by Bryan Bende <bb...@gmail.com>.
Hi Susheel,

In addition to what Pierre mentioned, if you are interested in an example
of using HandleHttpRequest/Response, there is a template in this repository:

https://github.com/hortonworks-gallery/nifi-templates

The template is HttpExecuteLsCommand.xml and shows how to build a web
service in NiFi that performs a directory listing.
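
As a general pattern (worth verifying against the template itself), the two
processors are paired through a shared HTTP Context Map controller service
(StandardHttpContextMap), roughly:

    HandleHttpRequest
      Listening Port: 8011            <- any free port
      HTTP Context Map: StandardHttpContextMap
        -> validation / processing ...
        -> HandleHttpResponse
             HTTP Status Code: 200 (or an error code on the failure path)
             HTTP Context Map: the same StandardHttpContextMap

The flow file carries an http.context.identifier attribute so that
HandleHttpResponse can send the reply back over the original connection.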

-Bryan



Re: Question on setting up nifi flow

Posted by Pierre Villard <pi...@gmail.com>.
Hi Susheel,

1. HandleHttpRequest
2. RouteOnAttribute + HandleHttpResponse in case of errors detected in the
headers
3. Depending on what you want, there are a lot of options to handle JSON
data (EvaluateJsonPath will probably be useful)
4. GetMongo (I think it will route to success in case there is an entry,
and to failure if there is no record, but this has to be checked; otherwise
an additional processor will do the job of checking the result of the request).
5. & 6. PutMongo + PutFile (if local folder) + PutSolrContentStream (if you
want NiFi to handle Solr itself).

Depending on the details, this could be slightly different, but I think it
gives a good idea of the minimal set of processors you would need. A rough
sketch of how steps 2 and 3 might be configured is below.
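
For instance (purely illustrative; attribute and property names should be
verified against the processor docs, and the header and field names here are
placeholders):

    HandleHttpRequest exposes the request headers as attributes such as
    http.headers.Authorization.

    RouteOnAttribute (step 2), dynamic property:
      invalid: ${http.headers.Authorization:isEmpty()}
    The "invalid" relationship goes to a HandleHttpResponse configured with
    HTTP Status Code 400.

    EvaluateJsonPath (step 3), Destination = flowfile-attribute, dynamic
    properties:
      id: $.id
      timestamp: $.timestamp
    Missing fields can then be caught with another RouteOnAttribute on
    expressions like ${id:isEmpty()}.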

HTH,
Pierre

