Posted to users@nifi.apache.org by Mike Harding <mi...@gmail.com> on 2016/09/30 12:12:27 UTC

Best practice for querying table mid-flow

Hi All,

I have a NiFi data flow that receives flowfiles, each containing a JSON
object. As part of the transformation of each flowfile I want to query a
Hive table using a property in the flowfile's JSON content to retrieve
additional information that I then want to inject into the flowfile. The
updated flowfile is then passed on to the next processor downstream.

Currently the only way I can think of to do this is to:

1 - Put the flowfile's JSON object into attributes using the
EvaluateJsonPath processor.

2 - Pass the Flowfile to a SelectHiveQL processor which runs the query
(using the property from the attribute) and returns the result.

3 - I then pass this to an ExecuteScript processor where I extract the
query result from the Flowfile content and write out the original JSON
object (stored in the attribute) to a new Flowfile content using the query
result to update properties in the JSON object.
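
A minimal sketch of the merge in step 3, written as plain Python so it can
be tried outside NiFi. It assumes SelectHiveQL is configured to emit CSV
with a header row and that only the first result row matters. The function
name merge_lookup and the column names in the usage note are hypothetical.
Inside ExecuteScript, the two inputs would come from the flowfile attribute
and the flowfile content respectively.

```python
import csv
import io
import json


def merge_lookup(original_json, csv_result):
    """Return the original JSON object updated with the first CSV result row.

    original_json: JSON text saved in a flowfile attribute (step 1).
    csv_result:    query output with a header row (step 2, assumed to be CSV).
    """
    record = json.loads(original_json)
    rows = list(csv.DictReader(io.StringIO(csv_result)))
    if rows:
        # Columns from the query overwrite same-named JSON properties.
        # CSV carries no types, so merged values arrive as strings.
        record.update(rows[0])
    # With no matching row, the original object passes through unchanged.
    return json.dumps(record)
```

For example, merge_lookup('{"id": 7}', "region\nEMEA\n") would add a
"region" property to the object. Any type conversion of the merged values
is left to the caller.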

Does this make sense? It feels like there must be a simpler way.

Mike

Re: Best practice for querying table mid-flow

Posted by Bryan Bende <bb...@gmail.com>.

Hi Mike,

This seems like the correct approach when using out-of-the-box processors.

You could potentially create a custom processor that combines all three of
those steps into one: take JSON as input and extract a value, query Hive,
merge the results and write them to the flow file.
Normally I would think that the complexity of querying Hive might not be
worth it, but you would be able to re-use the HiveDBCPConnectionPool
service and just have to run the query.
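
As a rough illustration of what such a combined step would do, here is a
sketch in plain Python rather than as an actual NiFi processor (which would
be written in Java against the processor API). The names enrich and
run_query are hypothetical; run_query stands in for whatever issues the
Hive query through the pooled connection and is assumed to return a dict of
column name to value, or None when nothing matches.

```python
import json


def enrich(flowfile_json, key, run_query):
    """Extract a value, look it up, merge the result, and return new JSON.

    flowfile_json: the flowfile content, a single JSON object as text.
    key:           the JSON property whose value drives the lookup.
    run_query:     hypothetical callable; value -> dict of columns, or None.
    """
    record = json.loads(flowfile_json)
    value = record.get(key)
    # Skip the lookup entirely when the property is missing.
    extra = run_query(value) if value is not None else None
    if extra:
        record.update(extra)
    return json.dumps(record)
```

Passing the lookup in as a callable keeps the JSON handling testable without
a Hive connection; an in-memory dict's get method is enough to exercise it.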

-Bryan
