Posted to users@nifi.apache.org by Mike Harding <mi...@gmail.com> on 2016/09/30 12:12:27 UTC
Best practice for querying table mid-flow
Hi All,
I have a NiFi data flow that receives flowfiles, each containing a JSON
object. As part of the transformation of each flowfile, I want to query a
Hive table using a property in the flowfile's JSON content to retrieve
additional information that I then want to inject into the flowfile. The
updated flowfile is then passed on to the next processor downstream.
Currently the only way I can think of to do this is to:
1 - Put the flowfile's JSON object into attributes using the
EvaluateJsonPath processor.
2 - Pass the flowfile to a SelectHiveQL processor, which runs the query
(using the property from the attribute) and returns the result.
3 - Pass the result to an ExecuteScript processor, where I extract the
query result from the flowfile content, use it to update properties in the
original JSON object (stored in the attribute), and write the updated
object out as the new flowfile content.
Does this make sense? It feels like there must be a simpler way.
Mike
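The merge in step 3 can be sketched in plain Python. This is only an illustration under stated assumptions: SelectHiveQL is assumed to be configured to emit CSV with a header row, the original JSON is assumed to be available as a string (for example, read back from the attribute), and merge_query_result is a hypothetical helper name. The NiFi session calls that read and write the flowfile are left out, and the snippet is written for a standard Python 3 interpreter rather than for pasting into ExecuteScript as-is.

```python
import csv
import io
import json


def merge_query_result(original_json, csv_text):
    """Return the original JSON object updated with the first CSV result row.

    Hypothetical helper: assumes the query result is CSV with a header row.
    """
    record = json.loads(original_json)
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if rows:
        # Column names become JSON property names; CSV values stay strings.
        record.update(rows[0])
    # If the query returned no rows, the object is passed through unchanged.
    return json.dumps(record, sort_keys=True)
```

Only the first row is used here; a lookup that can return several rows would need its own rule for which row wins or how rows are combined.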
Re: Best practice for querying table mid-flow
Posted by Bryan Bende <bb...@gmail.com>.
Hi Mike,
This seems like the correct approach when using out-of-the-box processors.
You could potentially create a custom processor that combines all three of
those steps into one: take JSON as input, extract a value, query Hive,
merge the results, and write them to the flow file.
Normally I would think that the complexity of querying Hive from a custom
processor might not be worth it, but you would be able to re-use the
HiveDBCPConnectionPool service and would only have to run the query.
-Bryan
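The single-step idea (extract a value, query, merge, write) can be sketched as one function. This is not NiFi processor code: a real custom processor would be written against the NiFi API and would obtain its connection from the HiveDBCPConnectionPool service. The names enrich, key_field, lookup, and fake_lookup are hypothetical, and the Hive query is replaced by an in-memory stand-in so the example runs on its own.

```python
import json


def enrich(content, key_field, lookup):
    """Extract a key from JSON content, look it up, and merge the result in.

    `lookup` is any callable that takes the key and returns a dict of extra
    properties, or None when nothing matches.
    """
    record = json.loads(content)
    key = record.get(key_field)
    extra = lookup(key) if key is not None else None
    if extra:
        record.update(extra)
    return json.dumps(record, sort_keys=True)


def fake_lookup(key):
    """Stand-in for the Hive query (assumption: one row of extras per key)."""
    table = {"42": {"region": "EU"}}
    return table.get(key)
```

Passing the lookup in as a callable keeps the extract-and-merge logic testable without a Hive connection; for example, enrich('{"id": "42"}', "id", fake_lookup) returns the object with the region property added.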