Posted to dev@nifi.apache.org by Keith Lim <Ke...@ds-iq.com> on 2016/04/09 02:06:51 UTC

Apache NiFi and user group.

I just found out about Apache NiFi and am extremely excited about the possibility of using it for our end-to-end processing: acquiring external data feeds, ingesting and transforming them (ETL), and finally landing them in Hadoop (HBase and HDFS) for use by data scientists.

I want to find out how NiFi can be used to support complicated processes that are not easily configured with the built-in processors alone. For example:

This case is extracting Google Places data.

Break the continental US into small squares and query each square using a radar search by latitude and longitude. If a call returns the maximum number of locations, there may be more locations within that square, so the logic needs to drill down further by dividing it into 4 smaller radar searches. This is iterated until all locations in the continental US are covered.
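
In rough pseudocode, the drill-down I have in mind looks like the sketch below (query_radar and the 200-result cap are placeholders for whatever the real API provides):

    MAX_RESULTS = 200  # assumed per-query cap; the real API limit may differ

    def collect_places(lat, lng, half_side, query_radar, results):
        """Query one square; if it hits the cap, split into 4 and recurse."""
        places = query_radar(lat, lng, half_side)
        if len(places) < MAX_RESULTS:
            results.extend(places)
            return
        # Hitting the cap means there may be more locations here:
        # drill down into the 4 smaller quadrant squares.
        q = half_side / 2
        for dlat in (-q, q):
            for dlng in (-q, q):
                collect_places(lat + dlat, lng + dlng, q, query_radar, results)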

Now, I can build a workflow by stitching together various processors and possibly writing some custom code along the way.  Is this how I should proceed?  Or would I need to build a single processor that encapsulates all of the above?

Any suggestions, and a pointer to a user group, would be appreciated.

Thanks,
Keith


Re: Apache NiFi and user group.

Posted by Joe Percivall <jo...@yahoo.com.INVALID>.
Hello Keith,

This sounds like a really interesting use case and I'm glad you reached out. For this use case, ingesting the data into Hadoop will be the easy part; getting all the data will be a bit tougher. As with many use cases, the relatively small details of how the workflow is set up can be very important. So a couple of questions:

What is the API like for Google Places? Do you send four corner points and it returns what's inside? Are the parameters sent as headers or encoded in the URL? (See the request sketch after these questions.)
Related, how do you plan on incrementing the search? Is it always a fixed increase on whatever the last value was?
How do you get the initial small squares? Can they be computed functionally, as in the last question, or is there a giant list that needs to be ingested?
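
For what it's worth, my rough recollection (please verify against the Places documentation) is that a radar search is keyed on a center point and radius rather than four corners, with everything encoded in the URL, something like:

    import requests

    # Sketch of a radar search call; the endpoint and parameter names are
    # from memory and should be checked against the Places docs.
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/place/radarsearch/json",
        params={
            "location": "39.8283,-98.5795",  # square center as "lat,lng"
            "radius": 50000,                 # meters, enough to cover the square
            "key": "YOUR_API_KEY",           # placeholder
        },
    )
    places = resp.json().get("results", [])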


My initial thinking is that you will have an InvokeHTTP processor that hits each of the squares, fed by however you generate the initial squares. Then a routing processor (e.g. RouteOnAttribute) will determine whether a square needs to be broken down. If so, it will create 4 flowfiles from the initial query, routed back to InvokeHTTP to be queried. (The work-queue sketch below shows the same loop in miniature.)
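
To make the loop concrete, the same logic written as a plain work queue looks like the sketch below; the deque plays the role of the NiFi queue feeding InvokeHTTP, and the split mirrors the flowfiles routed back (query_square and the result cap are placeholders):

    from collections import deque

    def crawl(seed_squares, query_square, max_results=200):
        # seed_squares: iterable of (lat, lng, half_side) tuples
        pending = deque(seed_squares)
        all_places = []
        while pending:
            lat, lng, half = pending.popleft()
            places = query_square(lat, lng, half)  # one InvokeHTTP call
            if len(places) < max_results:
                all_places.extend(places)          # route to "done"
            else:
                q = half / 2                       # route back as 4 children
                pending.extend(
                    (lat + dlat, lng + dlng, q)
                    for dlat in (-q, q) for dlng in (-q, q)
                )
        return all_places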
 
Also for future reference, there is also a users@nifi.apache.org mailing list.

Looking forward to getting this use-case working,
Joe


- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com


