
Posted to dev@nifi.apache.org by Balaji K Hari <ba...@capgemini.com> on 2016/04/12 16:27:08 UTC

Clarifications/Suggestions on Using NIFI.

Hi Team,

Based on our project requirements, I have been looking at the features included in Apache NiFi. This list seems to be a good way to reach the development team behind NiFi, who are looking for suggestions and input from the user community to improve the product. It is also a good place for NiFi users to get valuable input from the developers on their requirements.

I need your assistance and input on the requirements below and on how they can be implemented in NiFi.

- I have observed that event-based scheduling (or any trigger-based scheduling) is not yet included in the latest NiFi release. Are there any workarounds or alternatives to achieve this?

- Can Spark/Hive jobs be scheduled on a time basis and executed through NiFi? If yes, please suggest how we can do this.

- Can we get data from multiple tables in Oracle/SQL Server/Teradata and put it directly into S3/HDFS, and also directly into Redshift or any other database? If yes, please suggest how we can do this.

- Can we also apply transformations/manipulations to the data while moving it from RDBMS databases to S3/HDFS? If yes, please suggest how we can do this.

- Can we validate the data and find duplicate records before putting the data into S3/HDFS? For example, I have moved data from RDBMS tables into S3, and as part of the daily loads I need to check whether the new load contains any duplicate records and remove them during the data movement itself. Please advise how we can do this.

- Can you also advise how we can achieve workflow execution dependencies? For example, I have designed one workflow, and if it completes I need to start a second workflow; otherwise I need to start a different one. Can this be achieved in NiFi?

Your input on the above would be really helpful and appreciated, as you are the best team to help us with solutions or workarounds for using NiFi, which we have identified as a good, user-friendly product for data ingestion and movement.

Looking forward to your reply with the requested suggestions and solutions. Thanks in advance! :)

Regards,
_______________________________________________________________________
Balaji KNV_Hari
Technical Architect

This message contains information that may be privileged or confidential and is the property of the Capgemini Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorized to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message.

Re: Clarifications/Suggestions on Using NIFI.

Posted by Joe Witt <jo...@gmail.com>.

Hello Balaji

Thank you for your interest in Apache NiFi. The community is indeed
helpful and can answer a lot of questions.  These questions are pretty
high level, suggesting you are still in the early phases of learning
about NiFi.  Please take a look through the linked documentation
below.

NiFi does support event based scheduling for a number of processors.
Whether it is supported for a given processor depends on whether it
makes sense for that processor and whether the developer of it
activated that feature.  If it is available then it can be chosen via
the UI or when configuring the processor via the REST API directly.

NiFi can certainly be configured to pull from multiple tables within a
single database or across multiple databases at once.  Consider using
QueryDatabaseTable processor(s) for this.  You can set up controller
services for the necessary database connection pools as well.
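
As a rough illustration of the incremental-fetch idea behind pulling
from several tables on a schedule, here is a minimal Python sketch.  It
is not NiFi code: it uses an in-memory SQLite database, and the table
names, the `id` key column, and the helper functions are all
hypothetical.  Each poll returns only the rows whose key is greater
than the largest value seen so far for that table.

```python
import sqlite3


def fetch_new_rows(conn, table, key_column, last_seen):
    """Return rows whose key_column exceeds last_seen, plus the new maximum."""
    # Table and column names are assumed to come from trusted configuration;
    # only the comparison value is bound as a query parameter.
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE {key_column} > ? ORDER BY {key_column}",
        (last_seen,),
    )
    columns = [d[0] for d in cur.description]
    rows = [dict(zip(columns, r)) for r in cur.fetchall()]
    new_max = rows[-1][key_column] if rows else last_seen
    return rows, new_max


def poll_tables(conn, state):
    """Poll each table once; state maps table -> (key_column, last_seen)."""
    batches = {}
    for table, (key_column, last_seen) in state.items():
        rows, new_max = fetch_new_rows(conn, table, key_column, last_seen)
        state[table] = (key_column, new_max)  # remember the high-water mark
        batches[table] = rows
    return batches


def demo():
    """Hypothetical two-table example: poll, add a row, poll again."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO orders (item) VALUES (?)", [("pen",), ("ink",)])
    conn.execute("INSERT INTO customers (name) VALUES (?)", ("Ada",))
    state = {"orders": ("id", 0), "customers": ("id", 0)}
    first = poll_tables(conn, state)
    conn.execute("INSERT INTO orders (item) VALUES (?)", ("paper",))
    second = poll_tables(conn, state)
    return first, second, state
```

In this sketch the second poll returns only the newly inserted order,
because the remembered maximum for each table filters out rows that
were already fetched.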

Regarding transformations on the data, yes, there are quite a few
processors to do this as well.

For systems which offer a nice way to initiate a job given some data,
and for which NiFi could asynchronously come back later and check on
the results, NiFi might well be a perfectly fine way to manage those
jobs.
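
The "start a job, then come back later and check on it" pattern can be
sketched in plain Python as below.  This is only an illustration under
assumptions: `run_job` is a hypothetical stand-in for an external job,
a thread pool stands in for the external system, and none of this is
NiFi's own API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_job(records):
    """Hypothetical stand-in for an external job; it just counts records."""
    time.sleep(0.05)
    return {"processed": len(records)}


def submit_job(executor, records):
    """Start the job and return a handle immediately, without waiting."""
    return executor.submit(run_job, records)


def check_job(handle):
    """Non-blocking status check: returns (state, result or None)."""
    if not handle.done():
        return "RUNNING", None
    if handle.exception() is not None:
        return "FAILED", None
    return "SUCCEEDED", handle.result()


def wait_for_job(handle, poll_interval=0.01, timeout=2.0):
    """Poll check_job until the job leaves RUNNING or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state, result = check_job(handle)
        if state != "RUNNING":
            return state, result
        time.sleep(poll_interval)
    return "TIMED_OUT", None
```

The point of the split between `submit_job` and `check_job` is that the
caller is free to do other work between the two calls; the outcome can
then be used to decide what happens next.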

Yes, NiFi can be used for workflow management, but this is an extremely
broad topic area, so it is best to be specific about a particular use
case.  In NiFi you're building flows, which are essentially state
machines.  Inherently, then, to have reached a certain state in the flow
graph means you've already been through the preceding states, which
covers it at the level you've described.
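
To make the state-machine view concrete, here is a small Python sketch
of the dependency described in the question: run a first workflow, then
route to a second one on success or to a different one on failure.  The
step names, the "success"/"failure" outcome labels, and the `run_flow`
helper are hypothetical and chosen only for illustration; this is not
how NiFi is implemented.

```python
def run_flow(steps, routes, start, payload):
    """Walk a flow graph: run a step, then follow the route for its outcome."""
    path = []
    current = start
    while current is not None:
        path.append(current)
        try:
            payload = steps[current](payload)
            outcome = "success"
        except Exception:
            outcome = "failure"  # payload is left unchanged on failure
        # A missing route means the flow ends at this step.
        current = routes.get((current, outcome))
    return path, payload


def first_workflow(rows):
    """Hypothetical first workflow: fails when there is nothing to load."""
    if not rows:
        raise ValueError("no rows to load")
    return rows


def second_workflow(rows):
    """Hypothetical follow-up workflow that runs only after success."""
    return [r.upper() for r in rows]


def fallback_workflow(rows):
    """Hypothetical alternative workflow that runs only after failure."""
    return ["no data"]


steps = {
    "first": first_workflow,
    "second": second_workflow,
    "fallback": fallback_workflow,
}
routes = {
    ("first", "success"): "second",
    ("first", "failure"): "fallback",
}
```

Reaching "second" in the returned path implies that "first" already
completed successfully, which is the dependency guarantee described
above.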

I'd encourage you to break down your questions/ideas into more focused
items so the community can more meaningfully assist you with your
evaluation.

https://nifi.apache.org/docs.html
- Overview
- User Guide
- Walk through some of the processors.
https://cwiki.apache.org/confluence/display/NIFI/Apache+NiFi
https://cwiki.apache.org/confluence/display/NIFI/FAQs

Thanks
Joe
