Posted to users@nifi.apache.org by sudeep mishra <su...@gmail.com> on 2016/01/07 07:56:41 UTC

How to validate records in Hadoop using NiFi?

Hi,

I am pushing some database records into HDFS using Sqoop.

I want to perform some validations on each record in the HDFS data. Which
NiFi processor can I use to split each record (separated by a newline
character) and perform validations?

For validation, I want to verify a particular column value for each record
using a SQL query. I can see an ExecuteQuery processor. How can I
dynamically pass query parameters to it? Also, is there a way to execute
the queries in bulk rather than once per record?

Kindly suggest.

Appreciate your help.


Thanks & Regards,

Sudeep Shekhar Mishra

Re: How to validate records in Hadoop using NiFi?

Posted by sudeep mishra <su...@gmail.com>.
Thank you Joe.

The Sqoop-to-HDFS data load is outside the NiFi flow. Once the data is
pushed to HDFS, I have to process each record and perform validations.

By validation I meant that we will pick a particular column for each
record stored in HDFS and then run a SQL query against another
database.
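
Roughly, I am imagining a flow along these lines (the processor names
here are my guess from the documentation, so please correct me if a
different processor fits better):

  ListHDFS -> FetchHDFS    (pick up the files Sqoop landed in HDFS)
       |
       v
  SplitText                (break the content into individual records)
       |
       v
  ExtractText              (pull the column of interest into an attribute)
       |
       v
  ExecuteSQL               (look the value up in the other database)
       |
       v
  RouteOnAttribute         (route valid and invalid records differently)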

On Sun, Jan 10, 2016 at 9:17 AM, Joe Witt <jo...@gmail.com> wrote:

> Hello Sudeep,
>
> "Which NiFi processor can I use to split each record (separated by a
> new line character)"
>
>   For this the SplitText processor is rather helpful if you want to
> split each line.  I recommend you do two SplitText processors in a
> chain where one splits on every 1000 lines for example and then the
> next one splits each line.  As long as you have back-pressure setup
> this means you could split arbitrarily larger (in terms of number of
> lines) source files and have good behavior.
>
> ..."and perform validations?"
>
>   Consider whether you want to validate each line in a text file and
> route valid lines one way and invalid lines another.  If so, you may
> be able to avoid SplitText entirely and simply use RouteText instead,
> as it can operate on the original file in a line-by-line manner and
> perform expression-based validation.  This would operate in bulk and
> be quite efficient.
>
> "For validations I want to verify a particular column value for each
> record using a SQL query"
>
>   Our ExecuteSQL processor is designed for executing SQL against a
> JDBC accessible database.  It is not helpful at this point for
> executing queries on line oriented data even if that data were valid
> DML or something.  Interesting idea but not something we support at
> this time.
>
> I'm interested to understand your case more, if you don't mind.
> You mention you're getting data from Sqoop into HDFS.  How is NiFi
> involved in that flow - is it that after the data lands in HDFS you
> pull it into NiFi?
>
> Thanks
> Joe
>
> On Sat, Jan 9, 2016 at 10:32 PM, sudeep mishra <su...@gmail.com>
> wrote:
> > Hi,
> >
> > I am pushing some database records into HDFS using Sqoop.
> >
> > I want to perform some validations on each record in the HDFS data. Which
> > NiFi processor can I use to split each record (separated by a newline
> > character) and perform validations?
> >
> > For validation, I want to verify a particular column value for each
> > record using a SQL query. I can see an ExecuteQuery processor. How can I
> > dynamically pass query parameters to it? Also, is there a way to execute
> > the queries in bulk rather than once per record?
> >
> > Kindly suggest.
> >
> > Appreciate your help.
> >
> >
> > Thanks & Regards,
> >
> > Sudeep Shekhar Mishra
> >
> >
> >
> >
> >
> > --
> > Thanks & Regards,
> >
> > Sudeep Shekhar Mishra
> >
> > +91-9167519029
> > sudeepshekharm@gmail.com
>



-- 
Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
sudeepshekharm@gmail.com

Re: How to validate records in Hadoop using NiFi?

Posted by Joe Witt <jo...@gmail.com>.
Hello Sudeep,

"Which NiFi processor can I use to split each record (separated by a
new line character)"

  For this the SplitText processor is rather helpful if you want to
split each line.  I recommend you do two SplitText processors in a
chain where one splits on every 1000 lines for example and then the
next one splits each line.  As long as you have back-pressure setup
this means you could split arbitrarily larger (in terms of number of
lines) source files and have good behavior.
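
As a rough sketch (the property name is from the SplitText docs; the
1000-line batch size and the back-pressure threshold are just
examples):

  SplitText (stage one)
    Line Split Count: 1000
       |
       |  "splits" relationship, with back-pressure configured
       |  (e.g. an object threshold of 100 flow files)
       v
  SplitText (stage two)
    Line Split Count: 1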

..."and perform validations?"

  Consider whether you want to validate each line in a text file and
route valid lines one way and invalid lines another.  If so, you may
be able to avoid SplitText entirely and simply use RouteText instead,
as it can operate on the original file in a line-by-line manner and
perform expression-based validation.  This would operate in bulk and
be quite efficient.
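
For example, if records were comma-delimited and a valid record had a
numeric third field, the RouteText configuration might look like the
following (the validation rule itself is purely illustrative):

  RouteText
    Routing Strategy: Route to each matching Property Name
    Matching Strategy: Matches Regular Expression
    valid (a dynamic property): ^[^,]*,[^,]*,[0-9]+(,.*)?$

Lines matching the expression route to a 'valid' relationship and all
other lines route to 'unmatched', with no splitting required.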

"For validations I want to verify a particular column value for each
record using a SQL query"

  Our ExecuteSQL processor is designed for executing SQL against a
JDBC accessible database.  It is not helpful at this point for
executing queries on line oriented data even if that data were valid
DML or something.  Interesting idea but not something we support at
this time.
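
That said, if the goal is a per-record lookup against another
database, one pattern you could experiment with (an untested sketch,
assuming your NiFi version supports Expression Language in
ExecuteSQL's query property, and with hypothetical table and column
names throughout) is to extract the column value into an attribute
first:

  ExtractText
    record.key (a dynamic property): ^([^,]+),
       |
       v
  ExecuteSQL
    SQL select query: SELECT 1 FROM reference_table
                      WHERE key_column = '${record.key}'

Keep in mind this issues one query per record, which is why it will
not scale well for bulk validation, and interpolating values straight
into SQL like this is unsafe if the data is untrusted.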

I'm interested to understand your case more, if you don't mind.
You mention you're getting data from Sqoop into HDFS.  How is NiFi
involved in that flow - is it that after the data lands in HDFS you
pull it into NiFi?

Thanks
Joe

On Sat, Jan 9, 2016 at 10:32 PM, sudeep mishra <su...@gmail.com> wrote:
> Hi,
>
> I am pushing some database records into HDFS using Sqoop.
>
> I want to perform some validations on each record in the HDFS data. Which
> NiFi processor can I use to split each record (separated by a newline
> character) and perform validations?
>
> For validation, I want to verify a particular column value for each record
> using a SQL query. I can see an ExecuteQuery processor. How can I
> dynamically pass query parameters to it? Also, is there a way to execute
> the queries in bulk rather than once per record?
>
> Kindly suggest.
>
> Appreciate your help.
>
>
> Thanks & Regards,
>
> Sudeep Shekhar Mishra
>
>
>
>
>
> --
> Thanks & Regards,
>
> Sudeep Shekhar Mishra
>
> +91-9167519029
> sudeepshekharm@gmail.com
