Posted to user@spark.apache.org by Sathish Kumaran Vairavelu <vs...@gmail.com> on 2015/03/27 12:43:42 UTC

Checking Data Integrity in Spark

Hello,

I want to know if there is any way to check the integrity of data
files. The use case is to perform data integrity checks on large files
with 100+ columns and reject records (writing them to another file) that
do not meet criteria such as NOT NULL, date format, etc. Since there are
a lot of columns/integrity rules, we should be able to drive the
integrity checks through configuration (XML, JSON, etc.). Please share
your thoughts.


Thanks

Sathish

Re: Checking Data Integrity in Spark

Posted by Arush Kharbanda <ar...@sigmoidanalytics.com>.
It's not possible to configure Spark to run integrity checks from an XML
file out of the box. You would need to write your own jobs to perform the
validations you need.
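
As a rough illustration, here is a minimal Scala sketch of such a job.
The rule set, column indexes, and HDFS paths below are placeholder
assumptions; in a real job you would build the rule list by parsing your
XML/JSON config rather than hard-coding it.

import java.text.SimpleDateFormat
import org.apache.spark.{SparkConf, SparkContext}

object IntegrityCheckJob {
  // A rule pairs a column index with a predicate over that column's value.
  case class Rule(column: Int, check: String => Boolean)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("IntegrityCheck"))

    // Hard-coded for brevity; in practice, parse these from your config file.
    val rules = Seq(
      Rule(0, v => v != null && v.trim.nonEmpty),   // NOT NULL on column 0
      Rule(3, v => try {                            // date format on column 3
        val fmt = new SimpleDateFormat("yyyy-MM-dd")
        fmt.setLenient(false)
        fmt.parse(v)
        true
      } catch { case _: Exception => false })
    )

    val records = sc.textFile("hdfs:///data/input.csv")
      .map(_.split(",", -1))   // -1 keeps trailing empty fields

    // Tag each record with the outcome of all checks, then split the
    // data into valid and rejected outputs.
    val tagged = records
      .map(fields => (rules.forall(r =>
        r.column < fields.length && r.check(fields(r.column))), fields))
      .cache()   // avoid recomputing for the two filters below

    tagged.filter(_._1).map(_._2.mkString(","))
      .saveAsTextFile("hdfs:///data/valid")
    tagged.filter(t => !t._1).map(_._2.mkString(","))
      .saveAsTextFile("hdfs:///data/rejected")

    sc.stop()
  }
}

If you need to know which rule each rejected record failed, replacing the
Boolean tag with a list of failed rule names is a straightforward
extension of the same approach.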

On Fri, Mar 27, 2015 at 5:13 PM, Sathish Kumaran Vairavelu <
vsathishkumaran@gmail.com> wrote:

> Hello,
>
> I want to know if there is any way to check the integrity of data
> files. The use case is to perform data integrity checks on large files
> with 100+ columns and reject records (writing them to another file) that
> do not meet criteria such as NOT NULL, date format, etc. Since there are
> a lot of columns/integrity rules, we should be able to drive the
> integrity checks through configuration (XML, JSON, etc.). Please share
> your thoughts.
>
>
> Thanks
>
> Sathish
>



-- 


*Arush Kharbanda* || Technical Teamlead

arush@sigmoidanalytics.com || www.sigmoidanalytics.com