Posted to user@spark.apache.org by Divya Gehlot <di...@gmail.com> on 2016/05/05 10:51:03 UTC

package for data quality in Spark 1.5.2

Hi,

Is there any package or project in Spark/Scala that supports data quality
checks? For instance, checking for null values or foreign key constraints.

I would really appreciate it if somebody who has already done this is happy
to share it, or can point to an open source package.


Thanks,
Divya

Re: package for data quality in Spark 1.5.2

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi,

Spark is a query tool. It stores data in HDFS, a Hive database or anything
else, but it does not have its own generic database.

Null values and foreign key constraints belong to the domain of databases.
What exactly is the nature of your requirements? Do you want to use Spark
as a tool to look at the DDL and relationships in the underlying storage layer?

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com


Re: package for data quality in Spark 1.5.2

Posted by Mich Talebzadeh <mi...@gmail.com>.
OK, thanks, let me check it.

So your primary storage layer is HBase, with Phoenix as a tool.

Sounds interesting. I will get back to you on this.

Dr Mich Talebzadeh


On 5 May 2016 at 13:26, Divya Gehlot <di...@gmail.com> wrote:

>
> http://blog.cloudera.com/blog/2015/07/how-to-do-data-quality-checks-using-apache-spark-dataframes/
> I am looking for something similar to the above solution.

Fwd: package for data quality in Spark 1.5.2

Posted by Divya Gehlot <di...@gmail.com>.
http://blog.cloudera.com/blog/2015/07/how-to-do-data-quality-checks-using-apache-spark-dataframes/
I am looking for something similar to the above solution.
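
The two checks named in this thread, null values and foreign key constraints, can be expressed as plain DataFrame queries, without a dedicated package. What follows is only a minimal sketch, assuming Spark 1.5.x with a local SQLContext. The object name DataQualityChecks, the helpers nullCount and orphanRows, the column name parent_key and the sample customers/orders tables are all hypothetical and do not come from any package mentioned in the thread. Spark 1.5 has no anti-join type in the DataFrame API, so the foreign key check uses a left outer join and keeps the rows where the parent side is missing.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object DataQualityChecks {

  // Number of rows in which `column` is null.
  def nullCount(df: DataFrame, column: String): Long =
    df.filter(df(column).isNull).count()

  // Rows of `child` whose non-null foreign key `fk` has no matching
  // value in column `pk` of `parent`. Assumes `child` has no column
  // named "parent_key" (a hypothetical helper name used for the join).
  def orphanRows(child: DataFrame, fk: String,
                 parent: DataFrame, pk: String): DataFrame = {
    val parentKeys = parent.select(parent(pk).as("parent_key")).distinct()
    val withKey = child.filter(child(fk).isNotNull)
    withKey
      .join(parentKeys, withKey(fk) === parentKeys("parent_key"), "left_outer")
      .filter(parentKeys("parent_key").isNull) // no parent row was found
      .drop("parent_key")
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("dq-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._

      // Hypothetical sample data: customers is the parent table.
      val customers = Seq((1, "Ann"), (2, "Bob")).toDF("id", "name")

      val orderRows: Seq[(Int, Option[Int], Option[Double])] = Seq(
        (100, Some(1), Some(25.0)), // valid row
        (101, Some(3), None),       // customer 3 does not exist, amount is null
        (102, None, Some(10.0))     // customer_id is null
      )
      val orders = orderRows.toDF("order_id", "customer_id", "amount")

      println("null amounts: " + nullCount(orders, "amount"))
      println("null customer_ids: " + nullCount(orders, "customer_id"))
      orphanRows(orders, "customer_id", customers, "id").show()
    } finally {
      sc.stop()
    }
  }
}
```

Rows with a null foreign key are reported by nullCount rather than by orphanRows, which mirrors how a database treats a nullable foreign key. Whether that is the right rule depends on the data, so treat it as a design choice rather than a fixed convention.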