Posted to user@spark.apache.org by Divya Gehlot <di...@gmail.com> on 2016/05/05 10:51:03 UTC
package for data quality in Spark 1.5.2
Hi,
Is there any package or project in Spark/Scala that supports data quality
checks?
For instance, checking for null values or foreign key constraints.
I would really appreciate it if somebody who has already done this is happy
to share it, or knows of an open source package.
Thanks,
Divya
Re: package for data quality in Spark 1.5.2
Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi,
Spark is a query tool. It can store data in HDFS, a Hive database, or
anywhere else, but it does not have its own generic database.
Null values and foreign key constraints belong to the domain of databases.
What exactly is the nature of your requirements? Do you want to use Spark
to look at the DDL and relationships in the underlying storage layer?
HTH
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
Re: package for data quality in Spark 1.5.2
Posted by Mich Talebzadeh <mi...@gmail.com>.
OK, thanks, let me check it.
So your primary storage layer is HBase, with Phoenix as a tool.
Sounds interesting. I will get back to you on this.
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
Fwd: package for data quality in Spark 1.5.2
Posted by Divya Gehlot <di...@gmail.com>.
http://blog.cloudera.com/blog/2015/07/how-to-do-data-quality-checks-using-apache-spark-dataframes/
I am looking for something similar to the solution above.
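As a rough illustration of the kind of check being asked about (not taken from the linked post), a null count and a foreign-key-style orphan count could be sketched with the DataFrame API roughly as below. The object name DataQualitySketch, the helpers nullCount and orphanCount, and the sample tables and columns are all hypothetical. The sketch assumes spark-core and spark-sql 1.5.x are on the classpath and that the parent and child DataFrames are distinct (not a self-reference).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical helper object; none of these names come from the thread.
object DataQualitySketch {

  // Number of rows in which `column` is null.
  def nullCount(df: DataFrame, column: String): Long =
    df.filter(df(column).isNull).count()

  // Number of child rows whose non-null `fk` value has no matching `pk`
  // value in the parent table (a foreign-key style check).
  def orphanCount(child: DataFrame, fk: String,
                  parent: DataFrame, pk: String): Long = {
    val withKey = child.filter(child(fk).isNotNull)
    val parentKeys = parent.select(parent(pk)).distinct
    // Each child row matches at most one distinct parent key, so the
    // inner join counts exactly the child rows that have a parent.
    val matched = withKey.join(parentKeys, withKey(fk) === parentKeys(pk)).count()
    withKey.count() - matched
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("dq-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Made-up sample data: customers is the parent, orders the child.
    val customers = sc.parallelize(Seq((1, "Ann"), (2, "Bob"))).toDF("id", "name")
    val orders = sc.parallelize(Seq[(Int, Option[Int])](
      (100, Some(1)), (101, Some(3)), (102, None)
    )).toDF("order_id", "customer_id")

    println("null customer_id rows: " + nullCount(orders, "customer_id"))
    println("orders without a customer: " +
      orphanCount(orders, "customer_id", customers, "id"))

    sc.stop()
  }
}
```

The orphan check is expressed as a difference of counts rather than as a constraint, since Spark itself does not enforce keys. Rows with a null key are reported by nullCount and deliberately excluded from orphanCount so the two checks do not overlap.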