Posted to user@spark.apache.org by Benjamin Kim <bb...@gmail.com> on 2016/06/24 21:01:47 UTC

Model Quality Tracking

Has anyone implemented a way to track the performance of a data model over time? We currently have a record linkage algorithm that outputs statistics on matches, non-matches, and partial matches, with reason codes explaining why a record did not match accurately. That way, we will know if something goes wrong down the line. All of this is written as CSV files into directories partitioned by datetime, with a Hive table on top, so we can run analytical queries and even do charting if need be. The process is very manual, so I was wondering whether there is a package, tool, or built-in module that would do this automatically. Since we are using CDH, it would be great if these graphs could be integrated into Cloudera Manager too.
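
To make the setup concrete, here is a minimal sketch of the kind of manual process described above, not the actual job. The function names (summarize, write_partition), the dt=YYYY-MM-DD-HH directory naming, the stats.csv file name, and the sample reason codes are all assumptions made for illustration.

```python
import csv
import os
import tempfile
from collections import Counter
from datetime import datetime


def summarize(results):
    """Count (outcome, reason_code) pairs from a record linkage run.

    `results` is an iterable of (outcome, reason_code) tuples, where
    reason_code may be None for clean matches (hypothetical input shape).
    """
    counts = Counter()
    for outcome, reason in results:
        counts[(outcome, reason or "")] += 1
    return counts


def write_partition(base_dir, run_time, counts):
    """Write the counts as one CSV file under a datetime partition directory."""
    # Assumed layout: <base_dir>/dt=YYYY-MM-DD-HH/stats.csv
    part_dir = os.path.join(base_dir, "dt=" + run_time.strftime("%Y-%m-%d-%H"))
    os.makedirs(part_dir, exist_ok=True)
    path = os.path.join(part_dir, "stats.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Sort so the file content is deterministic from run to run.
        for (outcome, reason), n in sorted(counts.items()):
            writer.writerow([outcome, reason, n])
    return path


if __name__ == "__main__":
    # Made-up sample outcomes, only to exercise the functions.
    sample = [
        ("match", None),
        ("match", None),
        ("non_match", "NAME_MISMATCH"),
        ("partial", "DOB_MISSING"),
    ]
    out = write_partition(tempfile.mkdtemp(), datetime.now(), summarize(sample))
    print(out)
```

With a layout like this, an external Hive table partitioned by a dt string column could sit on top of the base directory, and each new directory would be registered as a partition before it shows up in queries.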

Any advice is welcome.

Thanks,
Ben

