Posted to dev@atlas.apache.org by "Vladislav Glinskiy (Jira)" <ji...@apache.org> on 2020/03/02 11:30:00 UTC
[jira] [Commented] (ATLAS-3646) Create new 'spark_ml_model_dataset'
and 'spark_ml_pipeline_dataset' relationship definitions
[ https://issues.apache.org/jira/browse/ATLAS-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049118#comment-17049118 ]
Vladislav Glinskiy commented on ATLAS-3646:
-------------------------------------------
cc [~kabhwan] [~sarath]
> Create new 'spark_ml_model_dataset' and 'spark_ml_pipeline_dataset' relationship definitions
> --------------------------------------------------------------------------------------------
>
> Key: ATLAS-3646
> URL: https://issues.apache.org/jira/browse/ATLAS-3646
> Project: Atlas
> Issue Type: Task
> Reporter: Vladislav Glinskiy
> Priority: Major
> Fix For: 2.1.0, 3.0.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Create new 'spark_ml_model_dataset' and 'spark_ml_pipeline_dataset' relationship definitions. This is required in order to integrate Spark Atlas Connector's ML event processor.
> Previously, Spark Atlas Connector used the 'spark_ml_directory' model for the ML model directory, together with the 'spark_ml_model_ml_directory' and 'spark_ml_pipeline_ml_directory' relationship definitions. Usage of 'spark_ml_directory' was reverted in the scope of [https://github.com/hortonworks-spark/spark-atlas-connector/issues/61], [https://github.com/hortonworks-spark/spark-atlas-connector/pull/62], so the ML model directory is now a 'DataSet' entity (i.e. 'hdfs_path', 'fs_path').
> Thus, new relationship definitions must be created, since there is no straightforward way to update the existing ones to use the 'DataSet' type instead of its child type 'spark_ml_directory'.
> See:
> * ATLAS-3640
> * [https://github.com/apache/atlas/pull/88#issuecomment-592699723]
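For illustration, the two relationship definitions described above could be sketched as an Atlas typedef payload. This is only a rough sketch, not the definition from the actual patch: the entity type names 'spark_ml_model' and 'spark_ml_pipeline', the end names ('dataset', 'model', 'pipeline'), the AGGREGATION category, the SINGLE cardinalities and the propagateTags value are all assumptions. Only the relationship names and the 'DataSet' end type come from the description.

```python
import json


def relationship_def(name, end1_type, end1_name, end2_type, end2_name):
    """Build one relationship definition as a plain dict.

    Category, cardinality and tag propagation are illustrative
    assumptions, not values taken from the ATLAS-3646 patch.
    """
    return {
        "name": name,
        "typeVersion": "1.0",
        "relationshipCategory": "AGGREGATION",  # assumption
        "propagateTags": "NONE",                # assumption
        "endDef1": {
            "type": end1_type,
            "name": end1_name,
            "isContainer": True,
            "cardinality": "SINGLE",            # assumption
        },
        "endDef2": {
            "type": end2_type,
            "name": end2_name,
            "isContainer": False,
            "cardinality": "SINGLE",            # assumption
        },
    }


def build_typesdef():
    """Collect both new relationship definitions in one typedef payload."""
    return {
        "relationshipDefs": [
            # 'spark_ml_model' / 'spark_ml_pipeline' are assumed entity type names.
            relationship_def("spark_ml_model_dataset",
                             "spark_ml_model", "dataset",
                             "DataSet", "model"),
            relationship_def("spark_ml_pipeline_dataset",
                             "spark_ml_pipeline", "dataset",
                             "DataSet", "pipeline"),
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_typesdef(), indent=2))
```

Because the second end points at 'DataSet' rather than 'spark_ml_directory', any 'DataSet' subtype such as 'hdfs_path' or 'fs_path' could sit on that side of the relationship.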