Posted to dev@airavata.apache.org by Suresh Marru <sm...@apache.org> on 2020/12/10 15:27:29 UTC

Metadata Catalog Evaluations - Hackathons

Hi All,

To build on the prior discussions on the Airavata Data Lake [1], [2], [3], our next big step is to make implementation choices. It looks like Apache Airflow [4] is an unparalleled choice. If any of you are interested, I will be happy to provide a detailed breakdown of this evaluation.

In contrast, the choice of a metadata catalog is tricky, given the overwhelming number of competing options, each with its own strengths. It looks like the best way forward is for us to document the capabilities that are important to Airavata, run a hackathon exploring each of the choices, and settle on one. Magda [5] and Atlas [6] both looked promising but do not natively support multi-tenancy. Can we all explore DataHub [7], Amundsen [8], and Metacat [9] together? There are more options, but I listed the ones with a wide contributor base.

Thoughts?

Cheers,
Suresh

[1] - https://markmail.org/thread/cjasb2m5ag6hb7y6
[2] - https://markmail.org/thread/z2arxbby6xxb57pq
[3] - https://github.com/apache/airavata-data-lake
[4] - https://airflow.apache.org/
[5] - https://magda.io/
[6] - https://atlas.apache.org/#/
[7] - https://github.com/linkedin/datahub
[8] - https://github.com/amundsen-io/amundsen
[9] - https://github.com/Netflix/metacat