Posted to dev@atlas.apache.org by "Venkatesh Seetharam (JIRA)" <ji...@apache.org> on 2015/09/16 00:48:46 UTC

[jira] [Commented] (ATLAS-164) DFS addon for Atlas

    [ https://issues.apache.org/jira/browse/ATLAS-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746422#comment-14746422 ] 

Venkatesh Seetharam commented on ATLAS-164:
-------------------------------------------

Hi [~rémy], thanks for your contribution, it looks quite comprehensive. I have a few questions based on a quick glance at your patch.

* DfsDataModel - This looks like a 1:1 mapping of how HDFS stores the FS. Can we abstract it to model the questions we want answered from a governance perspective? Not sure how the inode info can help, but it definitely helps to know if it's a file or a dir.

* IMHO the HDFS Bridge can be quite overwhelming. If there are billions of files, importing file metadata with no other attributes can be wasteful.
Something like the way Falcon abstracts HDFS artifacts as data sets might be a better fit.
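
To make the data-set idea concrete, here is a minimal sketch that rolls individual file paths up to a directory prefix and counts the files under each prefix, so that one entity per prefix could be imported instead of one per file. The class name DataSetRollup, the fixed-depth rule, and the sample paths are all assumptions for illustration; this is not Falcon's API and not code from the patch.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: collapses file paths into coarse "data set" prefixes. */
final class DataSetRollup {
    private DataSetRollup() {}

    /**
     * Returns the first `depth` directory components of a file path.
     * The last component is treated as the file name and never included.
     * Assumption: paths are '/'-separated and point at files, not dirs.
     */
    static String dataSetOf(String filePath, int depth) {
        String[] parts = filePath.split("/");
        StringBuilder prefix = new StringBuilder();
        int taken = 0;
        for (int i = 0; i < parts.length - 1 && taken < depth; i++) {
            if (parts[i].isEmpty()) {
                continue; // skip the empty component before a leading '/'
            }
            prefix.append('/').append(parts[i]);
            taken++;
        }
        return prefix.length() == 0 ? "/" : prefix.toString();
    }

    /** Counts files per data-set prefix; TreeMap keeps the output sorted. */
    static Map<String, Integer> rollUp(Iterable<String> filePaths, int depth) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String path : filePaths) {
            counts.merge(dataSetOf(path, depth), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample paths are made up for the example.
        Map<String, Integer> counts = rollUp(Arrays.asList(
                "/data/clicks/2015-09-15/part-0000",
                "/data/clicks/2015-09-16/part-0000",
                "/data/users/part-0000"), 2);
        System.out.println(counts);
    }
}
```

A fixed depth is the simplest possible rule; a real bridge would more likely take configured path patterns per data set.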

Also, we may need to be nice to the NameNode (NN) when listing the entire FS - add a sleep between calls or do the listing in batches.
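
The batching-plus-sleep idea can be sketched as below. This is only an illustration under assumptions: ThrottledBatcher and its parameters are hypothetical names, and a plain java.util.Iterator stands in for the listing. Against HDFS, the entries would come from something like the RemoteIterator returned by FileSystem.listFiles(path, true), adapted to an Iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: drains a listing in fixed-size batches, pausing between them. */
final class ThrottledBatcher {
    private ThrottledBatcher() {}

    /**
     * Pulls entries from the iterator and hands them to the sink in batches
     * of batchSize, sleeping pauseMillis after each full batch when more
     * entries remain. Returns the number of batches delivered. If the
     * thread is interrupted during a pause, the interrupt flag is restored
     * and draining stops early.
     */
    static <T> int drain(Iterator<T> entries, int batchSize, long pauseMillis,
                         Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (entries.hasNext()) {
            batch.add(entries.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batches++;
                batch = new ArrayList<>(batchSize);
                if (pauseMillis > 0 && entries.hasNext()) {
                    try {
                        Thread.sleep(pauseMillis); // give the listing source a break
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return batches;
                    }
                }
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // final, possibly partial batch
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Sample paths are made up; a real import would list them from the FS.
        List<String> paths = Arrays.asList("/data/a", "/data/b", "/data/c", "/data/d", "/data/e");
        int n = drain(paths.iterator(), 2, 10L, b -> System.out.println("import " + b));
        System.out.println(n + " batches");
    }
}
```

The batch size and pause would need tuning per cluster; the values in main are placeholders only.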

* Lineage - it's not quite clear how lineage is handled in the code, say when a set of files or a dir is consumed by a Pig script or a Java MR/Spark job. How do you propose to handle it? It's not evident that this has been thought through.

* I did not see unit tests, though I did see an integration test.

Thanks!

> DFS addon for Atlas
> -------------------
>
>                 Key: ATLAS-164
>                 URL: https://issues.apache.org/jira/browse/ATLAS-164
>             Project: Atlas
>          Issue Type: New Feature
>    Affects Versions: 0.6-incubating
>            Reporter: Rémy SAISSY
>         Attachments: ATLAS-164.15092015.patch, ATLAS-164.15092015.patch
>
>
> Hi,
> I have written an addon for sending DFS metadata into Atlas.
> The patch is attached.
> However, I am having a hard time getting the unit tests to work properly, so some advice would be welcome.
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)