Posted to dev@atlas.apache.org by "Ziliotto, Federico" <Zi...@kpmg.nl> on 2016/05/10 11:47:43 UTC

Atlas mongoDB integration

Hello everyone,
We would be very interested in using Atlas as a metadata store, to hold additional information about the datasets we are using (mostly to discover what kind of data is present, but also data quality information, authorization information, etc.).

Since most of our data is stored in MongoDB, I would like to create a mongoBridge similar to the Hive/Sqoop/Storm bridges available in the Atlas codebase. Do you think this is feasible and useful? What would be the implementation challenges, or the integration problems with the current environment (for instance, Ranger would not be able to enforce any security policy related to it without a custom plugin for both Ranger and MongoDB)? What would be the proper way to implement such a component: as a custom Atlas installation or as a separate project that uses Atlas?
I'm looking for any kind of suggestion or opinion on the topic.
Thanks in advance,
Federico


Re: Atlas mongoDB integration

Posted by Venkata R Madugundu <ve...@in.ibm.com>.
To begin with, maybe you can start small:

1) Define the metamodel for MongoDB to capture its data modeling notions
such as "Document" and "Field" (I guess one would need a model pretty
close to one that describes JSON).

2) Once you are done with #1, investigate whether there is a way to infer /
export the MongoDB document structures (schema) in some intermediate format
(JSON Schema / XSD).

3) Write a translator / interpreter for #2 to convert that into metadata
that can be stored in Atlas. Essentially, read the JSON schema and persist
it as instances of the types defined in #1 by making Atlas REST calls
(perhaps using AtlasClient).

All of the above enables only an offline push of metadata from MongoDB to
Atlas.
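
As a rough illustration of steps #1 to #3, here is a minimal Python sketch.
It assumes the schema is inferred by sampling documents that have already
been read from a collection (plain dicts here), and it only builds
entity-like dictionaries without sending them anywhere. The type names
"mongo_collection" and "mongo_field", the attribute names, and the two
helper functions are hypothetical and are not part of Atlas; the real
payload expected by the Atlas REST API should be checked against the
Atlas documentation.

```python
import json


def infer_fields(documents):
    """Map each top-level field name to the sorted Python type names observed."""
    seen = {}
    for doc in documents:
        for name, value in doc.items():
            seen.setdefault(name, set()).add(type(value).__name__)
    return {name: sorted(types) for name, types in seen.items()}


def to_entities(db, collection, fields):
    """Build entity-like dicts for one collection and its fields.

    The type names and attribute names are illustrative assumptions,
    not an actual Atlas model.
    """
    qualified = f"{db}.{collection}"
    collection_entity = {
        "typeName": "mongo_collection",  # hypothetical type from step #1
        "attributes": {"name": collection, "qualifiedName": qualified},
    }
    field_entities = [
        {
            "typeName": "mongo_field",  # hypothetical type from step #1
            "attributes": {
                "name": name,
                "qualifiedName": f"{qualified}.{name}",
                "observedTypes": types,
            },
        }
        for name, types in sorted(fields.items())
    ]
    return [collection_entity] + field_entities


# Sample documents standing in for what a driver would return.
sample = [
    {"_id": 1, "name": "a"},
    {"_id": 2, "name": "b", "score": 3.5},
]
fields = infer_fields(sample)
entities = to_entities("sales", "orders", fields)
print(json.dumps(entities, indent=2))
```

The last step, not shown, would be to send these structures to Atlas,
for example through AtlasClient or a plain REST call.
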
For online integration, or more real-time automated metadata population:

4) You would need to find a hook in MongoDB on which to register a
callback utility that runs when Document(s) are created / updated /
deleted / renamed in MongoDB.
I would guess that MongoDB offers some trigger-like mechanism. (Look for
a hook that is publicly exposed and widely used.)

5) I guess after #4, you can reuse work done in steps #1 to #3.
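
One possible shape for the callback in step #4, sketched in Python under
stated assumptions: the events are plain dicts laid out like MongoDB
change stream events (keys "operationType", "ns", and "to" for renames),
which is only one candidate mechanism and is not something this thread
settles on. The function name and the action labels are hypothetical; the
function only decides what the metadata side should do and leaves the
actual work to the code from steps #1 to #3.

```python
def metadata_action(event):
    """Translate one change event into a (label, ...) tuple for the metadata side."""
    op = event.get("operationType")
    ns = event.get("ns", {})
    qualified = f"{ns.get('db')}.{ns.get('coll')}"

    # Document-level changes may add or remove fields, so re-infer the schema.
    if op in ("insert", "update", "replace", "delete"):
        return ("refresh_schema", qualified)
    # The whole collection is gone, so its metadata should be removed.
    if op == "drop":
        return ("delete_collection", qualified)
    # A rename carries the new namespace under "to".
    if op == "rename":
        to = event.get("to", {})
        return ("rename_collection", qualified, f"{to.get('db')}.{to.get('coll')}")
    # Anything else is not relevant to the metadata.
    return ("ignore", qualified)


example = {
    "operationType": "rename",
    "ns": {"db": "sales", "coll": "orders"},
    "to": {"db": "sales", "coll": "orders_v2"},
}
print(metadata_action(example))
```
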

Atlas committers can shed more light on integrating closely with Atlas
through AtlasHook.

Thanks
Venkat


