Posted to dev@metamodel.apache.org by Ashish Mukherjee <as...@gmail.com> on 2015/06/08 08:18:09 UTC

Enquiry about existing MM adoption

Hello,

I am aware that a couple of companies are using MM for their production
applications, as stated on this list.

To better understand how MM is used and at what scale, I was wondering
which connectors people use most and what the typical data sizes queried
through MM are. Which data stores do people generally query together or
use in a combined way?

Would any users of MM be willing to share some info related to this?

Thanks,
Ashish

Re: Enquiry about existing MM adoption

Posted by Alberto Rodriguez <ar...@gmail.com>.

Hi Ashish,

I work for Stratio, one of the companies using Apache MM in a production
application: Datavis. Our application lets users build dashboards that
expose the real value of their data, fetching it from a wide range of data
stores and showing it with awesome widgets.

We mainly provide two types of datasources: native and SQL-like. The native
ones use native drivers to fetch the data, so the user needs to know the
query language of the underlying data source; for instance, to query our
native ElasticSearch connector the user must know Lucene syntax. With our
MM connectors, users can instead use their SQL skills to query their data.

We've added another abstraction layer to our application and the user can
combine queries from different data stores to build their widgets.

Kind regards,

Alberto

Re: Enquiry about existing MM adoption

Posted by Francisco Javier Cano Bailen <fj...@stratio.com>.

Hello,

I think the images did not come through in my previous mail.

They are attached now; are they available?

I have also uploaded them:

a) http://oi57.tinypic.com/2r71sb6.jpg
b) http://oi61.tinypic.com/2niv71y.jpg

See you,


-- 

Francisco Javier Cano
Senior Developer


<http://www.stratio.com/>
Vía de las Dos Castillas, 33. Ática 4. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*

Re: Enquiry about existing MM adoption

Posted by Francisco Javier Cano Bailen <fj...@stratio.com>.

Hello Ashish,

Months ago, I ran a very basic load test on the MM MongoDB connector, and
this was the result:

Query with group by:

    dataContext.query().from(getCollectionName()).select("id").and("name")
        .where("foo").isEquals("bar").groupBy("name").execute();

a) 5,000,000 records:

17:25 - 17:26:x     Data loading
17:26:x - 17:27:x   Query execution

b) 10,000,000 records:

17:34 - 17:38:x     Data loading
17:38:x - 17:40:x   Query execution

~~~~~~~~~~~~~~~~~~~~~~~~~

I modified the MM MongoDB connector tests to load the datasets behind the
figures above.
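
A minimal sketch of how the two phases (data loading, then querying) could
be timed separately in Java. This is not the code from the modified
connector tests, which is not shown in this thread. The names PhaseTimer
and Timed are hypothetical, and the in-memory list in main only stands in
for a MongoDB collection.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper: times one phase of a benchmark with a monotonic clock. */
final class PhaseTimer {

    /** Value produced by a phase together with its elapsed time. */
    static final class Timed<T> {
        final T value;
        final Duration elapsed;

        Timed(T value, Duration elapsed) {
            this.value = value;
            this.elapsed = elapsed;
        }
    }

    private PhaseTimer() {
    }

    /** Runs the phase once and measures it with System.nanoTime(). */
    static <T> Timed<T> time(Supplier<T> phase) {
        long start = System.nanoTime();
        T value = phase.get();
        return new Timed<>(value, Duration.ofNanos(System.nanoTime() - start));
    }

    public static void main(String[] args) {
        // Stand-in for the data loading phase: fill an in-memory list.
        Timed<List<Integer>> load = time(() -> {
            List<Integer> records = new ArrayList<>();
            for (int i = 0; i < 1_000_000; i++) {
                records.add(i);
            }
            return records;
        });

        // Stand-in for the query phase: count the even records.
        Timed<Long> query = time(() ->
                load.value.stream().filter(i -> i % 2 == 0).count());

        System.out.println("Data loading: " + load.elapsed.toMillis() + " ms");
        System.out.println("Query: " + query.elapsed.toMillis() + " ms");
    }
}
```

In a real run, the two suppliers would wrap the connector's insert logic
and the dataContext.query()...execute() call quoted in this message, and
the result set would need to be fully read inside the timed supplier for
the query figure to be meaningful.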

I hope the provided information will be useful for you.

See you,



-- 

Francisco Javier Cano
Senior Developer


<http://www.stratio.com/>
Vía de las Dos Castillas, 33. Ática 4. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // *@stratiobd <https://twitter.com/StratioBD>*

Re: Enquiry about existing MM adoption

Posted by Ashish Mukherjee <as...@gmail.com>.

Thank you, Kasper. That's a good insight. A couple of further questions:

1. Any idea of the size of the largest data sets?
2. Are the deployments all in the cloud, or on-premise too?

Regards,
Ashish

Re: Enquiry about existing MM adoption

Posted by Kasper Sørensen <i....@gmail.com>.

Hi Ashish,

A bit of information from our side - I represent Human Inference, a data
quality company owned by Neopost. MetaModel was originally founded in our
R&D labs :-)

We use MetaModel in a bunch of applications, primarily:
DataCleaner - www.datacleaner.org - an open source data quality solution -
with over 15,000 registered users. To my knowledge they use all the
connectors, with data load sizes ranging from tiny to huge.
Data Improver - www.dataimprover.com - a cloud-based contact/mailing data
cleansing street. It's sold primarily in the UK but expanding to US,
Germany, NL and probably more in the future. The sources here are CSV and
Excel files.
HIquality MDM - our Master Data Management hub which is consuming data from
many sources, so the wide array of connectors is a huge value-add there too.

One thing all three applications have in common is that they primarily use
MM for batch processing: typically onboarding a big load of records, doing
some complex processing on them and then inserting them into a new,
cleansed datastore. Some of the tools obviously also do ad hoc querying
afterwards, but that happens in a more homogeneous environment.

Best regards,
Kasper

