Posted to dev@drill.apache.org by "Henrik Behrens (JIRA)" <ji...@apache.org> on 2013/12/13 14:38:12 UTC

[jira] [Commented] (DRILL-325) Support for MADlib

    [ https://issues.apache.org/jira/browse/DRILL-325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847481#comment-13847481 ] 

Henrik Behrens commented on DRILL-325:
--------------------------------------

I strongly support this feature for the following reasons:
•	MADlib already supports a wide range of algorithms for machine learning, data mining and statistics (see http://doc.madlib.net/latest/ for details)
•	MADlib is free and open source
•	MADlib is designed to eventually serve a role for scalable database systems that is similar to the CRAN library for R: a community repository of statistical methods, this time written with scale and parallelism in mind
•	MADlib is open to contributions of both new methods and ports to additional database platforms
•	MADlib is already supported on the Hadoop platform via HAWQ
•	A port of MADlib to Impala has already been started (http://blog.cloudera.com/blog/2013/10/how-to-use-madlib-pre-built-analytic-functions-with-impala/)
•	MADlib uses SQL and UDFs/UDAs for implementing analytical functions
•	MADlib supports iterative algorithms (in contrast to plain SQL)
•	MADlib supports templated queries (the same function can be applied to different tables, in contrast to plain SQL)
•	MADlib contains additional sophisticated features and abstractions (macro-programming, micro-programming, an abstraction layer for UDFs, convex optimization, features for statistical text analysis)
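
To make the points about SQL-based functions, iterative algorithms and templated queries more concrete, here is a minimal sketch of the general pattern. It is not MADlib code and does not use Drill: it assumes Python's built-in sqlite3 module as a stand-in database, and the names fit_line, points, x and y are hypothetical. A driver function splices the table and column names into a query template, then runs one SQL aggregate per iteration of a gradient descent loop.

```python
import sqlite3


def fit_line(conn, table, x_col, y_col, iterations=500, lr=0.1):
    """Fit y = a*x + b by gradient descent, one SQL aggregate per step.

    Illustrative only: 'fit_line' and its parameters are hypothetical and
    are not part of MADlib or Drill.
    """
    # Templated query: identifiers cannot be bound as SQL parameters, so the
    # driver splices them in after a basic sanity check.
    for name in (table, x_col, y_col):
        if not name.isidentifier():
            raise ValueError(f"invalid identifier: {name!r}")
    sql = (
        f"SELECT AVG((:a * {x_col} + :b - {y_col}) * {x_col}), "
        f"AVG(:a * {x_col} + :b - {y_col}) FROM {table}"
    )

    # Iterative algorithm: the loop lives in the driver, and the database
    # only computes the averaged gradient terms for the current (a, b).
    a, b = 0.0, 0.0
    for _ in range(iterations):
        grad_a, grad_b = conn.execute(sql, {"a": a, "b": b}).fetchone()
        a -= lr * 2 * grad_a
        b -= lr * 2 * grad_b
    return a, b


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (x REAL, y REAL)")
    conn.executemany(
        "INSERT INTO points VALUES (?, ?)", [(0, 1), (1, 3), (2, 5), (3, 7)]
    )
    # The same function could be pointed at any other table with two
    # numeric columns by passing different names.
    print(fit_line(conn, "points", "x", "y"))
```

The loop and the name substitution both sit outside the SQL statement itself, which is why an engine intended to host such a library would need some way to drive repeated, parameterized queries.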

For details please read their excellent paper: http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-38.pdf

I think it is important that no design decisions are made for Drill now that would later make it difficult to port MADlib to Drill (e.g. a lack of support for iterative or templated queries).


> Support for MADlib
> ------------------
>
>                 Key: DRILL-325
>                 URL: https://issues.apache.org/jira/browse/DRILL-325
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Michael Hausenblas
>
> It should be possible to use MADlib (http://doc.madlib.net/latest/) with Drill.


