Posted to issues@kylin.apache.org by "Luke Han (JIRA)" <ji...@apache.org> on 2016/02/03 07:52:39 UTC

[jira] [Commented] (KYLIN-1351) Support common RDBMS as data source in Kylin

    [ https://issues.apache.org/jira/browse/KYLIN-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15129917#comment-15129917 ] 

Luke Han commented on KYLIN-1351:
---------------------------------

Copied from the mailing list; also linking the old JIRA.

----------------
Hi Edward, 
     Thanks for raising this discussion. Reading data from RDBMSs is tricky, and we need a very clear design and architecture before implementing it.

     There was a thread/JIRA about reading data from Oracle directly, but it was eventually dropped since many tools can already extract data from Oracle and load it into Hive.

     The concern here is that most RDBMSs are not yet optimized for direct reads by a distributed system, for example hundreds of Hadoop nodes reading from MySQL, Oracle, or others directly. The network is a concern as well.

     From the beginning, we decided to use Hive as the protocol between upstream and Kylin. This has been a good model so far, since users can leverage any ETL tool to land source data into Hive and then build cubes on it. Even if Kylin supported reading data from RDBMSs, what about transform? What about load? It would bring the ETL parts into Kylin's scope, which is not a good idea, I think.
     
      But reading from RDBMSs is a valid way to extend the input sources beyond Hive, and not only RDBMSs but also SparkSQL, Impala, Drill, and other SQL-on-Hadoop engines.
      How about building a light tool for this requirement? It could be an extension tool for users to leverage.

      Thanks.
Luke


> Support common RDBMS as data source in Kylin
> --------------------------------------------
>
>                 Key: KYLIN-1351
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1351
>             Project: Kylin
>          Issue Type: New Feature
>            Reporter: Shaofeng SHI
>            Assignee: Edward Zhang
>              Labels: newbie
>
> From v2.0, Kylin's plug-in architecture makes it possible to have multiple data sources, cube engines, and storages. Some users have asked whether Kylin supports source data fed from an RDBMS such as Oracle or MySQL; it is now possible to do that. Tools like Apache Sqoop can easily export data from an RDBMS to HDFS, which would help Kylin get the data and then build it into cubes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)