
Posted to common-issues@hadoop.apache.org by "jiang hehui (JIRA)" <ji...@apache.org> on 2016/06/21 08:46:57 UTC

[jira] [Created] (HADOOP-13304) distributed database for store , mapreduce for compute

jiang hehui created HADOOP-13304:
------------------------------------

             Summary: distributed database for store , mapreduce for compute
                 Key: HADOOP-13304
                 URL: https://issues.apache.org/jira/browse/HADOOP-13304
             Project: Hadoop Common
          Issue Type: New Feature
          Components: fs
    Affects Versions: 2.6.4
            Reporter: jiang hehui


In Hadoop, HDFS is responsible for storage and MapReduce is responsible for computation.
My idea is to store the data in a distributed database and to run computation over it in a MapReduce-like fashion.

!http://images2015.cnblogs.com/blog/439702/201606/439702-20160621124133334-32823985.png!

* insert: 
Using two-phase commit, execute the insert on the nodes selected by the split policy.

* delete: 
Using two-phase commit, execute the delete on the nodes selected by the split policy.

* update:
Using two-phase commit and the split policy: if the record stays on the same node, execute the update on that node; if the record moves to a different node, first delete the old value on the source node, then insert the new value on the destination node.
* select:
** simple select (for example, the data is on a single node, or no merging of data across nodes is needed): handled exactly as on a standalone database server;
** complex select (for example, distinct, group by, order by, subquery, or join across multiple nodes): we call this a job.
{panel}
{color:red}A job is parsed into stages, and stages have lineage; all the stages in a job make up a DAG (directed acyclic graph). Every stage contains a mapsql, a shuffle, and a reducesql.
When a SQL request is received, an execution plan is generated from the metadata. The plan contains the DAG, including the stages and the mapsql, shuffle, and reducesql in each stage. The plan is then executed and the result is returned to the client.

This is the same as in Spark: an RDD corresponds to a table, and a job to a job.
It is also the same as MapReduce in Hadoop: mapsql corresponds to map, shuffle to shuffle, and reducesql to reduce.
{color}
{panel}
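
To make the write path above (insert / delete / update) more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a proposed implementation: the Node class, the modulo-based split_policy, and every function name are hypothetical, the nodes are in-memory stand-ins for real database servers, and delete is omitted because it follows the same pattern as insert.

```python
class Node:
    """In-memory stand-in for one database node (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}             # committed data: row id -> (split key, value)
        self.staged = None         # operations prepared but not yet committed
        self.fail_prepare = False  # test hook to simulate a "no" vote

    def prepare(self, ops):
        """Phase 1: stage the operations and vote yes, or vote no."""
        if self.fail_prepare:
            return False
        self.staged = ops
        return True

    def commit(self):
        """Phase 2: apply the staged operations."""
        for op, row_id, record in self.staged:
            if op == "put":
                self.rows[row_id] = record
            else:  # "delete"
                self.rows.pop(row_id, None)
        self.staged = None

    def rollback(self):
        """Phase 2 (abort): discard anything that was staged."""
        self.staged = None


def split_policy(split_key, nodes):
    # Assumption: integer split key, modulo placement. The issue text does
    # not say which split policy is intended.
    return nodes[split_key % len(nodes)]


def two_phase_commit(plan):
    """plan maps each participating node to its list of operations."""
    if all(node.prepare(ops) for node, ops in plan.items()):
        for node in plan:
            node.commit()
        return True
    for node in plan:
        node.rollback()
    return False


def insert(nodes, row_id, split_key, value):
    target = split_policy(split_key, nodes)
    return two_phase_commit({target: [("put", row_id, (split_key, value))]})


def update(nodes, row_id, old_key, new_key, value):
    source = split_policy(old_key, nodes)
    target = split_policy(new_key, nodes)
    if source is target:
        # The record stays on its node: a plain update there.
        plan = {source: [("put", row_id, (new_key, value))]}
    else:
        # The record moves: delete on the source, insert on the destination.
        plan = {source: [("delete", row_id, None)],
                target: [("put", row_id, (new_key, value))]}
    return two_phase_commit(plan)
```

In this sketch, an update that changes the split key touches two nodes in a single transaction, so if either node votes no during prepare, neither the delete nor the insert is applied.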


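As an illustration of a single stage of a complex select, the sketch below runs a group by across two nodes as mapsql, shuffle, and reducesql. It is a rough sketch under assumptions: in-memory SQLite databases stand in for the nodes, the orders table and its columns are invented for the example, the shuffle uses a CRC32 hash of the grouping key, and the names make_node and run_stage are hypothetical.

```python
import sqlite3
import zlib
from collections import defaultdict

# Hypothetical statements standing in for the "mapsql" and "reducesql"
# of one stage of: SELECT city, SUM(amount) FROM orders GROUP BY city
MAP_SQL = "SELECT city, SUM(amount) FROM orders GROUP BY city"
REDUCE_SQL = "SELECT city, SUM(partial) FROM partials GROUP BY city"


def make_node(rows):
    """Create an in-memory SQLite database standing in for one node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (city TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


def run_stage(nodes, num_reducers=2):
    # Map: every node runs mapsql against its local data only.
    map_outputs = [node.execute(MAP_SQL).fetchall() for node in nodes]

    # Shuffle: route each partial row to a reducer chosen by key hash,
    # so all partial sums for one city meet at the same reducer.
    buckets = defaultdict(list)
    for output in map_outputs:
        for city, partial in output:
            reducer_id = zlib.crc32(city.encode("utf-8")) % num_reducers
            buckets[reducer_id].append((city, partial))

    # Reduce: each reducer loads its bucket and runs reducesql on it.
    result = []
    for rows in buckets.values():
        reducer = sqlite3.connect(":memory:")
        reducer.execute("CREATE TABLE partials (city TEXT, partial INTEGER)")
        reducer.executemany("INSERT INTO partials VALUES (?, ?)", rows)
        result.extend(reducer.execute(REDUCE_SQL).fetchall())
        reducer.close()
    return sorted(result)


if __name__ == "__main__":
    cluster = [
        make_node([("Beijing", 10), ("Shanghai", 5)]),
        make_node([("Beijing", 7), ("Shenzhen", 3)]),
    ]
    print(run_stage(cluster))
```

A full job would chain several such stages according to the DAG in the execution plan, with the output of one stage's reducesql serving as the input table of the next stage's mapsql.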

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
