Posted to common-issues@hadoop.apache.org by "jiang hehui (JIRA)" <ji...@apache.org> on 2016/06/21 08:46:57 UTC
[jira] [Created] (HADOOP-13304) distributed database for store ,
mapreduce for compute
jiang hehui created HADOOP-13304:
------------------------------------
Summary: distributed database for store , mapreduce for compute
Key: HADOOP-13304
URL: https://issues.apache.org/jira/browse/HADOOP-13304
Project: Hadoop Common
Issue Type: New Feature
Components: fs
Affects Versions: 2.6.4
Reporter: jiang hehui
In Hadoop, HDFS is responsible for storage and MapReduce is responsible for computation.
My idea is to store the data in a distributed database and to run computation over it in a MapReduce-like way.
!http://images2015.cnblogs.com/blog/439702/201606/439702-20160621124133334-32823985.png!
* insert:
Using two-phase commit, execute the insert on the nodes selected by the split policy.
* delete:
Using two-phase commit, execute the delete on the nodes selected by the split policy.
* update:
Using two-phase commit and the split policy: if the record stays on the same node, execute the update on that node; if the record moves to a different node, first delete the old value on the source node, then insert the new value on the destination node.
* select:
** simple select (for example, the data is on a single node, or no merging of data across nodes is needed): handled the same way as on a standalone database server;
** complex select (for example distinct, group by, order by, subquery, or join across multiple nodes): we call this a job.
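The write path in the insert, delete, and update bullets above could be sketched roughly as follows. This is only an illustration under assumptions that are not part of the proposal: the `Node` class is an in-memory stand-in for a database node, `split_policy` shards integer keys by modulo, and `fail_prepare` is a hook for simulating a node that votes "no" in phase one. A real implementation would also need durable logging and coordinator recovery, which are omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for one database node."""
    name: str
    rows: dict = field(default_factory=dict)     # committed rows: key -> value
    pending: list = field(default_factory=list)  # operations staged in phase one
    fail_prepare: bool = False                   # simulate a "no" vote

    def prepare(self, ops):
        # Phase one: stage the operations and vote yes or no.
        if self.fail_prepare:
            return False
        self.pending = list(ops)
        return True

    def commit(self):
        # Phase two (all voted yes): apply the staged operations.
        for op, key, value in self.pending:
            if op == "put":
                self.rows[key] = value
            elif op == "delete":
                self.rows.pop(key, None)
        self.pending = []

    def rollback(self):
        # Phase two (someone voted no): discard the staged operations.
        self.pending = []


def split_policy(key, nodes):
    """Assumed split policy: shard integer keys by modulo."""
    return nodes[key % len(nodes)]


def two_phase_commit(ops_by_node):
    """ops_by_node is a list of (node, ops) pairs. Returns True if committed."""
    prepared = []
    for node, ops in ops_by_node:
        if node.prepare(ops):
            prepared.append(node)
        else:
            for p in prepared:
                p.rollback()
            return False
    for node in prepared:
        node.commit()
    return True


def insert(nodes, key, value):
    return two_phase_commit([(split_policy(key, nodes), [("put", key, value)])])


def delete(nodes, key):
    return two_phase_commit([(split_policy(key, nodes), [("delete", key, None)])])


def update(nodes, old_key, new_key, value):
    src = split_policy(old_key, nodes)
    dst = split_policy(new_key, nodes)
    if src is dst:
        # The record stays on the same node: update in place.
        ops = [("delete", old_key, None), ("put", new_key, value)]
        return two_phase_commit([(src, ops)])
    # The record moves: delete on the source node, insert on the destination.
    return two_phase_commit([
        (src, [("delete", old_key, None)]),
        (dst, [("put", new_key, value)]),
    ])
```

With two nodes, `insert(nodes, 1, "a")` places the row on the second node, and `update(nodes, 1, 2, "b")` moves it to the first node in a single transaction. If either node votes "no", neither side changes.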
{panel}
{color:red}A job is parsed into stages. Stages have lineage, and all stages in a job form a DAG (directed acyclic graph). Every stage contains a mapsql, a shuffle, and a reducesql.
When a SQL request is received, an execution plan is generated from the metadata. The plan contains the DAG, including each stage and the mapsql, shuffle, and reducesql in each stage. The plan is then executed and the result is returned to the client.
This corresponds to Spark: an RDD is a table, and a job is a job.
It also corresponds to MapReduce in Hadoop: mapsql is map, shuffle is shuffle, and reducesql is reduce.
{color}
{panel}
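To make the mapsql / shuffle / reducesql idea concrete, here is a minimal single-stage sketch. Everything in it is an assumption for illustration: in-memory SQLite databases stand in for the storage nodes and the reducers, the `Stage` class and `run_stage` function are hypothetical names, and the `users` table is made up. It runs a distributed "group by city" count by pre-aggregating on each node, hash-partitioning the partial rows, and merging them on the reducers. Planning a full DAG of stages from metadata is not shown.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Stage:
    """One stage of a job (hypothetical structure)."""
    mapsql: str       # runs on every storage node against its local shard
    shuffle_col: int  # index of the map-output column used to route rows
    columns: str      # column definitions of the reducer table "shuffled"
    reducesql: str    # runs on each reducer against the table "shuffled"


def make_node(rows):
    """Create an in-memory SQLite database standing in for one storage node."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT, city TEXT)")
    db.executemany("INSERT INTO users VALUES (?, ?)", rows)
    return db


def run_stage(stage, nodes, num_reducers=2):
    # Map: each node runs mapsql on its local data.
    map_output = []
    for db in nodes:
        map_output.extend(db.execute(stage.mapsql).fetchall())

    # Shuffle: rows with the same key go to the same reducer.
    buckets = [[] for _ in range(num_reducers)]
    for row in map_output:
        buckets[hash(row[stage.shuffle_col]) % num_reducers].append(row)

    # Reduce: each reducer loads its bucket and runs reducesql.
    placeholders = ", ".join("?" for _ in stage.columns.split(","))
    result = []
    for bucket in buckets:
        db = sqlite3.connect(":memory:")
        db.execute(f"CREATE TABLE shuffled ({stage.columns})")
        db.executemany(f"INSERT INTO shuffled VALUES ({placeholders})", bucket)
        result.extend(db.execute(stage.reducesql).fetchall())
    return sorted(result)


# Example job: SELECT city, COUNT(*) FROM users GROUP BY city, across two nodes.
count_by_city = Stage(
    mapsql="SELECT city, COUNT(*) FROM users GROUP BY city",
    shuffle_col=0,
    columns="city TEXT, cnt INTEGER",
    reducesql="SELECT city, SUM(cnt) FROM shuffled GROUP BY city",
)
example_nodes = [
    make_node([("ann", "beijing"), ("bob", "shanghai")]),
    make_node([("cat", "beijing"), ("dan", "beijing")]),
]
print(run_stage(count_by_city, example_nodes))
```

Pre-aggregating in the mapsql keeps the shuffle small, in the same way a combiner does in MapReduce. A job with several stages would feed the output of one stage into the next according to the DAG.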
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)