Posted to issues@hawq.apache.org by "Ruilong Huo (JIRA)" <ji...@apache.org> on 2017/11/10 02:17:00 UTC

[jira] [Assigned] (HAWQ-1270) Plugged storage back-ends for HAWQ

     [ https://issues.apache.org/jira/browse/HAWQ-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruilong Huo reassigned HAWQ-1270:
---------------------------------

    Assignee: Ruilong Huo  (was: Yi Jin)

> Plugged storage back-ends for HAWQ
> ----------------------------------
>
>                 Key: HAWQ-1270
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1270
>             Project: Apache HAWQ
>          Issue Type: Improvement
>            Reporter: Dmitry Buzolin
>            Assignee: Ruilong Huo
>
> Since HAWQ depends only on Hadoop for storage and on Parquet for columnar format support, I would like to propose a pluggable storage back-end design for HAWQ. Hadoop is already supported, but there is also Ceph: a distributed storage system that offers a standard POSIX-compliant file system, object storage, and block storage. Ceph is data-location aware, written in C++, and a more sophisticated storage back-end than Hadoop at this time. It provides replicated and erasure-coded storage pools. Other great features of Ceph are snapshots and an algorithmic approach to mapping data to nodes (CRUSH) rather than relying on centrally managed NameNodes. I don't think HDFS offers any of these features. In terms of performance, Ceph should be faster than HDFS, since it is written in C++ and does not have the scalability limitations of mapping data through a central metadata service, compared to Hadoop, where the NameNode is such a point of contention.
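
To make the pluggable back-end idea concrete, here is a minimal C++ sketch of what such an abstraction might look like. Everything in it is hypothetical: the names (StorageBackend, InMemoryBackend, MakeBackend) and the URL-scheme dispatch are illustrations only, not existing HAWQ, HDFS, or Ceph APIs. A real plugin would wrap libhdfs3 or librados/CephFS calls instead of the in-memory stand-in; dispatching on a table location's URL scheme is just one way a catalog could pick a back-end per table without touching the executor.

    // Hypothetical sketch of a pluggable storage back-end interface.
    // All names are illustrative, not real HAWQ/HDFS/Ceph APIs.
    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <memory>
    #include <string>
    #include <vector>

    // Abstract interface the executor would program against, independent
    // of whether the bytes live in HDFS, CephFS, or a RADOS object pool.
    class StorageBackend {
    public:
        virtual ~StorageBackend() = default;
        virtual std::string Name() const = 0;
        virtual bool Open(const std::string& path) = 0;           // open a segment file
        virtual int64_t Append(const char* buf, int64_t len) = 0; // append-only writes
        virtual int64_t Read(int64_t off, char* buf, int64_t len) = 0;
        virtual void Close() = 0;
    };

    // Toy in-memory implementation standing in for a real HDFS or Ceph
    // plugin; a real plugin would issue RPCs to the storage system here.
    class InMemoryBackend : public StorageBackend {
    public:
        explicit InMemoryBackend(std::string name) : name_(std::move(name)) {}
        std::string Name() const override { return name_; }
        bool Open(const std::string& path) override { path_ = path; return true; }
        int64_t Append(const char* buf, int64_t len) override {
            data_.insert(data_.end(), buf, buf + len);
            return len;
        }
        int64_t Read(int64_t off, char* buf, int64_t len) override {
            if (off >= static_cast<int64_t>(data_.size())) return 0;
            int64_t n = std::min<int64_t>(len, static_cast<int64_t>(data_.size()) - off);
            std::copy(data_.begin() + off, data_.begin() + off + n, buf);
            return n;
        }
        void Close() override {}
    private:
        std::string name_, path_;
        std::vector<char> data_;
    };

    // Pick a plugin from the table location's URL scheme,
    // e.g. "hdfs://..." vs. "ceph://...".
    std::unique_ptr<StorageBackend> MakeBackend(const std::string& url) {
        if (url.rfind("ceph://", 0) == 0)
            return std::make_unique<InMemoryBackend>("ceph");
        return std::make_unique<InMemoryBackend>("hdfs");
    }

    int main() {
        auto be = MakeBackend("ceph://pool1/hawq_default/t1/1");
        be->Open("/hawq_default/t1/1");
        be->Append("hello", 5);
        char buf[8] = {0};
        be->Read(0, buf, 5);
        std::cout << be->Name() << " read: " << buf << "\n";
        be->Close();
    }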



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)