Posted to issues@hawq.apache.org by "Ruilong Huo (JIRA)" <ji...@apache.org> on 2017/11/10 02:17:00 UTC
[jira] [Assigned] (HAWQ-1270) Plugged storage back-ends for HAWQ
[ https://issues.apache.org/jira/browse/HAWQ-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ruilong Huo reassigned HAWQ-1270:
---------------------------------
Assignee: Ruilong Huo (was: Yi Jin)
> Plugged storage back-ends for HAWQ
> ----------------------------------
>
> Key: HAWQ-1270
> URL: https://issues.apache.org/jira/browse/HAWQ-1270
> Project: Apache HAWQ
> Issue Type: Improvement
> Reporter: Dmitry Buzolin
> Assignee: Ruilong Huo
>
> Since HAWQ only depends on Hadoop for storage and Parquet for columnar format support, I would like to propose a pluggable storage back-end design for HAWQ. Hadoop is already supported, but there is also Ceph, a distributed storage system that offers a standard POSIX-compliant file system, object storage, and block storage. Ceph is data-location aware, written in C++, and is a more sophisticated storage back-end than Hadoop at this time. It provides replicated and erasure-coded storage pools. Other great features of Ceph are snapshots and an algorithmic approach (CRUSH) to mapping data to nodes, rather than relying on centrally managed namenodes. I don't think HDFS offers any of these features. In terms of performance, Ceph should be faster than HDFS, since it is written in C++ and avoids the scalability limitation Hadoop has when mapping data to storage, where the namenode is a point of contention.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)