Posted to issues@hawq.apache.org by "Dmitry Buzolin (JIRA)" <ji...@apache.org> on 2017/01/17 14:56:26 UTC

[jira] [Updated] (HAWQ-1270) Plugged storage back-ends for HAWQ

     [ https://issues.apache.org/jira/browse/HAWQ-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Buzolin updated HAWQ-1270:
---------------------------------
    Description: 
Since HAWQ depends only on Hadoop, and on Parquet for columnar format support, I would like to propose a pluggable storage backend design for HAWQ. Hadoop is already supported, but there is also Ceph, a distributed storage system that offers a standard POSIX-compliant file system as well as object and block storage. Ceph is also data-location aware, is written in C++, and is at this time a more sophisticated storage backend than Hadoop. It provides replicated and erasure-coded storage pools. Other great features of Ceph are snapshots and an algorithmic approach to mapping data to nodes, rather than relying on centrally managed namenodes. I don't think HDFS offers any of these features. In terms of performance, Ceph should be faster than HDFS because it is written in C++ and because it doesn't have scalability limitations when mapping data to storage pools, whereas in Hadoop the namenode is a point of contention.

  was:
Since HAWQ depends only on Hadoop, and on Parquet for columnar format support, I would like to propose a pluggable storage backend design for HAWQ. Hadoop is already supported, but there is also Ceph, a distributed storage system that offers a standard POSIX-compliant file system as well as object and block storage. Ceph is also data-location aware, is written in C++, and is at this time a more sophisticated storage backend than Hadoop. It provides replicated and erasure-coded storage pools. Other great features of Ceph are an algorithmic approach to mapping data to nodes, rather than relying on centrally managed namenodes, and snapshots. I don't think HDFS offers any of these features. In terms of performance, Ceph should be faster than HDFS because it is written in C++ and because it doesn't have scalability limitations when mapping data to storage pools, whereas in Hadoop the namenode is a point of contention.


> Plugged storage back-ends for HAWQ
> ----------------------------------
>
>                 Key: HAWQ-1270
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1270
>             Project: Apache HAWQ
>          Issue Type: Improvement
>            Reporter: Dmitry Buzolin
>            Assignee: Ed Espino
>
> Since HAWQ depends only on Hadoop, and on Parquet for columnar format support, I would like to propose a pluggable storage backend design for HAWQ. Hadoop is already supported, but there is also Ceph, a distributed storage system that offers a standard POSIX-compliant file system as well as object and block storage. Ceph is also data-location aware, is written in C++, and is at this time a more sophisticated storage backend than Hadoop. It provides replicated and erasure-coded storage pools. Other great features of Ceph are snapshots and an algorithmic approach to mapping data to nodes, rather than relying on centrally managed namenodes. I don't think HDFS offers any of these features. In terms of performance, Ceph should be faster than HDFS because it is written in C++ and because it doesn't have scalability limitations when mapping data to storage pools, whereas in Hadoop the namenode is a point of contention.


