Posted to issues@tajo.apache.org by "Hyunsik Choi (JIRA)" <ji...@apache.org> on 2015/11/03 04:51:27 UTC

[jira] [Updated] (TAJO-1959) Improve AWS S3 file system support

     [ https://issues.apache.org/jira/browse/TAJO-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyunsik Choi updated TAJO-1959:
-------------------------------
    Description: 
We need to improve AWS S3 support. S3 differs from HDFS in the following ways:
 * No move operation. A move is emulated by a copy followed by a remove.
 * Directory listing is very slow.
 * Data locality is irrelevant (i.e., access is always remote).
 * Consistency is eventual.

Accessing S3 only through the HDFS-style file system implementation may cause significant performance degradation. We need to mitigate these degradation points.

This is an umbrella issue to track subtasks.

  was:
We need to improve AWS S3 support. S3 differs from HDFS in the following ways:
 * No move operation. A move is emulated by a copy followed by a remove.
 * Directory listing is very slow.
 * Data locality is irrelevant (i.e., access is always remote).
 * Consistency is eventual.

Accessing S3 only through the HDFS-style file system implementation may cause significant performance degradation. We need to mitigate these degradation points.

This is an umbrella issue to track subtasks.

 * https://www.qubole.com/blog/product/optimizing-hadoop-for-s3-part-1/


> Improve AWS S3 file system support
> ----------------------------------
>
>                 Key: TAJO-1959
>                 URL: https://issues.apache.org/jira/browse/TAJO-1959
>             Project: Tajo
>          Issue Type: Improvement
>          Components: S3, Storage
>            Reporter: Hyunsik Choi
>
> We need to improve AWS S3 support. S3 differs from HDFS in the following ways:
>  * No move operation. A move is emulated by a copy followed by a remove.
>  * Directory listing is very slow.
>  * Data locality is irrelevant (i.e., access is always remote).
>  * Consistency is eventual.
> Accessing S3 only through the HDFS-style file system implementation may cause significant performance degradation. We need to mitigate these degradation points.
> This is an umbrella issue to track subtasks.
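
To illustrate the first point in the list above, here is a minimal sketch of why an emulated move is expensive on a flat object store. It assumes a toy in-memory store; ToyObjectStore and emulated_move are hypothetical names made up for this illustration and are not part of Tajo, Hadoop, or any AWS SDK. The request counter only shows that the cost grows with the number of objects under the prefix, whereas a rename on HDFS is a single metadata operation.

```python
class ToyObjectStore:
    """Hypothetical in-memory stand-in for a flat key/value object store."""

    def __init__(self):
        self.objects = {}   # key -> bytes
        self.requests = 0   # number of store operations issued so far

    def put(self, key, data):
        self.requests += 1
        self.objects[key] = data

    def list_keys(self, prefix):
        # A "directory listing" is just a scan for keys sharing a prefix.
        self.requests += 1
        return sorted(k for k in self.objects if k.startswith(prefix))

    def copy(self, src, dst):
        self.requests += 1
        self.objects[dst] = self.objects[src]

    def delete(self, key):
        self.requests += 1
        del self.objects[key]


def emulated_move(store, src_prefix, dst_prefix):
    """Emulate a directory move: one listing, then a copy and a delete per object."""
    keys = store.list_keys(src_prefix)
    for key in keys:
        store.copy(key, dst_prefix + key[len(src_prefix):])
    for key in keys:
        store.delete(key)
    return len(keys)


store = ToyObjectStore()
for i in range(3):
    store.put(f"staging/part-{i}", b"rows")

store.requests = 0  # count only the requests issued by the move itself
moved = emulated_move(store, "staging/", "final/")
print(moved, store.requests, sorted(store.objects))
```

In this toy model, moving N objects costs 1 + 2N requests, which is one reason a commit step that renames a staging directory can become a bottleneck on S3.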



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)