You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2015/04/11 02:22:12 UTC

[jira] [Updated] (HBASE-6358) Bulkloading from remote filesystem is problematic

     [ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-6358:
----------------------------------
    Resolution: Incomplete
        Status: Resolved  (was: Patch Available)

> Bulkloading from remote filesystem is problematic
> -------------------------------------------------
>
>                 Key: HBASE-6358
>                 URL: https://issues.apache.org/jira/browse/HBASE-6358
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: Dave Revell
>         Attachments: 6358-suggestion.txt, HBASE-6358-trunk-v1.diff, HBASE-6358-trunk-v2.diff, HBASE-6358-trunk-v3.diff
>
>
> Bulk loading hfiles that don't live on the same filesystem as HBase can cause problems for subtle reasons.
> In Store.bulkLoadHFile(), the regionserver will copy the source hfile to its own filesystem if it's not already there. Since this can take a long time for large hfiles, it's likely that the client will timeout and retry. When the client retries repeatedly, there may be several bulkload operations in flight for the same hfile, causing lots of unnecessary IO and tying up handler threads. This can seriously impact performance. In my case, the cluster became unusable and the regionservers had to be kill -9'ed.
> Possible solutions:
>  # Require that hfiles already be on the same filesystem as HBase in order for bulkloading to succeed. The copy could be handled by LoadIncrementalHFiles before the regionserver is called.
>  # Others? I'm not familiar with Hadoop IPC so there may be tricks to extend the timeout or something else.
> I'm willing to write a patch but I'd appreciate recommendations on how to proceed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)