Posted to issues@hbase.apache.org by "xichaomin (Jira)" <ji...@apache.org> on 2021/08/26 03:20:00 UTC
[jira] [Updated] (HBASE-26225) let hbase.mapreduce.bulkload.assign.sequenceNumbers take effect in SecureBulkLoadManager.secureBulkLoadHFiles
[ https://issues.apache.org/jira/browse/HBASE-26225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
xichaomin updated HBASE-26225:
------------------------------
Description:
HBASE-10958 calls flush before a bulk load to obtain the latest sequence ID and prevent data loss during replay. '_hbase.mapreduce.bulkload.assign.sequenceNumbers_' controls whether to flush before the bulk load, but *SecureBulkLoadManager* always passes true for the flush argument. If we bulk load frequently, we flush a lot of small files. Can we make 'hbase.mapreduce.bulkload.assign.sequenceNumbers' take effect in SecureBulkLoadManager? When it is false, -1 is passed as the sequenceId, so we won't lose data.
{code:java}
// We call bulkLoadHFiles as the requesting user
// to enable access prior to staging
return region.bulkLoadHFiles(familyPaths, true,
    new SecureBulkLoadListener(fs, bulkToken, conf),
    request.getCopyFile(), clusterIds, request.getReplicate());
{code}
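A minimal, self-contained model of the trade-off described above, assuming a region hands out one new sequence id per flush. This is not HBase code: BulkLoadSeqIdModel, ModelRegion, bulkLoad and assignSeqIds are hypothetical names, and treating an unset property as true is an assumption. Hard-coding true costs one flush per bulk load; reading the flag and passing false skips the flush and uses -1 as the sequence id.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of HBASE-26225; not the real SecureBulkLoadManager/HRegion code.
public class BulkLoadSeqIdModel {
    // Property name taken from the issue; default of true when unset is an assumption.
    public static final String ASSIGN_SEQ_IDS = "hbase.mapreduce.bulkload.assign.sequenceNumbers";

    public static class ModelRegion {
        private long nextSeqId = 1;  // next sequence id the model hands out on flush
        private int flushCount = 0;  // number of (possibly small) flush files produced

        // Returns the sequence id attached to the bulk-loaded files.
        public long bulkLoad(boolean assignSeqId) {
            if (!assignSeqId) {
                return -1L;          // no flush, no sequence id assigned
            }
            flushCount++;            // flush first to obtain the latest sequence id
            return nextSeqId++;
        }

        public int flushCount() {
            return flushCount;
        }
    }

    // Proposed behaviour: read the flag instead of passing a hard-coded true.
    public static boolean assignSeqIds(Map<String, String> conf) {
        String value = conf.get(ASSIGN_SEQ_IDS);
        return value == null || Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(ASSIGN_SEQ_IDS, "false");
        ModelRegion region = new ModelRegion();
        for (int i = 0; i < 100; i++) {
            region.bulkLoad(assignSeqIds(conf));
        }
        System.out.println("flushes: " + region.flushCount()); // prints "flushes: 0"
    }
}
{code}

In the model, 100 bulk loads with the flag set to false produce no flushes; with the flag unset (treated as true) they produce 100.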
was: HBASE-10958 calls flush before a bulk load to obtain the latest sequence ID and prevent data loss during replay. '_hbase.mapreduce.bulkload.assign.sequenceNumbers_' controls whether to flush before the bulk load, but *SecureBulkLoadManager* always passes true for the flush argument. If we bulk load frequently, we flush a lot of small files. Can we make 'hbase.mapreduce.bulkload.assign.sequenceNumbers' take effect in SecureBulkLoadManager? When it is false, -1 is passed as the sequenceId, so we won't lose data.
> let hbase.mapreduce.bulkload.assign.sequenceNumbers take effect in SecureBulkLoadManager.secureBulkLoadHFiles
> -------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-26225
> URL: https://issues.apache.org/jira/browse/HBASE-26225
> Project: HBase
> Issue Type: Improvement
> Components: Performance
> Reporter: xichaomin
> Priority: Minor
> Attachments: SecureBulkLoadManager.diff
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)