Posted to announce@apache.org by Andrew Purtell <ap...@apache.org> on 2017/12/19 19:32:36 UTC
[ANNOUNCE] Apache HBase 1.4.0 is now available for download
The HBase team is happy to announce the immediate availability of Apache
HBase 1.4.0!
Apache HBase is an open-source, distributed, versioned, non-relational
database. Apache HBase gives you low latency random access to billions of
rows with millions of columns atop non-specialized hardware. To learn more
about HBase, see https://hbase.apache.org/.
Download through an ASF mirror:
https://www.apache.org/dyn/closer.lua/hbase/1.4.0
HBase 1.4.0 is the first release in the new HBase 1.4 line, continuing on
the theme of earlier 1.x releases of bringing a stable, reliable database
to the Apache Big Data ecosystem and beyond. As a minor release, 1.4.0
contains a number of new features and improvements that won't appear in
maintenance releases of older code lines. However, complete compatibility
with data formats and interoperability with older clients is assured.
There are no special considerations for upgrade or rollback except as
noted in this release announcement.
Maintenance releases of the 1.4.0 code line will occur at roughly a
monthly cadence.
For instructions on verifying ASF release downloads, please see
https://www.apache.org/dyn/closer.cgi#verify
Project member signature keys can be found at
https://www.apache.org/dist/hbase/KEYS
Thanks to all the contributors who made this release possible!
The complete list of the 660 issues resolved in this release can be found
at https://s.apache.org/OErT . New developer and user-facing
incompatibilities, important issues, features, and major improvements
include:
Critical
- HBASE-15484 Correct the semantic of batch and partial
Scan#setBatch no longer implies setAllowPartialResults(true).
Scan#setBatch is helpful in paging queries. If you just want to prevent
OOM at the client, setAllowPartialResults(true) is the better choice.
We deprecated isPartial in favor of mayHaveMoreCellsInRow. If it returns
false, the current Result must be the last one for this row.
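As a rough illustration of how a client might use this flag to stitch partial
results back into whole rows, here is a minimal sketch. PartialResult and
assemble_rows are hypothetical stand-ins written for this note, not the HBase
client classes; only the meaning of mayHaveMoreCellsInRow is taken from the
text above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class PartialResult:
    """Hypothetical stand-in for a scan Result; not the real client class."""
    row: bytes
    cells: List[str]
    may_have_more_cells_in_row: bool


def assemble_rows(results: Iterable[PartialResult]) -> Iterator[Tuple[bytes, List[str]]]:
    """Merge consecutive partial results into (row, cells) pairs."""
    pending_row, pending_cells = None, []
    for result in results:
        if pending_row is not None and result.row != pending_row:
            # The row changed before a final piece arrived; emit what we have.
            yield pending_row, pending_cells
            pending_cells = []
        pending_row = result.row
        pending_cells = pending_cells + result.cells
        if not result.may_have_more_cells_in_row:
            # False means this was the last piece of the current row.
            yield pending_row, pending_cells
            pending_row, pending_cells = None, []
    if pending_row is not None:
        yield pending_row, pending_cells
```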
- HBASE-17287 Master becomes a zombie if filesystem object closes
If filesystem is not available during log split, abort master server.
- HBASE-17471 Region Seqid will be out of order in WAL if using
mvccPreAssign
MVCCPreAssign was added by HBASE-16698, but pre-assigned mvcc is only used
in the put/delete path. Other write paths such as increment/append still
assign mvcc in the ringbuffer's consumer thread. If put and increment are
used in parallel, the seqid in the WAL may not increase monotonically, and
disorder in WALs will lead to data loss. This patch brings all mvcc/seqid
events into wal.append and synchronizes WAL append with mvcc acquisition,
so no disorder in the WAL can happen. Performance tests show no regression.
- HBASE-17595 Add partial result support for small/limited scan
Now small scan and limited scan can also return partial results.
- HBASE-17717 Incorrect ZK ACL set for HBase superuser
In previous versions of HBase, the system intended to set a ZooKeeper
ACL on all "sensitive" ZNodes for the user specified in the
hbase.superuser configuration property. Unfortunately, the ACL was
malformed, which left the hbase.superuser unable to access
the sensitive ZNodes that HBase creates. HBase will automatically
correct the ACLs on start so users do not need to manually correct the
ACLs.
- HBASE-17887 Row-level consistency is broken for read
Now we pass the list of memstoreScanners to the StoreScanner along with
the new files to ensure that the StoreScanner sees the latest memstore
after a flush.
- HBASE-17931 Assign system tables to servers with highest version
We usually keep compatibility between old clients and new servers so that
we can do a rolling upgrade: HBase cluster first, then HBase clients. But
we don't guarantee that a new client can access an old server. In an HBase
cluster, region servers access the system tables, so the servers are
themselves HBase clients. If the system tables are hosted on region servers
with a lower version, we may run into trouble because region servers with
a higher version may not be able to access them. After this patch, we move
all system regions to the region servers with the highest version. So when
doing a rolling upgrade across two major or minor versions, you should
ALWAYS UPGRADE MASTER FIRST and then upgrade region servers. The new
master will handle system tables correctly.
- HBASE-18035 Meta replica does not give any primaryOperationTimeout to
primary meta region
When a client is configured to use meta replicas, it sends scan requests
to all meta replicas at almost the same time. Since a meta replica may
contain stale data, the client may get wrong region locations if the result
from a replica comes back first. To fix this,
"hbase.client.meta.replica.scan.timeout" is introduced: a client will
always send the request to the primary meta region first and wait up to the
configured timeout for a reply. If no result is received, it will send the
request to the replica meta regions. The unit for
"hbase.client.meta.replica.scan.timeout" is microseconds, and the default
value is 1000000 (1 second).
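The primary-first behavior can be sketched roughly as follows. This is an
illustration only, not the client implementation: locate_region and its
callable arguments are hypothetical, and the only details taken from the text
are the microsecond unit and the 1000000 default.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, TimeoutError, wait


def locate_region(primary, replicas, timeout_micros=1_000_000):
    """Ask the primary first; query replicas only if the timeout elapses.

    `primary` and each entry of `replicas` are zero-argument callables that
    return a location (hypothetical stand-ins for meta scans).
    """
    with ThreadPoolExecutor(max_workers=1 + len(replicas)) as pool:
        primary_future = pool.submit(primary)
        try:
            # The configured value is in microseconds; convert to seconds.
            return primary_future.result(timeout=timeout_micros / 1_000_000)
        except TimeoutError:
            # No reply in time: fan out to the replicas and take whichever
            # request (primary included) finishes first.
            futures = [primary_future] + [pool.submit(r) for r in replicas]
            done, _ = wait(futures, return_when=FIRST_COMPLETED)
            return next(iter(done)).result()
```

With a responsive primary the replicas are never contacted, which is the
point of the change: stale replica data is only consulted as a fallback.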
- HBASE-18137 Replication gets stuck for empty WALs
0-length WAL files can potentially cause the replication queue to get
stuck. A new config "replication.source.eof.autorecovery" has been
added: if set to true (default is false), the 0-length WAL file will be
skipped after 1) the max number of retries has been hit, and 2) there
are more WAL files in the queue. The risk of enabling this is that
there is a chance the 0-length WAL file actually has some data (e.g.
block went missing and will come back once a datanode is recovered).
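The skip conditions described above amount to a small predicate. The sketch
below only restates them; the function and parameter names are made up for
illustration and do not correspond to HBase code.

```python
def should_skip_empty_wal(wal_length, retries, max_retries, queued_wals,
                          autorecovery=False):
    """Return True when a 0-length WAL may be skipped.

    All conditions from the description must hold: the feature is enabled,
    the file is empty, the retry budget is used up, and more WAL files are
    waiting in the queue.
    """
    return (
        autorecovery
        and wal_length == 0
        and retries >= max_retries
        and queued_wals > 0
    )
```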
- HBASE-18164 Much faster locality cost function and candidate generator
New locality cost function and candidate generator that use caching and
incremental computation to allow the stochastic load balancer to
consider ~20x more cluster configurations for big clusters.
- HBASE-18192 Replication drops recovered queues on region server shutdown
If a region server that is processing a recovered queue for another
previously dead region server is gracefully shut down, it can drop the
recovered queue under certain conditions. Running without this fix on a
1.2+ release means a possibility of continuing data loss in replication,
irrespective of which WALProvider is used. If a single WAL group (or
DefaultWALProvider) is used, running without this fix will always cause
data loss in replication whenever a region server processing recovered
queues is gracefully shut down.
- HBASE-18233 We shouldn't wait for readlock in doMiniBatchMutation in case
of deadlock
This patch plus the sort of mutations done in HBASE-17924 fixes a
performance regression doing increments and/or checkAndPut-style
operations.
- HBASE-18255 Time-Delayed HBase Performance Degradation with Java 7
This change sets the JVM property ReservedCodeCacheSize to 256MB in the
provided hbase-env.sh example file. The specific value for this property
attempts to prevent performance issues seen when HBase runs on Java 7. The
value set is the same as the default when using Java 8.
- HBASE-18469 Correct RegionServer metric of totalRequestCount
We introduced a new RegionServer metric named
"totalRowActionRequestCount", which counts all row actions and equals
the sum of "readRequestCount" and "writeRequestCount". Meanwhile, we
have changed "totalRequestCount" to count a multi request only once,
whereas previously it counted every action in the request. As a
result, existing monitoring of totalRequestCount will still work
but will see a smaller value, and we strongly recommend switching to the
new metric to monitor server load.
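To make the difference concrete, here is a toy model of the counters. The
class is hypothetical and only mirrors the relationship stated above: one
multi request adds 1 to totalRequestCount and one per row action to
totalRowActionRequestCount.

```python
from dataclasses import dataclass


@dataclass
class RequestCounters:
    """Toy model of the four metrics; not the RegionServer metrics classes."""
    total_request_count: int = 0
    total_row_action_request_count: int = 0
    read_request_count: int = 0
    write_request_count: int = 0

    def record_multi(self, reads: int, writes: int) -> None:
        # A multi request is counted once, however many actions it carries.
        self.total_request_count += 1
        self.read_request_count += reads
        self.write_request_count += writes
        # Row actions are counted individually.
        self.total_row_action_request_count += reads + writes
```

A single multi request carrying 3 gets and 7 puts therefore moves
totalRequestCount by 1 and totalRowActionRequestCount by 10.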
- HBASE-18577 shaded client includes several non-relocated third party
dependencies
The HBase shaded artifacts (hbase-shaded-client and hbase-shaded-server)
no longer contain several non-relocated third party dependency classes
that were mistakenly included. Downstream users who relied on these
classes being present will need to add a runtime dependency onto an
appropriate third party artifact. Previously, we erroneously packaged
several third party libs without relocating them. In some cases these
libraries have now been relocated; in some cases they are no longer
included at all.
Includes:
* jaxb
* jetty
* jersey
* codahale metrics (HBase 1.4+ only)
* commons-crypto
* jets3t
* junit
* curator (HBase 1.4+)
* netty 3 (HBase 1.1)
* mockito-junit4 (HBase 1.1)
- HBASE-18665 ReversedScannerCallable invokes getRegionLocations
incorrectly
Reverse scans on tables used the meta cache incorrectly and fetched data
from the meta table every time. This fix resolves the issue, which
improves performance for reverse scans.
- HBASE-19285 Add per-table latency histograms
Per-RegionServer table latency histograms have been returned to HBase
(after being removed due to impacting performance). These metrics are
exposed via a new JMX bean "TableLatencies" with the typical naming
conventions: namespace, table, and histogram component.
Major
- HBASE-7621 REST client doesn't support binary row keys
RemoteHTable now supports binary row keys with any character or byte
by properly encoding request URLs. This is both a behavioral change
from earlier versions and an important fix for protocol correctness.
- HBASE-11013 Clone Snapshots on Secure Cluster Should provide option to
apply Retained User Permissions
When a snapshot is created, the permissions of the original table are
saved into the .snapshotinfo file (backward compatibility), which is in
the snapshot root directory. For the clone_snapshot/restore_snapshot
commands, we provide an additional option (RESTORE_ACL) to decide whether
to grant the permissions of the original table to the newly created table.
- HBASE-14548 Expand how table coprocessor jar and dependency path can
be specified
Allows a directory containing the jars or some wildcards to be
specified, such as:
hdfs://namenode:port/user/hadoop-user/
or
hdfs://namenode:port/user/hadoop-user/*.jar
Please note that if a directory is specified, all jar files directly
in the directory are added, but it does not search files in the
subtree rooted in the directory. Do not use a wildcard if you would
like to specify a directory.
- HBASE-14925 HBase shell command/tool to list table's region info through
command line
Added a shell command 'list_regions' for displaying a table's region
info through the command line. It lists all regions for a particular table
as an array and can also filter them by server name (optional) as a prefix
and by maximum locality (optional). By default, it returns all the regions
for the table with any locality. The command displays the server name,
region name, start key, end key, size of the region in MB, number of
requests and the locality. The information can be projected out via an
array as the third parameter. By default, all of this information is
displayed. Possible array values are SERVER_NAME, REGION_NAME, START_KEY,
END_KEY, SIZE, REQ and LOCALITY. Values are not case sensitive. If you
don't want to filter by server name, pass an empty hash or string.
- HBASE-15187 Integrate CSRF prevention filter to REST gateway
Protection against CSRF attack can be turned on with config
parameter, hbase.rest.csrf.enabled - default value is false.
The custom header to be sent can be changed via config parameter,
hbase.rest.csrf.custom.header whose default value is
"X-XSRF-HEADER".
The configuration parameter hbase.rest.csrf.methods.to.ignore
controls which HTTP methods are not associated with custom header
check.
The config parameter hbase.rest-csrf.browser-useragents-regex is a
comma-separated list of regular expressions used to match against an
HTTP request's User-Agent header when protection against cross-site
request forgery (CSRF) is enabled for REST server by setting
hbase.rest.csrf.enabled to true.
- HBASE-15236 Inconsistent cell reads over multiple bulk-loaded HFiles
During bulkloading, if there are multiple hfiles corresponding to
the same region, and if they have same timestamps (which may have
been set using importtsv.timestamp) and duplicate keys across them,
then get and scan may return values coming from different hfiles.
- HBASE-15243 Utilize the lowest seek value when all Filters in
MUST_PASS_ONE FilterList return SEEK_NEXT_USING_HINT
When all filters in a MUST_PASS_ONE FilterList return a
SEEK_NEXT_USING_HINT code, we return SEEK_NEXT_USING_HINT from
FilterList#filterKeyValue() to utilize the lowest seek value.
- HBASE-15386 PREFETCH_BLOCKS_ON_OPEN in HColumnDescriptor is ignored
Changes the prefetch TRACE-level loggings to include the word
'Prefetch' in them so you know what they are about.
Changes the cryptic logging of the CacheConfig#toString to have
some preamble saying why and what column family is responsible.
- HBASE-15576 Scanning cursor to prevent blocking long time on
ResultScanner.next()
If you don't want scanning to block for too long because of heartbeats
and partial results, you can use Scan#setNeedCursorResult(true) to get a
special result within the scanning timeout that tells you which row the
server is currently scanning. See its javadoc for more details.
- HBASE-15633 Backport HBASE-15507 to branch-1
Adds update_peer_config to the HBase shell and ReplicationAdmin, and
provides a callback for custom replication endpoints to be notified
of changes to configuration and peer data
- HBASE-15686 Add override mechanism for the exempt classes when
dynamically loading table coprocessor
The new coprocessor table descriptor attribute,
hbase.coprocessor.classloader.included.classes, is added. A user can
specify class name prefixes (semicolon separated) which should be
loaded by CoprocessorClassLoader.
- HBASE-15711 Add client side property to allow logging details for
batch errors
A new client side property hbase.client.log.batcherrors.details is
introduced to allow logging the full stacktrace of exceptions for
batch errors. It is disabled by default.
- HBASE-15816 Provide client with ability to set priority on Operations
Added setPriority(int priority) API to Put, Delete, Increment, Append,
Get and Scan. For all these ops, the user can provide a custom RPC
priority level.
- HBASE-15924 Enhance hbase services autorestart capability to
hbase-daemon.sh
Now one can start hbase services with enabled "autostart/autorestart"
feature in controlled fashion with the help of "--autostart-window-size"
to define the window period and the "--autostart-window-retry-limit" to
define the number of times the hbase services have to be restarted upon
being killed/terminated abnormally within the provided window period.
The following cases are supported with "autostart/autorestart":
a) --autostart-window-size=0 and --autostart-window-retry-limit=0,
indicates infinite window size and no retry limit
b) not providing the args, will default to a)
c) --autostart-window-size=0 and --autostart-window-retry-limit=
<positive value> indicates the autostart process to bail out if the
retry limit exceeds irrespective of window period
d) --autostart-window-size=<x> and --autostart-window-retry-limit=<y>
indicates the autostart process to bail out if the retry limit "y"
is exceeded for the last window period "x".
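The four cases can be read as a single decision rule. The sketch below is one
interpretation, not the logic in hbase-daemon.sh: should_bail_out is a
hypothetical helper, times are assumed to be in seconds, and treating a
positive window with a zero retry limit as "no limit" is an assumption since
the text does not cover that combination.

```python
def should_bail_out(restart_times, now, window_size=0, retry_limit=0):
    """Decide whether the autorestart loop should give up.

    restart_times: timestamps (assumed seconds) of earlier restarts.
    window_size=0 means an infinite window; retry_limit=0 means no limit.
    """
    if retry_limit == 0:
        # Cases a) and b): never bail out. Also applied to window_size > 0
        # with no limit, which is an assumption of this sketch.
        return False
    if window_size == 0:
        # Case c): count every restart, irrespective of the window.
        recent = len(restart_times)
    else:
        # Case d): count only restarts inside the last window period.
        recent = sum(1 for t in restart_times if now - t <= window_size)
    return recent > retry_limit
```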
- HBASE-15941 HBCK repair should not unsplit healthy splitted region
A new option -removeParents is now available that will remove an old
parent when two valid daughters for that parent exist and
-fixHdfsOverlaps is used. If there is an issue trying to remove the
parent from META or sidelining the parent from HDFS, we will fall back to
a regular merge. For now this option only works when the overlap
group consists of exactly 3 regions (a parent, daughter A and daughter B).
- HBASE-15950 Fix memstore size estimates to be tighter
The estimates of heap usage by the memstore objects (KeyValue, object
and array header sizes, etc) have been made more accurate for heap
sizes up to 32G (using CompressedOops), resulting in them dropping by
10-50% in practice. This also results in fewer flushes and
compactions due to "fatter" flushes. As a result, the actual heap usage
of the memstore before being flushed may increase by up to 100%. If
configured memory limits for the region server had been tuned based on
observed usage, this change could result in worse GC behavior or even
OutOfMemory errors. Set the environment property (not hbase-site.xml)
"hbase.memorylayout.use.unsafe" to false to disable.
- HBASE-15994 Allow selection of RpcSchedulers
Adds a FifoRpcSchedulerFactory so you can try the FifoRpcScheduler by
setting "hbase.region.server.rpc.scheduler.factory.class"
- HBASE-16052 Improve HBaseFsck Scalability
Improves the performance and scalability of HBaseFsck, especially for
large clusters with a small number of large tables.
Searching for lingering reference files is now a multi-threaded
operation. Loading HDFS region directory information is now multi-
threaded at the region-level instead of the table-level to maximize
concurrency. A performance bug in HBaseFsck that resulted in
redundant I/O and RPCs was fixed by introducing a FileStatusFilter
that filters FileStatus objects directly.
- HBASE-16213 A new HFileBlock structure for fast random get
Introduces a new DataBlockEncoding named ROW_INDEX_V1, which
could improve random read (get) performance, especially when the
average record size (key-value size per row) is small. To use this
feature, set DATA_BLOCK_ENCODING to ROW_INDEX_V1 for the column
family of a newly created table, or change an existing CF with the shell
command below:
alter 'table_name', {NAME => 'cf', DATA_BLOCK_ENCODING => 'ROW_INDEX_V1'}
Please note that if we turn this DBE on, HFile block will be bigger
than NONE encoding because it adds some metadata for binary search.
Seek in row when random reading is one of the main consumers of CPU.
This helps.
- HBASE-16244 LocalHBaseCluster start timeout should be configurable
When LocalHBaseCluster is started from the command line the Master
would give up after 30 seconds due to a hardcoded timeout meant for
unit tests. This change allows the timeout to be configured via
hbase-site as well as sets it to 5 minutes when LocalHBaseCluster
is started from the command line.
- HBASE-16336 Removing peers seems to be leaving spare queues
Add a ReplicationZKNodeCleaner periodic check and delete any useless
replication queue belonging to a peer which does not exist.
- HBASE-16388 Prevent client threads being blocked by only one slow
region server
Adds a new configuration, hbase.client.perserver.requests.threshold,
to limit the maximum number of concurrent requests to one region server.
If the user still creates new requests after reaching the limit, the
client will throw ServerTooBusyException and will not send the request
to the server. This is a client-side feature that prevents the client's
threads from being blocked by one slow region server, which would
otherwise make the availability of the client much lower than the
availability of the region servers.
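The idea behind the threshold can be sketched as a per-server in-flight
counter that rejects new requests instead of queueing them. PerServerLimiter
and ServerTooBusyError are hypothetical names for this illustration; the real
client raises ServerTooBusyException and is implemented differently.

```python
import threading
from collections import Counter


class ServerTooBusyError(Exception):
    """Stand-in for the client's ServerTooBusyException."""


class PerServerLimiter:
    def __init__(self, threshold):
        self.threshold = threshold
        self._in_flight = Counter()
        self._lock = threading.Lock()

    def acquire(self, server):
        """Register a new request, or fail fast if the server is saturated."""
        with self._lock:
            if self._in_flight[server] >= self.threshold:
                # Fail immediately rather than block the calling thread.
                raise ServerTooBusyError(server)
            self._in_flight[server] += 1

    def release(self, server):
        """Mark one request to the server as finished."""
        with self._lock:
            self._in_flight[server] -= 1
```

Because the limit is tracked per server, a slow region server only affects
requests addressed to it; requests to other servers proceed as usual.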
- HBASE-16540 Scan should do additional validation on start and stop row
Scan#setStartRow() and Scan#setStopRow() now validate the argument
passed for each row key. If the length of the parameter passed
exceeds Short.MAX_VALUE, an IllegalArgumentException will be thrown.
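For reference, Short.MAX_VALUE in Java is 32767, so the check is equivalent
to the following sketch. validate_row_key is a hypothetical helper, and
Python's ValueError stands in for IllegalArgumentException.

```python
SHORT_MAX_VALUE = 32767  # value of Java's Short.MAX_VALUE


def validate_row_key(row: bytes) -> bytes:
    """Reject start/stop rows longer than Short.MAX_VALUE bytes."""
    if len(row) > SHORT_MAX_VALUE:
        raise ValueError(
            f"row length {len(row)} exceeds the maximum of {SHORT_MAX_VALUE}"
        )
    return row
```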
- HBASE-16584 Backport the new ipc implementation in HBASE-16432 to
branch-1
The netty dependency is upgraded to 4.1.1.Final. Some configurations
of the old AsyncRpcClient are also gone, such as
"hbase.rpc.client.threads.max" and "hbase.rpc.client.nativetransport".
- HBASE-16653 Backport HBASE-11393 to all branches which support namespace
HBASE-11393 did two things:
1. Unified tableCFs with peerConfig.
2. Fixed the lack of namespace support in replication.
This issue backports it to branch-1.
How do you do a rolling update if the replication peer has an old
table-cfs config? Because we modified the proto object of
ReplicationPeerConfig (adding a tableCFs field), a rolling update has to
update the original ReplicationPeerConfig data on ZK first.
1. Make sure the master has permission to modify the replication
peer znode.
2. Disable the replication peer.
3. Rolling update master first. The master will copy the table-cfs
config from old table-cfs znode and add it to the new proto
object of ReplicationPeerConfig.
4. Rolling update regionservers.
5. Enable the replication peer.
If you can't change the replication peer znode permission, you can
use the TableCFsUpdater tool to copy the table-cfs config.
1. Disable the replication peer.
2. bin/hbase org.apache.hadoop.hbase.replication.master.TableCFsUpdater update
3. Rolling update master and regionservers.
4. Enable the replication peer.
- HBASE-16672 Add option for bulk load to always copy hfile(s) instead of
renaming
This issue adds a config, always.copy.files, to LoadIncrementalHFiles.
When set to true, source hfiles are copied rather than renamed, meaning
the source hfiles are kept after the bulk load is done. The default value
is false.
- HBASE-16698 Performance issue: handlers stuck waiting for CountDownLatch
inside WALKey#getWriteEntry under high writing workload
Assign sequenceid to an edit before we go on the ringbuffer; undoes
contention on WALKey latch. Adds a new config
"hbase.hregion.mvcc.preassign" which defaults to true: i.e. this speedup
is enabled.
- HBASE-16755 Honor flush policy under global memstore pressure
Prior to this change, when the memstore low water mark is exceeded on a
regionserver, the regionserver will force flush all stores on the
regions selected for flushing until we drop below the low water mark.
With this change, the regionserver will continue to force flush regions
when above the memstore low water mark, but will only flush the stores
returned by the configured FlushPolicy.
- HBASE-16993 BucketCache throw java.io.IOException: Invalid HFile block
magic when configuring hbase.bucketcache.bucket.sizes
Any value for hbase.bucketcache.bucket.sizes configuration must be a
multiple of 256. If that is not the case, instantiation of L2 Bucket
cache itself will fail throwing IllegalArgumentException.
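A pre-flight check along these lines can catch a bad value before a region
server is started. It is a sketch only: parse_bucket_sizes is a hypothetical
helper, and it assumes the property holds a comma-separated list of sizes in
bytes.

```python
def parse_bucket_sizes(value: str):
    """Parse a comma-separated size list and require multiples of 256."""
    sizes = [int(part) for part in value.split(",") if part.strip()]
    for size in sizes:
        if size % 256 != 0:
            # Mirrors the IllegalArgumentException described above.
            raise ValueError(f"bucket size {size} is not a multiple of 256")
    return sizes
```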
- HBASE-17112 Prevent setting timestamp of delta operations the same as
previous value
Before this issue, two concurrent Increments/Appends done in the same
millisecond, or the RS's clock going backwards, would produce two results
with the same TS, which is not friendly to versioning and leads to wrong
results in the sink cluster if replication is out of order. After this
issue, the result of an Increment/Append will always have an increasing TS.
There is no longer any inconsistency in replication for these operations.
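One way to obtain an always-increasing timestamp is to take the later of the
current clock and the previous value plus one. This is a sketch of the idea
under that assumption, not necessarily the exact rule HBase applies;
next_delta_timestamp is a hypothetical name.

```python
def next_delta_timestamp(now_ms: int, previous_ts: int) -> int:
    """Return a timestamp strictly greater than the previous cell's."""
    # If the clock has not advanced (same millisecond) or went backwards,
    # step one past the previous value instead of reusing it.
    return max(now_ms, previous_ts + 1)
```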
- HBASE-17178 Add region balance throttling
Adds region balance throttling. The master executes one region balance
plan per balance interval, which equals the max balancing time divided by
the number of region balance plans. This also introduces a new config,
hbase.master.balancer.maxRitPercent, to protect availability. If this is
set to 0.01, at most 1% of regions are in transition while balancing, so
the cluster's availability is at least 99% during balancing.
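The arithmetic can be worked through as follows. Both helpers are
hypothetical, the millisecond unit is assumed, and the rounding of the
region cap is an assumption of this sketch.

```python
def balance_interval_ms(max_balancing_time_ms: float, plan_count: int) -> float:
    """Interval between plans: max balancing time / number of plans."""
    return max_balancing_time_ms / plan_count


def max_regions_in_transition(total_regions: int, max_rit_percent: float) -> int:
    """Cap on regions in transition while balancing (rounding is assumed)."""
    return round(total_regions * max_rit_percent)
```

For example, 100 plans spread over a 300000 ms budget are executed 3000 ms
apart, and with maxRitPercent set to 0.01 a 1000-region cluster has at most
10 regions in transition at a time.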
- HBASE-17280 Add mechanism to control cleaner chore behavior
The HBase cleaner chore process cleans up old WAL files and archived
HFiles. Cleaner operation can affect query performance when running
heavy workloads, so disable the cleaner during peak hours. The cleaner
has the following HBase shell commands:
- cleaner_chore_enabled: Queries whether cleaner chore is enabled/
disabled.
- cleaner_chore_run: Manually runs the cleaner to remove files.
- cleaner_chore_switch: Enables or disables the cleaner and returns
the previous state of the cleaner. For example, cleaner_chore_switch true
enables the cleaner.
Following APIs are added in Admin:
- setCleanerChoreRunning(boolean on): Enable/Disable the cleaner chore
- runCleanerChore(): Ask for cleaner chore to run
- isCleanerChoreEnabled(): Query whether cleaner chore is enabled/
disabled.
- HBASE-17296 Provide per peer throttling for replication
Provide per peer throttling for replication. Add the bandwidth upper
limit to ReplicationPeerConfig and a new shell cmd set_peer_bandwidth
to update the bandwidth as needed.
- HBASE-17426 Inconsistent environment variable names for enabling JMX
In bin/hbase-config.sh, if value for HBASE_JMX_BASE is empty, keep
current behavior. If HBASE_JMX_OPTS is not empty, keep current
behavior. Otherwise use the value of HBASE_JMX_BASE
- HBASE-17437 Support specifying a WAL directory outside of the root
directory
This patch adds support for specifying a WAL directory outside of the
HBase root directory.
Multiple configuration variables were added to accomplish this:
hbase.wal.dir: used to configure where the root WAL directory is
located. Could be on a different FileSystem than the root directory. WAL
directory can not be set to a subdirectory of the root directory. The
default value of this is the root directory if unset.
hbase.rootdir.perms: Configures FileSystem permissions to set on the
root directory. This is '700' by default.
hbase.wal.dir.perms: Configures FileSystem permissions to set on the WAL
directory FileSystem. This is '700' by default.
- HBASE-17472 Correct the semantic of permission grant
Before this patch, later granted permissions would override previously
granted permissions, and the previously granted permissions would be lost.
This issue redefines the grant semantics: on the master branch, later
granted permissions are merged with previously granted permissions. On
branch-1.4, grant keeps the override behavior for compatibility, and a
grant with a mergeExistingPermission flag is provided.
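The two behaviors differ as set replacement versus set union. The grant
function below is a hypothetical model of that difference, not the
AccessController API; permissions are represented as single-letter strings
purely for illustration.

```python
def grant(existing: set, granted: set, merge_existing_permission: bool = False) -> set:
    """Return the permission set a user holds after a grant."""
    if merge_existing_permission:
        # Merge semantics: earlier permissions are kept.
        return existing | granted
    # Override semantics (the branch-1.4 default): earlier permissions are lost.
    return set(granted)
```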
- HBASE-17508 Unify the implementation of small scan and regular scan for
sync client
Now the scan.setSmall method is deprecated. Consider using scan.setLimit
and scan.setReadType in the future. And we will open scanner lazily when
you call scanner.next. This is an incompatible change which delays the
table existence check and permission check.
- HBASE-17578 Thrift per-method metrics should still update in the case of
exceptions
In prior versions, the HBase Thrift handlers failed to increment per-
method metrics when an exception was encountered. These metrics will
now always be incremented, whether an exception is encountered or not.
This change also adds exception-type metrics, similar to those exposed
in regionservers, for individual exceptions which are received by the
Thrift handlers.
- HBASE-17583 Add inclusive/exclusive support for startRow and endRow of
scan for sync client
Now you can include or exclude the startRow and stopRow for a scan. The
new methods to specify startRow and stopRow are withStartRow and
withStopRow. The old methods to specify startRow and stopRow (including
constructors) are marked as deprecated because, previously, if startRow
and stopRow were equal we would treat the scan as a get scan and include
the stopRow implicitly. This is strange now that inclusiveness can be set
explicitly, so we added new methods and deprecated the old ones. The
deprecated methods will be removed in the future.
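The effect of the inclusiveness flags on which rows a scan covers can be
modeled like this. rows_in_range is a hypothetical helper over an in-memory
list of row keys; the defaults (inclusive start, exclusive stop) are an
assumption of the sketch rather than something stated above.

```python
def rows_in_range(rows, start_row, stop_row,
                  start_inclusive=True, stop_inclusive=False):
    """Select the row keys a scan with the given bounds would cover."""
    selected = []
    for row in sorted(rows):
        after_start = row > start_row or (start_inclusive and row == start_row)
        before_stop = row < stop_row or (stop_inclusive and row == stop_row)
        if after_start and before_stop:
            selected.append(row)
    return selected
```

With explicit flags, a single-row "get scan" is expressed by setting both
bounds to the same row and marking both as inclusive, instead of relying on
the old implicit rule.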
- HBASE-17584 Expose ScanMetrics with ResultScanner rather than Scan
Now you can use ResultScanner.getScanMetrics to get the scan metrics at
any time during the scan operation. The old Scan.getScanMetrics is
deprecated and still works, but if you use ResultScanner.getScanMetrics
to get the scan metrics and reset them, the metrics published to the
Scan instance will be messed up.
- HBASE-17599 Use mayHaveMoreCellsInRow instead of isPartial
The word 'isPartial' is ambiguous, so we introduce a new method,
'mayHaveMoreCellsInRow', to replace it. The old meaning of 'isPartial'
was not the same as 'mayHaveMoreCellsInRow': for a batched scan, if the
number of returned cells equals the batch, isPartial would be false. After
this change, 'isPartial' means the same as 'mayHaveMoreCellsInRow'. This
is an incompatible change, but it is not likely to break much, because for
a batched scan the old 'isPartial' was redundant information: if the
number of returned cells reaches the batch limit, you already know both
the number of returned cells and the value of batch.
- HBASE-17737 Thrift2 proxy should support scan timeRange per column
family
Thrift2 proxy now supports scan timeRange per column family.
- HBASE-17757 Unify blocksize after encoding to decrease memory fragment
Blocksize is set in the column family's attributes. It is used to control
block sizes when generating blocks, but it doesn't take encoding into
account. If you set an encoding for blocks, the block size varies after
encoding. Since blocks are cached in memory after encoding (by default),
this causes memory fragmentation when using the blockcache, or decreases
pool efficiency when using the bucketCache. This issue introduces a new
config named 'hbase.writer.unified.encoded.blocksize.ratio'. The default
value of this config is 1, meaning it does nothing. If this value is set to
a smaller value such as 0.5 and the blocksize is set to 64KB (the default
blocksize), the blocksize after encoding is unified to 64KB * 0.5 = 32KB.
A unified blocksize relieves the memory problems mentioned above.
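The example in the description works out as below. The helper name is
hypothetical, and modelling the default ratio of 1 as "target equals the
configured blocksize" is a simplification of "does nothing".

```python
def unified_encoded_blocksize(blocksize_bytes: int, ratio: float = 1.0) -> int:
    """Target size of a block after encoding: blocksize * ratio."""
    return int(blocksize_bytes * ratio)
```

With the default 64KB blocksize and a ratio of 0.5 the target is 32768
bytes, matching the 32KB figure above.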
- HBASE-17817 Make Regionservers log which tables it removed coprocessors
from when aborting
Adds table name to exception logging when a coprocessor is removed from
a table by the region server.
- HBASE-17861 Regionserver down when checking the permission of staging dir
if hbase.rootdir is on S3
Some object stores do not support unix-style permissions. This fixes the
permission check issue when the staging dir is specified on a different
file system. Currently it covers s3, wasb, and swift.
- HBASE-17877 Improve HBase's byte[] comparator
Updated the lexicographic byte array comparator to use a slightly more
optimized version similar to the one available in the guava library that
compares only the first index where left[index] != right[index]. The
comparator also returns the diff directly instead of mapping it to -1,
0, +1 range as was being done in the earlier version. We have seen
significant performance gains, calculated in terms of throughput (ops/
ms) with these changes ranging from approx 20% for smaller byte arrays
up to 200 bytes and almost 100% for large byte array sizes that are in
few KBs. We benchmarked with up to 16KB arrays and the general trend
indicates that the performance improvement increases as the size of the
byte array increases.
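The comparison rule described here (find the first differing index, return
the raw difference) looks like this in outline. It is a plain illustration of
the rule, not the optimized HBase implementation, and the function name is
made up.

```python
def compare_bytes(left: bytes, right: bytes) -> int:
    """Lexicographic unsigned comparison that returns the raw difference."""
    for l, r in zip(left, right):
        if l != r:
            # Only the first mismatching index matters; return the diff
            # itself rather than normalizing it to -1 or +1.
            return l - r
    # One array is a prefix of the other: the shorter one sorts first.
    return len(left) - len(right)
```

Callers should rely only on the sign of the result, since the magnitude is
no longer limited to -1, 0, and +1.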
- HBASE-17956 Raw scan should ignore TTL
Now raw scan can also read expired cells.
- HBASE-18023 Log multi-requests for more than threshold number of rows
Introduces a warning message in the RegionServer log when an RPC is
received from a client that has more than 5000 "actions" (where an
"action" is a collection of mutations for a specific row) in a single
RPC. Misbehaving clients who send large RPCs to RegionServers can be
malicious, causing temporary pauses via garbage collection or denial of
service via crashes. The threshold of 5000 actions per RPC is defined by
the property "hbase.rpc.rows.warning.threshold" in hbase-site.xml.
- HBASE-18090 Improve TableSnapshotInputFormat to allow multiple mappers
per region
In this task, we make it possible to run multiple mappers per region in
the table snapshot.
- HBASE-18122 Scanner id should include ServerName of region server
The first 32 bits are the MurmurHash32 of the ServerName string
"host,port,ts". The ServerName contains the host, port, and start
timestamp, so it can prevent collisions. The lowest 32 bits are generated
by an atomic int.
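The layout of the 64-bit id can be sketched as follows. Note the swap:
zlib.crc32 is used here only because it ships with Python; it is not the
MurmurHash32 that HBase uses, so the actual values will differ. The function
name and the module-level counter are likewise illustrative.

```python
import itertools
import zlib

# Stands in for the region server's atomic int.
_scanner_counter = itertools.count(1)


def next_scanner_id(server_name: str) -> int:
    """Build a 64-bit id: hash of the server name on top, counter below."""
    high = zlib.crc32(server_name.encode("utf-8")) & 0xFFFFFFFF  # not Murmur
    low = next(_scanner_counter) & 0xFFFFFFFF
    return (high << 32) | low
```

Ids from the same server share their upper 32 bits, while a restarted server
gets a new start timestamp and therefore a different prefix.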
- HBASE-18149 The setting rules for table-scope attributes and family-scope
attributes should keep consistent
If a table-scope attribute's value is false, you do not need to enclose
'false' in single quotes. Both COMPACTION_ENABLED => false and
COMPACTION_ENABLED => 'false' will take effect.
- HBASE-18226 Disable reverse DNS lookup at HMaster and use the hostname
provided by RegionServer
The following config is added:
hbase.regionserver.hostname.disable.master.reversedns
This config is for experts: don't set its value unless you really know
what you are doing. When set to true, regionserver will use the current
node hostname for the servername and HMaster will skip reverse DNS
lookup and use the hostname sent by regionserver instead. Note that this
config and hbase.regionserver.hostname are mutually exclusive. See
https://issues.apache.org/jira/browse/HBASE-18226 for more details.
Caution: please make sure rolling upgrade succeeds before turning on
this feature.
- HBASE-18247 Hbck to fix the case that replica region shows as key in the
meta table
The hbck tool can now correct the meta table should it get an entry for
a read replica region.
- HBASE-18374 RegionServer Metrics improvements
This change adds the latency metrics checkAndPut, checkAndDelete,
putBatch and deleteBatch. Also, the previous regionserver "mutate"
latency metrics are renamed to "put" metrics. Batch metrics capture the
latency of the entire batch containing put/delete, whereas put/delete
metrics capture latency per operation. Note that this change will break
existing monitoring based on the regionserver "mutate" latency metric.
- HBASE-18520 Add jmx value to determine true Master Start time
Adds a JMX value to track when the Master has finished initializing.
The JMX value is 'masterFinishedInitializationTime' and reports the
time in milliseconds at which the Master became fully usable and ready to
serve requests.
- HBASE-18533 Expose BucketCache values to be configured
This patch exposes configuration for BucketCache. These configs are very
similar to those for the LRU cache, and are described below:
"hbase.bucketcache.single.factor": Single access bucket size
"hbase.bucketcache.multi.factor": Multiple access bucket size
"hbase.bucketcache.memory.factor": In-memory bucket size
"hbase.bucketcache.extrafreefactor": Free this floating point factor of
extra blocks when evicting. For example, free the number of blocks
requested * (1 + extraFreeFactor)
"hbase.bucketcache.acceptfactor": Acceptable size of cache (no evictions
if size < acceptable)
"hbase.bucketcache.minfactor": Minimum threshold of cache (when evicting,
evict until size < min)
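To make the eviction-related factors above concrete, here is a small
arithmetic sketch. It is not the BucketCache code: the class and method
names are hypothetical, sizes are treated as plain byte counts, and the
factor values are chosen only because they are easy to follow. They are
not claimed to be the HBase defaults.

```java
/** Hypothetical arithmetic sketch; not the BucketCache implementation. */
final class BucketCacheFactorsSketch {
    /** No eviction is needed while usage is below capacity * acceptFactor. */
    static boolean needsEviction(long usedBytes, long capacityBytes,
                                 double acceptFactor) {
        return usedBytes >= (long) (capacityBytes * acceptFactor);
    }

    /** Bytes to free so that usage drops to capacity * minFactor. */
    static long bytesToReachMin(long usedBytes, long capacityBytes,
                                double minFactor) {
        long target = (long) (capacityBytes * minFactor);
        return Math.max(0L, usedBytes - target);
    }

    /** Applies the "requested * (1 + extraFreeFactor)" rule from the notes. */
    static long withExtra(long requested, double extraFreeFactor) {
        return (long) (requested * (1.0 + extraFreeFactor));
    }

    public static void main(String[] args) {
        long capacity = 1024L;
        long used = 800L;
        // Illustrative values only, chosen to be exact in binary.
        double acceptFactor = 0.75;
        double minFactor = 0.5;
        double extraFreeFactor = 0.25;

        System.out.println(needsEviction(used, capacity, acceptFactor));
        long toFree = bytesToReachMin(used, capacity, minFactor);
        System.out.println(toFree);
        System.out.println(withExtra(toFree, extraFreeFactor));
    }
}
```

The extra free factor makes each eviction pass free somewhat more than
strictly required, which presumably reduces how often eviction has to run.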
- HBASE-18675 Making {max,min}SessionTimeout configurable for
MiniZooKeeperCluster
Standalone clusters and minicluster instances can now configure the
session timeout for our embedded ZooKeeper quorum using
"hbase.zookeeper.property.minSessionTimeout" and
"hbase.zookeeper.property.maxSessionTimeout".
- HBASE-18786 FileNotFoundException should not be silently handled for
primary region replicas
FileNotFoundException opening a StoreFile in a primary replica now
causes a RegionServer to crash out where before it would be ignored (or
optionally handled via close/reopen).
- HBASE-18993 Backport patches in HBASE-18410 to branch-1.x branches
This change fixes bugs in FilterList, and also does a code refactor
which ensures interface compatibility.
The primary bug fixes are:
1. For a sub-filter in a FilterList with MUST_PASS_ONE, if the previous
filterKeyValue() of the sub-filter returned NEXT_COL, we cannot be sure
that the next cell will be the first cell in the next column, because
FilterList chooses the minimal forward step among sub-filters and may
return SKIP. So we add an extra check to ensure that the next cell
matches the previous return code for sub-filters.
2. The previous logic for transforming cells in FilterList was incorrect.
We should set the previous transform result (rather than the given
cell in question) as the initial value of the transform cell before
calling filterKeyValue() of FilterList.
3. Handle the ReturnCodes which the previous code did not handle.
As for the code refactor, we divided FilterList into two separate
subclasses: FilterListWithOR and FilterListWithAND. FilterListWithOR
has been optimized to choose the next minimal step to seek cells rather
than SKIP cells one by one, and FilterListWithAND has been optimized
to choose the next maximal key to seek among sub-filters in the filter
list. All in all, the code in FilterList is cleaner and easier to follow
now.
Note that ReturnCode NEXT_ROW has been redefined as skipping to the next
row in the current family, not to the next row in all families. This is
more reasonable, because ReturnCode is a concept at the store level, not
at the region level.
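The minimal-step versus maximal-step idea can be sketched as follows.
This is a deliberately simplified illustration and not the HBase code:
the Step enum is hypothetical and contains only three skip-style codes,
whereas the real ReturnCode also has include variants and seek hints that
need more careful merging.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of step merging; not the FilterList classes. */
final class FilterListStepSketch {
    /** Simplified codes, ordered from smallest to largest forward step. */
    enum Step { SKIP, NEXT_COL, NEXT_ROW }

    /**
     * OR semantics: a cell passes if any sub-filter accepts it, so the
     * list may only advance as far as the most cautious sub-filter allows.
     */
    static Step mergeOr(List<Step> subFilterSteps) {
        return Collections.min(subFilterSteps);
    }

    /**
     * AND semantics: a cell passes only if every sub-filter accepts it,
     * so the list can take the largest step any sub-filter asks for.
     */
    static Step mergeAnd(List<Step> subFilterSteps) {
        return Collections.max(subFilterSteps);
    }

    public static void main(String[] args) {
        List<Step> steps =
            Arrays.asList(Step.NEXT_ROW, Step.SKIP, Step.NEXT_COL);
        System.out.println("OR  -> " + mergeOr(steps));
        System.out.println("AND -> " + mergeAnd(steps));
    }
}
```

The enum's declaration order doubles as the comparison order, which keeps
the merge to a single min or max call in this simplified setting.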
- HBASE-19035 Miss metrics when coprocessor use region scanner to read
data
Moves the read requests count to the region level, because
RegionScanner is exposed to coprocessors (CP). Updates the write
requests count in processRowsWithLocks. Removes requestRowActionCount in
RSRpcServices; this metric can be computed from the region's
readRequestsCount and writeRequestsCount.
- HBASE-19051 Add new split algorithm for num string
Adds a new split algorithm, DecimalStringSplit, for rows that are
decimal-encoded long values in the range "00000000" => "99999999".
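For intuition about splitting such a key space, the sketch below computes
evenly spaced, zero-padded split points over "00000000" to "99999999".
It is an illustration only: the class DecimalSplitSketch is hypothetical,
and the exact boundaries chosen by HBase's DecimalStringSplit may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of decimal split points; not the HBase class. */
final class DecimalSplitSketch {
    static final long FIRST = 0L;
    static final long LAST = 99_999_999L;

    /** Returns numRegions - 1 split keys, each zero-padded to 8 digits. */
    static List<String> splitPoints(int numRegions) {
        List<String> points = new ArrayList<>();
        long range = LAST - FIRST + 1; // number of keys in the range
        for (int i = 1; i < numRegions; i++) {
            long boundary = FIRST + (range * i) / numRegions;
            // Locale.ROOT keeps the digits ASCII regardless of the
            // default locale of the JVM.
            points.add(String.format(Locale.ROOT, "%08d", boundary));
        }
        return points;
    }

    public static void main(String[] args) {
        for (String key : splitPoints(4)) {
            System.out.println(key);
        }
    }
}
```

Zero padding matters because row keys sort lexicographically as bytes.
With a fixed width of 8 digits, that order matches numeric order.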
- HBASE-19131 Add the ClusterStatus hook and cleanup other hooks which can
be replaced by ClusterStatus hook
1) Add preGetClusterStatus() and postGetClusterStatus() hooks
2) Add preGetClusterStatus() to the access control check as an admin action
- HBASE-19144 Retry assignments in FAILED_OPEN state when servers (re)join
the cluster
When regionserver placement groups (RSGroups) are active, as servers
join the cluster the Master will attempt to reassign regions in
FAILED_OPEN state.
- HBASE-19419 Remove hbase-native-client from branch-1
Removed the hbase-native-client module from branch-1 (it is still in
Master). It is not complete. Look for a finished C++ client in the near
future which may be backported to branch-1 at that point.
---
Cheers,
The HBase Dev Team