Posted to user@hbase.apache.org by Steven Noels <st...@outerthought.org> on 2011/02/14 16:21:12 UTC

Lily 0.3 is released

Hi all,

Lily is a data/content repository that integrates HBase with SOLR: flexible
content storage and automatic index maintenance - at scale. It's available
under the Apache license.

This release is the result of 3 months of hard work since Lily 0.2 last
October. Our focus was stabilization, performance and robustness, providing
a platform we can continue building upon. More than 50 tickets were resolved
during this development sprint, and we're slowly readying ourselves for the
1.0 release. Lily 0.3 brings many incremental improvements over Lily 0.2. It
has a more solid implementation of blob fields, automatic retry of operations
that fail due to I/O exceptions (between the Lily client and the Lily server),
and other miscellaneous improvements, all listed below.

Everything Lily can be found at www.lilyproject.org. We're now also sharing
details of our commercial software subscription service with select
prospects; let us know if you're interested!

Here's a concise list of improvements since Lily 0.2:

   - Repository
      - Performance / space improvements
         - Shorter column key encoding (field IDs)
         - Reduction of number of column families used
         - Avoid duplicate values in the table: make use of the sparseness
         of the table
         - Drop the use of HBase rowlocks, which do not survive region
         splits/moves.
         - Use byte[] as keys in the RecordType/FieldType cache
      - API
         - Added a new method, createOrUpdate, which creates or updates a
         record depending on whether it already exists. Unlike create, it can
         safely be retried after an I/O exception because it is idempotent,
         similar to PUT in HTTP/REST.
         - Allow updating versioned-mutable fields without specifying the
         record type.
         - Throw a RecordLockedException instead of a generic exception when
         a record is locked; this allows Lily clients to retry the operation
         in that case.
      - Clear historical data when deleting a record and remove any
      referenced blobs.
      - The link index stores record IDs and field IDs as bytes instead of
      strings.
      - The record ID string representation now uses a comma instead of a
      semicolon to separate variant properties, since semicolons were
      problematic in the JAX-RS based REST interface implementation.
   - Upgrade to Apache HBase 0.90
   - Blobs
      - Rework blobstore functionality
         - Blobs can only be accessed through the record they are used in,
         not directly by using their blob key. This is to allow for future
         record-level access control.
         - Introduce a Repository.getBlob() method, which returns a
         BlobAccess object that provides access to both the blob metadata
         (Blob object) and the blob input stream. This avoids having to read
         the record when you only need the blob metadata.
         - Uploaded blobs which are never used in a record are cleaned up.
      - The HDFS-stored blobs are stored in a hierarchical structure.
   - RowLog improvements
      - Performance improvements
         - The RowLog processor uses a ZooKeeper-based notification system
         instead of a Netty-based one.
         - Optimize queue scanning: avoid scanning over deleted rows in the
         table, fix too-frequent scanning, and fix an endless scanning loop on
         startup when there is no repository activity.
         - The RowLog processor only processes messages of a minimal age
         (to avoid conflicts with direct processing of WAL messages).
      - Extended RowLogConfigurationManager to add/update rowlog
      configuration information.
      - Avoid and remove stale messages in the queue.
      - Allow the rowlog to use either row-level locks (WAL use case) or
      executionstate-level locks per subscription (MQ use case) when
      processing messages.
      - Added a WAL processor which handles open WAL messages.
   - REST interface
      - Adapted blob support to the new blobstore functionality. The
      Content-Length header is now set when downloading blobs. Multi-value
      and hierarchical blobs are now accessible.
      - Support updating versioned-mutable fields.
      - Fixed various smaller bugs reported by users.
   - HBase index library
      - Allow adding/removing multiple entries in one call.
      - Performance
         - Fixed important performance issue whereby row scanning always ran
         to the end of the index table.
         - Enable scan caching.
         - Added a performance testing tool.
   - Indexer
      - Upgrade to Tika 0.8
      - Performance
         - Avoid FieldNotFoundException when evaluating field values
      - Make the SOLR request-writer and response-parser implementation
      configurable. This allows using the XML format instead of the javabin
      format.
   - LilyClient
      - Automatically retry operations on IOExceptions; this allows
      operations to survive node failures.
      - Automatic balancing over all Lily nodes. Each method called on the
      Repository object is automatically performed on an arbitrarily
      selected Lily node.
      - Avro: switch from HTTP to Netty transport. For this, upgraded to an
      Avro 1.5 snapshot with patch AVRO-747.
   - Tester tool
      - Allows configuring test scenarios as well as the indexer and SOLR
      configuration.
      - Has extended logging, metrics and metrics plotting (gnuplot
      integration) capabilities allowing for performance evaluations.
      - Introduces a general performance-testing library.
   - Lily server process
      - Ability to create tables with multiple initial regions at first
      cluster startup (record table, linkindex, blobincubator, ...). The max
      file size and the memstore flush size can also be set.
      - The initial Lily startup can now be performed on multiple nodes
      concurrently; previously this failed because the table-creation code
      did not handle failures caused by concurrent table creation.
      - Configuration files now support inheritance (i.e. fallback from one
      conf dir to another, and ultimately to the built-in conf). Default
      configuration is included in the Kauri module jars. All this will help
      in maintaining Lily configuration across Lily versions.
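
The createOrUpdate entry above hinges on idempotence. Here is a minimal Java
sketch of why that matters for retries. It is not Lily's API: FlakyRepository,
IdempotencyDemo and callWithOneRetry are hypothetical stand-ins. The sketch
simulates a lost response, where the write is applied but the caller still sees
an IOException, and then retries once. A create-style call fails on retry
because the record already exists, while a createOrUpdate-style call ends up in
the same state however often it runs.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a repository (NOT Lily's API).
// The first call applies its write, then throws an IOException to
// simulate a response lost after the server already did the work.
class FlakyRepository {
    private final Map<String, String> records = new HashMap<>();
    private boolean failNextResponse = true;

    // Non-idempotent: throws IllegalStateException if the record exists.
    void create(String id, String value) throws IOException {
        if (records.containsKey(id)) {
            throw new IllegalStateException("record already exists: " + id);
        }
        records.put(id, value);
        maybeLoseResponse();
    }

    // Idempotent: the end state is the same no matter how often it runs.
    void createOrUpdate(String id, String value) throws IOException {
        records.put(id, value);
        maybeLoseResponse();
    }

    String get(String id) {
        return records.get(id);
    }

    private void maybeLoseResponse() throws IOException {
        if (failNextResponse) {
            failNextResponse = false;
            throw new IOException("connection dropped after write was applied");
        }
    }
}

class IdempotencyDemo {
    interface Op {
        void run() throws IOException;
    }

    // Calls op; on IOException, calls it exactly once more and
    // reports the outcome as a string.
    static String callWithOneRetry(Op op) {
        try {
            op.run();
            return "ok";
        } catch (IOException first) {
            try {
                op.run();
                return "ok after retry";
            } catch (IOException second) {
                return "io failure";
            } catch (IllegalStateException e) {
                return "retry failed: " + e.getMessage();
            }
        }
    }
}
```

On a fresh FlakyRepository, callWithOneRetry(() -> repo.create("r1", "v"))
returns "retry failed: record already exists: r1". The same call using
createOrUpdate returns "ok after retry", and the record holds "v".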
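
The LilyClient entries combine automatic retry on IOExceptions with balancing
over all Lily nodes. The following minimal Java sketch shows that combination,
assuming each call is independent and safe to repeat. BalancingRetryClient,
NodeCall and maxAttempts are hypothetical names and do not come from
LilyClient's actual implementation. Each attempt goes to a randomly chosen
node, and an IOException triggers another attempt, up to maxAttempts.

```java
import java.io.IOException;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not LilyClient's code: random node per attempt,
// retry on IOException, give up after maxAttempts.
class BalancingRetryClient {
    interface NodeCall<T> {
        T call(String node) throws IOException;
    }

    private final List<String> nodes;
    private final int maxAttempts;
    private final Random random;

    BalancingRetryClient(List<String> nodes, int maxAttempts, Random random) {
        if (nodes.isEmpty() || maxAttempts < 1) {
            throw new IllegalArgumentException("need at least one node and one attempt");
        }
        this.nodes = List.copyOf(nodes);
        this.maxAttempts = maxAttempts;
        this.random = random;
    }

    <T> T execute(NodeCall<T> op) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Pick an arbitrary node for this attempt.
            String node = nodes.get(random.nextInt(nodes.size()));
            try {
                return op.call(node);
            } catch (IOException e) {
                last = e; // node failure or network error: try again
            }
        }
        // Every attempt failed; rethrow the most recent I/O error.
        throw last;
    }
}
```

Picking a node at random on every attempt spreads load without any
coordination. However, it can pick a node that just failed. A production
client would likely also steer away from recently failed nodes and retry
only idempotent operations.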

We hope you'll enjoy this new Lily as much as we did making it. Let us know
how we're doing!

The Outerthought Lily team.
--
Steven Noels
http://outerthought.org/
Scalable Smart Data
Makers of Kauri, Daisy CMS and Lily