Posted to user@hbase.apache.org by "Colak, Emre" <em...@bina.roche.com> on 2015/10/14 05:13:46 UTC

Cells do not get cleared after TTL is set in HBase

Hi,

I have an HBase table with the following description:

{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',
MIN_VERSIONS => '0' , TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE',
BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}

I put some values into it and then set a TTL of 30 seconds on those values with a
second put operation. The first thing I notice is that the timestamps of the cells
get updated after the second put. And 30 seconds later, when I do a scan on the
table, I still see those cells, but this time with their timestamps reverted to
the original values.

I understand that these cells won't necessarily be deleted until a
compaction, but why do they still come up in my scan even though the TTL
that I set on them has expired?

Best,

Emre
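
For concreteness, the write pattern described above would look roughly like the
following in Scala. This is only a minimal sketch, not code from the thread;
`connection` and the table/row/qualifier names are placeholders. The point is that
the 30 second TTL is set on the mutation itself via Put#setTTL, while the column
family keeps TTL => 'FOREVER':

import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.{Connection, Put}
import org.apache.hadoop.hbase.util.Bytes

def writeThenExpire(connection: Connection): Unit = {
  val table = connection.getTable(TableName.valueOf("mytable"))
  val row   = Bytes.toBytes("row1")
  val cf    = Bytes.toBytes("cf")
  val qual  = Bytes.toBytes("q1")

  // First put: a plain write, no per-cell TTL.
  table.put(new Put(row).addColumn(cf, qual, Bytes.toBytes("value")))

  // Second put: the same cell again, this time carrying a 30 second
  // per-cell TTL. This writes a new cell version with a newer timestamp.
  val expiring = new Put(row)
  expiring.addColumn(cf, qual, Bytes.toBytes("value"))
  expiring.setTTL(30000L) // milliseconds
  table.put(expiring)

  table.close()
}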

Re: Cells do not get cleared after TTL is set in HBase

Posted by Emre Colak <co...@gmail.com>.
Thanks for taking a look, Anoop. I've just filed HBASE-14630.


Re: Cells do not get cleared after TTL is set in HBase

Posted by Anoop John <an...@gmail.com>.
I believe the issue is the order in which the per-cell TTL check and the
version selection happen during the scan.  When the scan happens after the
TTL of the second put has expired, there are still 2 cells in the system.
The 2nd one does not come out because its TTL has expired, but the 1st one,
by itself, is not expired.  If the version check (select only the latest
cell) happened first and the TTL check after it, you would have gotten the
desired behavior.  Mind raising a JIRA?  We can discuss there how/whether
to solve it.

-Anoop-
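
To make the ordering concrete, here is a tiny toy model in Scala (not HBase
code; every name in it is invented for illustration). With the TTL filter
applied before version selection, the expired newer cell is dropped and the
older cell comes back; with the order reversed, nothing comes back:

// Toy model of one column with two stored versions.
case class Cell(qualifier: String, value: String, ts: Long, ttlMs: Option[Long])

def expired(c: Cell, now: Long): Boolean =
  c.ttlMs.exists(ttl => now - c.ts > ttl)

// VERSIONS => '1': keep only the newest cell per qualifier.
def latestPerQualifier(cells: Seq[Cell]): Seq[Cell] =
  cells.groupBy(_.qualifier).values.map(_.maxBy(_.ts)).toSeq

val now   = 100000L
val older = Cell("idx1", "v", ts = 1000L, ttlMs = None)          // first put, no TTL
val newer = Cell("idx1", "v", ts = 2000L, ttlMs = Some(30000L))  // second put, 30s TTL

// TTL check first, then version selection: the older cell "resurfaces".
println(latestPerQualifier(Seq(older, newer).filterNot(expired(_, now))))
// List(Cell(idx1,v,1000,None))

// Version selection first, then TTL check: nothing is returned.
println(latestPerQualifier(Seq(older, newer)).filterNot(expired(_, now)))
// List()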


Re: Cells do not get cleared after TTL is set in HBase

Posted by Emre Colak <co...@gmail.com>.
Yes, I'm trying to use the per-cell TTL feature. I've tried HBase releases 1.0.2
and 1.1.2.

Here's some Scala code that I've written:
===============================

import scala.collection.JavaConverters._
import scala.collection.mutable.MutableList

import org.apache.hadoop.hbase.{CellScanner, CellUtil, TableName}
import org.apache.hadoop.hbase.client.{Get, Put, Table}
import org.apache.hadoop.hbase.util.Bytes

// Build a plain Put for one cell.
def makePut(rowKey: Array[Byte], cf: Array[Byte], qual: Array[Byte],
            value: Array[Byte]): Put = {
  val put = new Put(rowKey)
  put.addColumn(cf, qual, value)
  put
}

// Read one row and return (qualifier, value, timestamp) for every cell in cfName.
def getIndex(table: Table, indexName: Array[Byte],
             cfName: Array[Byte]): Seq[(String, Array[Byte], Long)] = {
  val result = MutableList[(String, Array[Byte], Long)]()

  val queryResult = table.get(new Get(indexName))
  val cellScanner: CellScanner = queryResult.cellScanner()
  while (cellScanner.advance()) {
    val cell = cellScanner.current()

    if (CellUtil.matchingFamily(cell, cfName)) {
      val tuple = (Bytes.toStringBinary(cell.getQualifierArray,
                     cell.getQualifierOffset, cell.getQualifierLength),
                   Bytes.copy(cell.getValueArray, cell.getValueOffset,
                     cell.getValueLength),
                   cell.getTimestamp)
      result += tuple
    }
  }

  result
}

def printIndices(table: Table, indexName: Array[Byte],
                 cfName: Array[Byte]): Unit = {
  getIndex(table, indexName, cfName).foreach {
    case (q, v, ts) =>
      println("qualifier: %s, value: %s, ts: %d".format(q, v, ts))
  }
}

// Establish connection

println("Inserting indices into the database")
val table = connection.getTable(TableName.valueOf(tableName))
table.put(makePut(rowKeyBytes, cfBytes, Bytes.toBytes("idx1"),
  Array[Byte](0, 0, 0, 0, 1)))
table.put(makePut(rowKeyBytes, cfBytes, Bytes.toBytes("idx2"),
  Array[Byte](0, 0, 0, 1, 0)))
table.put(makePut(rowKeyBytes, cfBytes, Bytes.toBytes("idx3"),
  Array[Byte](0, 0, 1, 0, 0)))

println("Indices in the database: ")
val putList = MutableList[Put]()
getIndex(table, rowKeyBytes, cfBytes).foreach {
  case (q, v, ts) =>
    println("qualifier: %s, value: %s, ts: %d".format(q, v, ts))

    // Re-put each existing cell with a 30 second per-cell TTL.
    val put = makePut(rowKeyBytes, cfBytes, Bytes.toBytes(q), v)
    put.setTTL(30000L) // 30 second TTL
    putList += put
}
// Add the merged cell (no per-cell TTL) once.
putList += makePut(rowKeyBytes, cfBytes, Bytes.toBytes("idxMerged"),
  Array[Byte](0, 0, 1, 1, 1))

println("Merging existing cells and setting TTLs")
table.put(putList.asJava)

println("Table contents right after the merge: ")
printIndices(table, rowKeyBytes, cfBytes)

Thread.sleep(10000)

println("Table contents 10 seconds after the merge: ")
printIndices(table, rowKeyBytes, cfBytes)

Thread.sleep(30000)

println("Table contents 40 seconds after the merge: ")
printIndices(table, rowKeyBytes, cfBytes)

// close table and connection

And here's what it prints out:
=========================

Inserting indices into the database
Indices in the database:
key: idx1, value: 0,0,0,0,1, ts: 1444791952201
key: idx2, value: 0,0,0,1,0, ts: 1444791952214
key: idx3, value: 0,0,1,0,0, ts: 1444791952218
Merging existing cells and setting TTLs
Table contents right after the merge:
key: idxMerged, value: 0,0,1,1,1, ts: 1444791952341
key: idx1, value: 0,0,0,0,1, ts: 1444791952341
key: idx2, value: 0,0,0,1,0, ts: 1444791952341
key: idx3, value: 0,0,1,0,0, ts: 1444791952341
Table contents 10 seconds after the merge:
key: idxMerged, value: 0,0,1,1,1, ts: 1444791952341
key: idx1, value: 0,0,0,0,1, ts: 1444791952341
key: idx2, value: 0,0,0,1,0, ts: 1444791952341
key: idx3, value: 0,0,1,0,0, ts: 1444791952341
Table contents 40 seconds after the merge:
key: idxMerged, value: 0,0,1,1,1, ts: 1444791952341
key: idx1, value: 0,0,0,0,1, ts: 1444791952201
key: idx2, value: 0,0,0,1,0, ts: 1444791952214
key: idx3, value: 0,0,1,0,0, ts: 1444791952218
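
Since expired cells are only physically removed at compaction time (as noted in
the original question), one way to continue the experiment is to force a major
compaction and look at the row again. A rough sketch, reusing the connection,
tableName, table, rowKeyBytes and cfBytes placeholders from the code above;
majorCompact() is asynchronous, so the sleep is only a crude wait:

import org.apache.hadoop.hbase.TableName

val admin = connection.getAdmin()
admin.majorCompact(TableName.valueOf(tableName))
Thread.sleep(30000) // crude wait; compaction runs in the background

println("Table contents after a major compaction: ")
printIndices(table, rowKeyBytes, cfBytes)

admin.close()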



Re: Cells do not get cleared after TTL is set in HBase

Posted by Ted Yu <yu...@gmail.com>.
Looks like you are using the per-cell TTL feature.

Which HBase release are you using?

Can you formulate your description as either a sequence of shell commands
or a unit test?

Thanks
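
As a starting point for the unit test Ted asks for, a reproduction against a
mini cluster might look roughly like the sketch below. This is untested and only
illustrative; the table name, qualifiers, and timings are arbitrary, and
HBaseTestingUtility is assumed to be available from the HBase test artifacts.

import org.apache.hadoop.hbase.{HBaseTestingUtility, TableName}
import org.apache.hadoop.hbase.client.{Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object CellTtlRepro {
  def main(args: Array[String]): Unit = {
    val util = new HBaseTestingUtility()
    util.startMiniCluster()
    try {
      val cf   = Bytes.toBytes("cf")
      val row  = Bytes.toBytes("r1")
      val qual = Bytes.toBytes("q1")
      // Default descriptor: VERSIONS => 1, TTL => FOREVER, as in the thread.
      val table = util.createTable(TableName.valueOf("ttlRepro"), cf)

      // First put: no per-cell TTL.
      table.put(new Put(row).addColumn(cf, qual, Bytes.toBytes("v1")))

      // Second put: same cell, 3 second per-cell TTL.
      val p = new Put(row)
      p.addColumn(cf, qual, Bytes.toBytes("v2"))
      p.setTTL(3000L)
      table.put(p)

      Thread.sleep(5000)

      // With VERSIONS => 1 one would expect nothing back here, since the
      // latest cell has expired; the thread reports that the first cell
      // comes back with its original timestamp instead.
      val result = table.get(new Get(row))
      println("row after TTL expiry: " + result)
    } finally {
      util.shutdownMiniCluster()
    }
  }
}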
