Posted to user@cassandra.apache.org by Thibaut Britz <th...@trendiction.com> on 2012/01/02 11:51:04 UTC

Accessing expired data

Hi,

Due to a misconfiguration on our site, some of our data was saved
with a wrong expiration date and expired just recently.

How can I recover the data?
Is it sufficient to copy a backup of the tables into the table
directory and iterate over the table (e.g. Read.ALL)? Does Cassandra return
expired data in this case? Or will it be silently dropped? Will
sstable2json output expired data?


Thanks,
Thibaut

Re: Accessing expired data

Posted by Thibaut Britz <th...@trendiction.com>.
Thanks Sylvain!

That's exactly what I needed to know.



On Mon, Jan 2, 2012 at 12:49 PM, Sylvain Lebresne <sy...@datastax.com> wrote:

> On Mon, Jan 2, 2012 at 11:51 AM, Thibaut Britz
> <th...@trendiction.com> wrote:
> > Hi,
> >
> > Due to a misconfiguration on our site, some of our data was saved
> > with a wrong expiration date and expired just recently.
> >
> > How can I recover the data?
> > Is it sufficient to copy a backup of the tables into the table
> > directory and iterate over the table (e.g. Read.ALL)? Does Cassandra
> > return expired data in this case?
>
> It won't, unless you trick the nodes by setting their clocks in the
> past. But that is not something I would recommend doing (unless you do
> it in a dedicated test cluster set up for that purpose only).
>
> > Or will it be silently dropped? Will
> > sstable2json output expired data?
>
> It will (if used on an sstable that contains the data, obviously). In the
> sstable2json output, expiring columns should look like:
>  [ column_name, column_value, column_timestamp, "e", column_ttl, local_expiration_time ]
> where column_ttl is the TTL you've set on the column and
> local_expiration_time is a timestamp of when that data will expire on
> the node (it's a timestamp in milliseconds).
>
> Using this is probably the simplest way to recover from that. A fairly
> simple option could be to filter that output by changing the
> local_expiration_time to whatever you want and use that as input for
> json2sstable.
>
> --
> Sylvain

Re: Accessing expired data

Posted by Sylvain Lebresne <sy...@datastax.com>.
On Mon, Jan 2, 2012 at 11:51 AM, Thibaut Britz
<th...@trendiction.com> wrote:
> Hi,
>
> Due to a misconfiguration on our site, some of our data was saved
> with a wrong expiration date and expired just recently.
>
> How can I recover the data?
> Is it sufficient to copy a backup of the tables into the table
> directory and iterate over the table (e.g. Read.ALL)? Does Cassandra return
> expired data in this case?

It won't, unless you trick the nodes by setting their clocks in the
past. But that is not something I would recommend doing (unless you do
it in a dedicated test cluster set up for that purpose only).

> Or will it be silently dropped? Will
> sstable2json output expired data?

It will (if used on an sstable that contains the data, obviously). In the
sstable2json output, expiring columns should look like:
  [ column_name, column_value, column_timestamp, "e", column_ttl, local_expiration_time ]
where column_ttl is the TTL you've set on the column and
local_expiration_time is a timestamp of when that data will expire on
the node (it's a timestamp in milliseconds).
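
A minimal Python sketch of reading that six-element layout, for anyone
scripting against the dump. It assumes each column arrives as a JSON array
parsed into a Python list; the names ExpiringColumn and parse_expiring and
the sample values are hypothetical and purely illustrative.

```python
from typing import NamedTuple, Optional, Sequence


class ExpiringColumn(NamedTuple):
    """Fields of an expiring column, in the order described above."""
    name: str
    value: str
    timestamp: int
    ttl: int
    local_expiration_time: int


def parse_expiring(column: Sequence) -> Optional[ExpiringColumn]:
    """Return the named fields if `column` has the expiring layout, else None.

    An expiring column is recognised by having six elements with the
    marker "e" in the fourth position.
    """
    if len(column) == 6 and column[3] == "e":
        name, value, timestamp, _marker, ttl, expiration = column
        return ExpiringColumn(name, value, timestamp, ttl, expiration)
    return None


# Placeholder values, not taken from a real sstable.
sample = ["name", "value", 1000, "e", 60, 2000]
print(parse_expiring(sample))
```

Columns that do not match the layout (for example, plain columns without a
TTL) come back as None, so they can be skipped or passed through unchanged.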

Using this is probably the simplest way to recover from that. A fairly
simple option could be to filter that output by changing the
local_expiration_time to whatever you want and use that as input for
json2sstable.
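
The filtering step could be sketched in Python roughly as follows. This is
an illustration under assumptions, not a tested recovery procedure: it
assumes the sstable2json dump parses as valid JSON, and it walks the whole
structure rather than relying on a particular row layout, since that layout
may differ between Cassandra versions. The function names and file paths
are hypothetical. The new value must be in the same unit the tool writes,
so compare it against an existing local_expiration_time in your own dump
before choosing one.

```python
import json


def is_expiring(node):
    """True if `node` looks like [name, value, timestamp, "e", ttl, expiration]."""
    return (
        isinstance(node, list)
        and len(node) == 6
        and node[3] == "e"
        and isinstance(node[4], int)
        and isinstance(node[5], int)
    )


def push_back_expiration(node, new_expiration):
    """Set local_expiration_time on every expiring column found under `node`.

    Modifies the parsed JSON in place and returns how many columns changed.
    """
    if is_expiring(node):
        node[5] = new_expiration
        return 1
    if isinstance(node, list):
        return sum(push_back_expiration(child, new_expiration) for child in node)
    if isinstance(node, dict):
        return sum(push_back_expiration(child, new_expiration) for child in node.values())
    return 0


def rewrite_file(src_path, dst_path, new_expiration):
    """Read a dump from src_path, rewrite expirations, write to dst_path."""
    with open(src_path) as src:
        data = json.load(src)
    changed = push_back_expiration(data, new_expiration)
    with open(dst_path, "w") as dst:
        json.dump(data, dst)
    return changed
```

Calling rewrite_file("dump.json", "fixed.json", new_expiration) on a copy of
the dump leaves the original file untouched, and the returned count gives a
quick sanity check before feeding the result to json2sstable.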

--
Sylvain
