Posted to hdfs-dev@hadoop.apache.org by Matt Fellows <ma...@bespokesoftware.com> on 2013/08/15 14:31:46 UTC

Secure deletion of blocks

Hi,
I'm looking into writing a patch for HDFS which will provide a new method
that can securely delete the contents of a block on all the nodes on which
it exists. By securely delete I mean overwrite cyclically with
1s/0s/random data such that the data cannot be recovered forensically.

I'm not currently aware of any existing code or methods which provide this,
so I was going to implement it myself.

I figured DataNode.java was probably the place to start looking into how
this could be done, so I've read the source for it, but it hasn't really
enlightened me much.

I'm assuming I need to tell the NameNode that a particular block id must be
deleted on all DataNodes holding it; then, as each DataNode calls home, it
would be instructed to securely delete the relevant block, and it would
oblige.

Unfortunately I have no idea where to begin and was looking for some
pointers.

I guess specifically I'd like to know:

1. Where the hdfs CLI commands are implemented
2. How a DataNode identifies a block, and how the NameNode could inform a
DataNode that a block should be deleted
3. Where the existing "delete" is implemented, so my secure delete can
reuse it after successfully blanking the block contents
4. Whether I've got the right idea about this at all

Kind regards,
Matt Fellows

-- 
 First Option Software Ltd
Signal House
Jacklyns Lane
Alresford
SO24 9JJ
Tel: +44 (0)1962 738232
Mob: +44 (0)7710 160458
Fax: +44 (0)1962 600112
Web: www.bespokesoftware.com

-- 
____________________________________________________

This is confidential, non-binding and not company endorsed - see full terms 
at www.fosolutions.co.uk/emailpolicy.html 
First Option Software Ltd Registered No. 06340261
Signal House, Jacklyns Lane, Alresford, Hampshire, SO24 9JJ, U.K.
____________________________________________________


Re: Secure deletion of blocks

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Matt,

I'd also recommend implementing this in a somewhat pluggable way -- e.g. a
configuration key naming a Deleter class. The default Deleter can be the
one we use today, which just removes the file, and you could plug in a
SecureDeleter. I'd also see use cases for a Deleter implementation which
doesn't actually delete the block, but instead moves it to a local trash
directory that is purged a day or two later. That sort of policy can help
recover data as a last-ditch effort if there is some kind of accidental
deletion and there aren't snapshots in place.
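Todd's pluggable-deleter idea might look something like the sketch below. Note this is purely illustrative: the Deleter and SecureDeleter names come from his suggestion, while UnlinkDeleter and TrashDeleter are hypothetical names invented here; none of these types exist in HDFS.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical pluggable deletion policy for block files.
interface Deleter {
    void delete(Path blockFile) throws IOException;
}

// Default behaviour: just unlink the block file, as HDFS does today.
class UnlinkDeleter implements Deleter {
    public void delete(Path blockFile) throws IOException {
        Files.deleteIfExists(blockFile);
    }
}

// Trash-style policy: move the block aside so it can be recovered
// for a while before a background sweep purges it.
class TrashDeleter implements Deleter {
    private final Path trashDir;

    TrashDeleter(Path trashDir) {
        this.trashDir = trashDir;
    }

    public void delete(Path blockFile) throws IOException {
        Files.createDirectories(trashDir);
        Files.move(blockFile, trashDir.resolve(blockFile.getFileName()),
                   StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A SecureDeleter would then be a third implementation that overwrites the file before unlinking it, selected via configuration.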

-Todd




-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Secure deletion of blocks

Posted by Andrew Wang <an...@cloudera.com>.
Hi Matt,

Here are some code pointers:

- When doing a file deletion, the NameNode turns the file into a set of
blocks that need to be deleted.
- When datanodes heartbeat in to the NN (see BPServiceActor#offerService),
the NN replies with blocks to be invalidated (see BlockCommand and
DatanodeProtocol.DNA_INVALIDATE).
- The DN processes these invalidates in
BPServiceActor#processCommandFromActive (look for DNA_INVALIDATE again).
- The magic lines you're looking for are probably in
FsDatasetAsyncDiskService#run, since we delete blocks in the background.
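The flow Andrew outlines can be pictured with a toy model like the one below. To be clear, these ToyNameNode/ToyDataNode classes are invented for illustration only; the real logic lives in the classes Andrew names (BPServiceActor, BlockCommand, FsDatasetAsyncDiskService) and is considerably more involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the NN -> DN invalidation conversation described above.
class ToyNameNode {
    // Per-datanode lists of block ids awaiting invalidation.
    private final Map<String, List<Long>> pendingInvalidates = new HashMap<>();

    // File deletion turns into per-DN sets of block ids to invalidate.
    void queueInvalidate(String datanodeId, List<Long> blockIds) {
        pendingInvalidates.computeIfAbsent(datanodeId, k -> new ArrayList<>())
                          .addAll(blockIds);
    }

    // Heartbeat reply: hand the DN its pending work (cf. DNA_INVALIDATE).
    List<Long> heartbeat(String datanodeId) {
        List<Long> work = pendingInvalidates.remove(datanodeId);
        return work == null ? new ArrayList<>() : work;
    }
}

class ToyDataNode {
    final Set<Long> storedBlocks = new HashSet<>();

    // A pluggable (secure) deleter would run here, before the block
    // file is finally unlinked in the background.
    void processInvalidates(List<Long> blockIds) {
        storedBlocks.removeAll(blockIds);
    }
}
```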

Best,
Andrew



Re: Secure deletion of blocks

Posted by Matt Fellows <ma...@bespokesoftware.com>.
Thanks for the heads-up, but I think I've managed to implement it crudely by overwriting sequentially with 1s, 0s and random bytes, and I've tested it successfully on an ext4 partition.

I tested it by dd-ing the entire partition to a file and confirming with strings that a particular string was not present; I then uploaded a large file with a chosen string repeated in it many times, dd'd the partition to confirm the string was present, issued a delete, repeated the test, and confirmed it had been removed.

I'm sure some journal information may be leaked, but the entire block can't be reconstructed from the journal, else your disk would be halved in usable size, right?

—
Sent from Mailbox for iPhone
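The multi-pass overwrite Matt describes could be sketched roughly as below. The Overwrite class and scrub method are hypothetical names; and, as noted elsewhere in the thread, this only scrubs the file's currently mapped data blocks, not journal copies or remapped sectors.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.SecureRandom;
import java.util.Arrays;

class Overwrite {
    // Overwrite a file in place with 0xFF, 0x00, then random bytes,
    // forcing each pass to the device before starting the next.
    static void scrub(Path file) throws IOException {
        SecureRandom rnd = new SecureRandom();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long size = ch.size();
            byte[] chunk = new byte[64 * 1024];
            for (int pass = 0; pass < 3; pass++) {
                ch.position(0);
                long remaining = size;
                while (remaining > 0) {
                    int n = (int) Math.min(chunk.length, remaining);
                    if (pass == 0) Arrays.fill(chunk, (byte) 0xFF);      // 1s
                    else if (pass == 1) Arrays.fill(chunk, (byte) 0x00); // 0s
                    else rnd.nextBytes(chunk);                           // random
                    ch.write(ByteBuffer.wrap(chunk, 0, n));
                    remaining -= n;
                }
                ch.force(true); // flush data and metadata between passes
            }
        }
    }
}
```

The file keeps its original length, so a subsequent ordinary unlink can run unchanged afterwards.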

On Tue, Aug 20, 2013 at 8:43 PM, Colin McCabe <cm...@alumni.cmu.edu>
wrote:



Re: Secure deletion of blocks

Posted by Colin McCabe <cm...@alumni.cmu.edu>.
Just to clarify: ext4 has the option to turn off journalling; ext3 does
not.  I'm not sure about ReiserFS.

Colin



Re: Secure deletion of blocks

Posted by Colin McCabe <cm...@alumni.cmu.edu>.
> If I've got the right idea about this at all?

From the man page for wipe(1):

"Journaling filesystems (such as Ext3 or ReiserFS) are now being used by
default by most Linux distributions. No secure deletion program that does
filesystem-level calls can sanitize files on such filesystems, because
sensitive data and metadata can be written to the journal, which cannot be
readily accessed. Per-file secure deletion is better implemented in the
operating system."

You might be able to work around this by turning off the journal on these
filesystems.  But even then, you've got issues like the drive remapping bad
sectors (and leaving around the old ones), flash firmware that is unable to
erase less than an erase block, etc.

The simplest solution is probably just to use full-disk encryption.  Then
you don't need any code changes at all.

Doing something like invoking shred on the block files could improve
security somewhat, but it's not going to work all the time.
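Invoking shred on a block file, as Colin suggests, might look like this sketch. The ShredDeleter class is a hypothetical name; the shred flags themselves are real GNU coreutils options, but this inherits all the filesystem-level caveats quoted above.

```java
import java.io.IOException;
import java.nio.file.Path;

class ShredDeleter {
    // Shell out to GNU shred(1) to overwrite and then unlink a block file.
    // -n 3: three overwrite passes, -z: final zeroing pass to hide the
    // shredding, -u: remove the file after overwriting.
    static void shred(Path file) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(
                "shred", "-n", "3", "-z", "-u", file.toString())
                .inheritIO()
                .start();
        if (p.waitFor() != 0) {
            throw new IOException("shred failed for " + file);
        }
    }
}
```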

Colin

