Posted to commits@accumulo.apache.org by mm...@apache.org on 2018/10/01 20:39:15 UTC

[accumulo-website] branch master updated: Add documentation for crypto (#108)

This is an automated email from the ASF dual-hosted git repository.

mmiller pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/accumulo-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 2d51393  Add documentation for crypto (#108)
2d51393 is described below

commit 2d51393144a8664486ff315093439c997b4a01f1
Author: Mike Miller <mm...@apache.org>
AuthorDate: Mon Oct 1 16:39:02 2018 -0400

    Add documentation for crypto (#108)
---
 _docs-2-0/administration/crypto.md | 112 +++++++++++++++++++++++++++++++++++++
 1 file changed, 112 insertions(+)

diff --git a/_docs-2-0/administration/crypto.md b/_docs-2-0/administration/crypto.md
new file mode 100644
index 0000000..74de348
--- /dev/null
+++ b/_docs-2-0/administration/crypto.md
@@ -0,0 +1,112 @@
+---
+title: On Disk Encryption
+category: administration
+order: 14
+---
+
+For an additional layer of security, Accumulo can encrypt files stored on disk. On-disk encryption was reworked
+for 2.0 to make it easier to configure and more secure. The files that can be encrypted are [RFiles][design] and Write Ahead Logs (WALs).
+For information on encrypting data over the wire, see the section on [SSL]. For information on cryptographic client-server authentication, see the section on [Kerberos].
+
+## Configuration
+
+To encrypt all tables on disk, encryption must be enabled before the Accumulo instance is initialized. If on-disk
+encryption is enabled on an existing cluster, only files created after that point will be encrypted: existing data stays
+unencrypted until a compaction rewrites it, and the root and metadata tables will not be encrypted in this case. To configure on-disk encryption, add the
+{% plink instance.crypto.service %} property to your `accumulo.properties` file. The value of this property is the
+class name of the service that will encrypt and decrypt RFiles and WALs.
+```
+instance.crypto.service=org.apache.accumulo.core.security.crypto.impl.AESCryptoService
+```
+Out of the box, Accumulo provides the `AESCryptoService` for basic encryption needs. This class provides AES encryption
+with Galois/Counter Mode (GCM) for RFiles and Cipher Block Chaining (CBC) mode for WALs. This crypto service also requires
+the following properties, which are set using the {% plink instance.crypto.opts.* %} prefix:
+```
+instance.crypto.opts.key.provider=uri
+instance.crypto.opts.key.location=file:///secure/path/to/crypto-key-file
+```
+The first property tells the crypto service how to obtain the key encryption key. The second property tells the service
+where to find the key. Currently, the only supported provider is `uri`, and the location must be the URI of the key file. The key file must contain 16 or 32 bytes (a 128-bit or 256-bit AES key).
+For example, openssl can be used to create a random 32-byte key:
+```
+openssl rand -out /path/to/keyfile 32
+```
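+
+If openssl is not available, a key file of the same shape can be written from Java. This is a minimal sketch, not part of Accumulo: the `KeyFileGenerator` class and its `generate` method are hypothetical names. It fills 16 or 32 bytes from `SecureRandom` and writes them to the given path, mirroring the `openssl rand` command above.
+
+```java
+import java.io.IOException;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.security.SecureRandom;
+
+// Hypothetical helper (not part of Accumulo): writes a random AES key file,
+// equivalent to `openssl rand -out <path> <length>`.
+class KeyFileGenerator {
+
+  static Path generate(Path keyFile, int lengthBytes) throws IOException {
+    // Per the documentation above, the key file must be 16 or 32 bytes.
+    if (lengthBytes != 16 && lengthBytes != 32) {
+      throw new IllegalArgumentException("key length must be 16 or 32 bytes");
+    }
+    byte[] key = new byte[lengthBytes];
+    new SecureRandom().nextBytes(key); // cryptographically strong random bytes
+    Files.write(keyFile, key);         // creates or truncates the file
+    return keyFile;
+  }
+
+  public static void main(String[] args) throws IOException {
+    Path out = Paths.get(args.length > 0 ? args[0] : "crypto-key-file");
+    generate(out, 32);
+    System.out.println("Wrote " + Files.size(out) + " bytes to " + out.toAbsolutePath());
+  }
+}
+```
+
+Running its `main` method with a path argument writes a 32-byte key to that path. Whichever tool you use, restrict the key file's permissions so that only the account running Accumulo can read it.
+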
+Initializing Accumulo after these instance properties are set enables on-disk encryption across your entire cluster.
+
+## Custom Crypto
+
+The new crypto interface in 2.0 makes it easier to write a custom implementation of encryption and decryption. Your
+class only has to implement the {% jlink org.apache.accumulo.core.spi.crypto.CryptoService %} interface to work with Accumulo.
+The interface has three methods:
+```java
+  void init(Map<String,String> conf) throws CryptoException;
+  FileEncrypter getFileEncrypter(CryptoEnvironment environment);
+  FileDecrypter getFileDecrypter(CryptoEnvironment environment);
+```
+The `init` method is where you initialize any resources required for crypto; it is called once per tablet server.
+The `getFileEncrypter` method must return an implementation of {% jlink org.apache.accumulo.core.spi.crypto.FileEncrypter %}
+for encryption, and the `getFileDecrypter` method must return an implementation of {% jlink org.apache.accumulo.core.spi.crypto.FileDecrypter %}
+for decryption. The `CryptoEnvironment` passed into these methods provides the scope of the operation, such as whether a WAL or an RFile is being processed.
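+
+As an illustration of what `init` might do, the sketch below loads the key encryption key named by the properties shown earlier. It is hypothetical: `KeyLoadingInit` does not implement Accumulo's `CryptoService` (so it compiles without Accumulo on the classpath), it throws unchecked exceptions instead of `CryptoException`, and it assumes the `conf` map passed to `init` contains the full `instance.crypto.opts.*` property names.
+
+```java
+import java.io.IOException;
+import java.net.URI;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.Map;
+import javax.crypto.SecretKey;
+import javax.crypto.spec.SecretKeySpec;
+
+// Hypothetical sketch of the key-loading part of a custom CryptoService.init().
+// It only mirrors init's signature; it does not implement Accumulo's interface.
+class KeyLoadingInit {
+  // Assumption: init receives the full property names, prefix included.
+  static final String PROVIDER = "instance.crypto.opts.key.provider";
+  static final String LOCATION = "instance.crypto.opts.key.location";
+
+  private SecretKey kek; // key encryption key, loaded once per process
+
+  void init(Map<String,String> conf) {
+    String provider = conf.get(PROVIDER);
+    if (!"uri".equals(provider)) {
+      throw new IllegalArgumentException("unsupported key provider: " + provider);
+    }
+    String location = conf.get(LOCATION);
+    if (location == null) {
+      throw new IllegalArgumentException(LOCATION + " is not set");
+    }
+    try {
+      // Resolve a file: URI such as file:///secure/path/to/crypto-key-file.
+      byte[] bytes = Files.readAllBytes(Paths.get(URI.create(location)));
+      if (bytes.length != 16 && bytes.length != 32) {
+        throw new IllegalArgumentException("key must be 16 or 32 bytes, got " + bytes.length);
+      }
+      kek = new SecretKeySpec(bytes, "AES");
+    } catch (IOException e) {
+      throw new IllegalStateException("unable to read key from " + location, e);
+    }
+  }
+
+  SecretKey keyEncryptionKey() {
+    return kek;
+  }
+}
+```
+
+Because `init` runs once per tablet server, doing the file I/O there keeps it out of the per-file `getFileEncrypter` and `getFileDecrypter` calls.
+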
+The FileEncrypter has two methods:
+```java
+  OutputStream encryptStream(OutputStream outputStream) throws CryptoService.CryptoException;
+  byte[] getDecryptionParameters();
+```
+The `encryptStream` method wraps the provided OutputStream so that data written to it is encrypted, and returns the resulting
+OutputStream, most likely a chain of one or more wrapping streams. The `getDecryptionParameters` method returns a byte array containing
+everything that will later be required to decrypt the file. The FileDecrypter has only one method:
+```java
+  InputStream decryptStream(InputStream inputStream) throws CryptoService.CryptoException;
+```
+For more help getting started see {% jlink org.apache.accumulo.core.security.crypto.impl.AESCryptoService %}.
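+
+To make the encrypter/decrypter contract concrete, here is a minimal AES/GCM sketch using only `javax.crypto`. It is not how `AESCryptoService` is implemented: `LocalFileEncrypter` and `LocalFileDecrypter` are local stand-ins mirroring the method shapes above (with `IOException` in place of `CryptoService.CryptoException`), and `GcmFileEncrypter`/`GcmFileDecrypter` are hypothetical classes that treat each file as a single GCM message whose random IV is the only decryption parameter.
+
+```java
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.security.GeneralSecurityException;
+import java.security.SecureRandom;
+import javax.crypto.Cipher;
+import javax.crypto.CipherInputStream;
+import javax.crypto.CipherOutputStream;
+import javax.crypto.SecretKey;
+import javax.crypto.spec.GCMParameterSpec;
+
+// Local stand-ins mirroring the method shapes of Accumulo's FileEncrypter and
+// FileDecrypter; CryptoService.CryptoException is replaced by IOException.
+interface LocalFileEncrypter {
+  OutputStream encryptStream(OutputStream outputStream) throws IOException;
+  byte[] getDecryptionParameters();
+}
+
+interface LocalFileDecrypter {
+  InputStream decryptStream(InputStream inputStream) throws IOException;
+}
+
+// Hypothetical AES/GCM encrypter: the whole stream is one GCM message.
+class GcmFileEncrypter implements LocalFileEncrypter {
+  private static final SecureRandom RANDOM = new SecureRandom();
+  private final SecretKey key;
+  private final byte[] iv = new byte[12];
+
+  GcmFileEncrypter(SecretKey key) {
+    this.key = key;
+    RANDOM.nextBytes(iv); // fresh IV per file; never reuse an IV with the same key
+  }
+
+  @Override
+  public OutputStream encryptStream(OutputStream outputStream) throws IOException {
+    try {
+      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
+      cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
+      // Closing the returned stream appends the 16-byte authentication tag.
+      return new CipherOutputStream(outputStream, cipher);
+    } catch (GeneralSecurityException e) {
+      throw new IOException(e);
+    }
+  }
+
+  @Override
+  public byte[] getDecryptionParameters() {
+    return iv.clone(); // the IV is all the decrypter needs besides the key
+  }
+}
+
+// Hypothetical AES/GCM decrypter rebuilt from the key and stored parameters.
+class GcmFileDecrypter implements LocalFileDecrypter {
+  private final SecretKey key;
+  private final byte[] iv;
+
+  GcmFileDecrypter(SecretKey key, byte[] decryptionParameters) {
+    this.key = key;
+    this.iv = decryptionParameters.clone();
+  }
+
+  @Override
+  public InputStream decryptStream(InputStream inputStream) throws IOException {
+    try {
+      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
+      cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
+      return new CipherInputStream(inputStream, cipher);
+    } catch (GeneralSecurityException e) {
+      throw new IOException(e);
+    }
+  }
+}
+```
+
+Returning the IV from `getDecryptionParameters` is what lets a decrypter be built later from the key plus bytes stored alongside the file. One design caveat: with the JDK's SunJCE provider, GCM decryption typically withholds plaintext until the authentication tag at the end of the stream has been verified, so a production implementation would likely encrypt large files in bounded chunks rather than as one message.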
+
+## Things to keep in mind
+
+The on-disk encryption configured here applies only to RFiles and Write Ahead Logs (WALs). Most data in Accumulo
+is written to disk in these files, but there are a few scenarios in which data ends up unencrypted,
+even with the crypto service enabled.
+
+### Sorted WALs
+
+If a tablet server is killed while WALs are enabled, Accumulo creates temporary sorted WALs during recovery, and these files are unencrypted.
+They contain only recent data that has not yet been compacted, but that data is written to disk unencrypted. Once recovery
+is finished, these unencrypted files are removed.
+
+### Data in Memory & Logs
+
+For queries, data is decrypted when read from RFiles and cached in memory. This means that data is unencrypted in memory
+while Accumulo is running. Depending on the situation, some data can also be printed to logs; a stack trace logged
+during an exception is one example. Accumulo developers have taken care not to log data protected by authorizations, but
+other data that is encrypted on disk could still be exposed in a log file.
+
+### Bulk Import
+
+There are two ways to create RFiles for bulk ingest: with the [RFile API][rfile] and during MapReduce using
+[AccumuloFileOutputFormat]({% jurl org.apache.accumulo.core.client.mapred.AccumuloFileOutputFormat %}). The [RFile API][rfile] accepts the
+encryption configuration properties mentioned above. AccumuloFileOutputFormat does not support encrypting RFiles, so any data bulk imported through this process will be unencrypted.
+
+### ZooKeeper
+
+Accumulo stores a lot of metadata about the cluster in ZooKeeper. Keep in mind that this metadata is not encrypted, even when on-disk encryption is enabled.
+
+## GCM performance
+
+The AESCryptoService uses GCM mode for RFiles. [Java 9 introduced hardware support for GHASH, the authentication function used by GCM.](http://openjdk.java.net/jeps/246)
+
+A test was performed on a VM with four 2.3 GHz processors and 16 GB of RAM. The test encrypted and decrypted arrays of 131,072 bytes (128 KiB) 1,000,000 times. The results are as follows:
+
+    Java 9 GCM times:
+        Time spent encrypting:        209.210s
+        Time spent decrypting:        276.800s
+    Java 8 GCM times:
+        Time spent encrypting:        2,818.440s
+        Time spent decrypting:        2,883.960s
+
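+As the results show, there is a significant performance hit without hardware GHASH support: under Java 8, encryption took about 13.5 times as long and decryption about 10.4 times as long as under Java 9. Java 9 or later is therefore recommended when enabling encryption.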
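+
+A timing loop in the spirit of the test above can be sketched with `javax.crypto` alone. This is a hypothetical harness, not the one that produced the published numbers; `GcmTiming` and its default of 10,000 iterations are illustrative choices, and results will vary with hardware and JVM version. It draws a fresh random IV for every encryption because GCM must never reuse an IV with the same key.
+
+```java
+import java.security.GeneralSecurityException;
+import java.security.SecureRandom;
+import javax.crypto.Cipher;
+import javax.crypto.KeyGenerator;
+import javax.crypto.SecretKey;
+import javax.crypto.spec.GCMParameterSpec;
+
+// Hypothetical micro-benchmark; not the harness behind the numbers above.
+class GcmTiming {
+
+  /** Returns {encryptNanos, decryptNanos} summed over all iterations. */
+  static long[] run(int arraySize, int iterations) throws GeneralSecurityException {
+    KeyGenerator keyGen = KeyGenerator.getInstance("AES");
+    keyGen.init(128);
+    SecretKey key = keyGen.generateKey();
+    SecureRandom random = new SecureRandom();
+    byte[] plain = new byte[arraySize];
+    random.nextBytes(plain);
+    byte[] iv = new byte[12];
+    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
+
+    long encryptNanos = 0;
+    long decryptNanos = 0;
+    for (int i = 0; i < iterations; i++) {
+      random.nextBytes(iv); // GCM forbids reusing an IV with the same key
+      GCMParameterSpec spec = new GCMParameterSpec(128, iv); // copies the IV
+
+      long start = System.nanoTime();
+      cipher.init(Cipher.ENCRYPT_MODE, key, spec);
+      byte[] encrypted = cipher.doFinal(plain);
+      encryptNanos += System.nanoTime() - start;
+
+      start = System.nanoTime();
+      cipher.init(Cipher.DECRYPT_MODE, key, spec);
+      byte[] decrypted = cipher.doFinal(encrypted); // also verifies the tag
+      decryptNanos += System.nanoTime() - start;
+
+      if (decrypted.length != plain.length) {
+        throw new IllegalStateException("round trip changed the array length");
+      }
+    }
+    return new long[] {encryptNanos, decryptNanos};
+  }
+
+  public static void main(String[] args) throws GeneralSecurityException {
+    int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 10_000;
+    long[] t = run(131_072, iterations);
+    System.out.printf("Java %s GCM times:%n", System.getProperty("java.version"));
+    System.out.printf("    Time spent encrypting: %.3fs%n", t[0] / 1e9);
+    System.out.printf("    Time spent decrypting: %.3fs%n", t[1] / 1e9);
+  }
+}
+```
+
+Running the same program under Java 8 and under Java 9 or later on one machine is one way to observe the effect of GHASH hardware support on your own hardware.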
+
+
+[SSL]: {% durl administration/ssl %}
+[Kerberos]: {% durl administration/kerberos %}
+[design]: {% durl getting-started/design#rfile %}
+[rfile]: {% jurl org.apache.accumulo.core.client.rfile.RFile %}
+[AccumuloOutputFormat]: {% jurl org.apache.accumulo.core.client.mapred.AccumuloOutputFormat %}