You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@accumulo.apache.org by mm...@apache.org on 2018/10/01 20:39:15 UTC
[accumulo-website] branch master updated: Add documentation for
crypto (#108)
This is an automated email from the ASF dual-hosted git repository.
mmiller pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/accumulo-website.git
The following commit(s) were added to refs/heads/master by this push:
new 2d51393 Add documentation for crypto (#108)
2d51393 is described below
commit 2d51393144a8664486ff315093439c997b4a01f1
Author: Mike Miller <mm...@apache.org>
AuthorDate: Mon Oct 1 16:39:02 2018 -0400
Add documentation for crypto (#108)
---
_docs-2-0/administration/crypto.md | 112 +++++++++++++++++++++++++++++++++++++
1 file changed, 112 insertions(+)
diff --git a/_docs-2-0/administration/crypto.md b/_docs-2-0/administration/crypto.md
new file mode 100644
index 0000000..74de348
--- /dev/null
+++ b/_docs-2-0/administration/crypto.md
@@ -0,0 +1,112 @@
+---
+title: On Disk Encryption
+category: administration
+order: 14
+---
+
+For an additional layer of security, Accumulo can encrypt files stored on disk. On Disk encryption was reworked
+for 2.0, making it easier to configure and more secure. The files that can be encrypted include: [RFiles][design] and Write Ahead Logs (WALs).
+For information on encrypting data over the wire see the section on [SSL]. For information on cryptographic client-server authentication see the section on [Kerberos].
+
+## Configuration
+
+To encrypt all tables on disk, encryption must be enabled before an Accumulo instance is initialized. If on disk
+encryption is enabled on an existing cluster, only files created after it is enabled will be encrypted
+(root and metadata tables will not be encrypted in this case) and existing data won't be encrypted until compaction. To configure on disk encryption, add the
+{% plink instance.crypto.service %} property to your `accumulo.properties` file. The value of this property is the
+class name of the service which will perform crypto on RFiles and WALs.
+```
+instance.crypto.service=org.apache.accumulo.core.security.crypto.impl.AESCryptoService
+```
+Out of the box, Accumulo provides the `AESCryptoService` for basic encryption needs. This class provides AES encryption
+with Galois/Counter Mode (GCM) for RFiles and Cipher Block Chaining (CBC) mode for WALs. The additional properties
+below are required by this crypto service to be set using the {% plink instance.crypto.opts.* %} prefix.
+```
+instance.crypto.opts.key.provider=uri
+instance.crypto.opts.key.location=file:///secure/path/to/crypto-key-file
+```
+The first property tells the crypto service how it will get the key encryption key. The second property tells the service
+where to find the key. For now, the only valid values are "uri" and the path to the key file. The key file can be 16 or 32 bytes.
+For example, openssl can be used to create a random 32 byte key:
+```
+openssl rand -out /path/to/keyfile 32
+```
+Initializing Accumulo after these instance properties are set, will enable on disk encryption across your entire cluster.
+
+## Custom Crypto
+
+The new crypto interface for 2.0 allows for easier custom implementation of encryption and decryption. Your
+class only has to implement the {% jlink org.apache.accumulo.core.spi.crypto.CryptoService %} interface to work with Accumulo.
+The interface has 3 methods:
+```java
+ void init(Map<String,String> conf) throws CryptoException;
+ FileEncrypter getFileEncrypter(CryptoEnvironment environment);
+ FileDecrypter getFileDecrypter(CryptoEnvironment environment);
+```
+The `init` method is where you will initialize any resources required for crypto and will get called once per Tablet Server.
+The `getFileEncrypter` method requires implementation of a {% jlink org.apache.accumulo.core.spi.crypto.FileEncrypter %}
+for encryption and the `getFileDecrypter` method requires implementation of a {% jlink org.apache.accumulo.core.spi.crypto.FileDecrypter %}
+for decryption. The `CryptoEnvironment` passed into these methods will provide the scope of the crypto.
+The FileEncrypter has two methods:
+```java
+ OutputStream encryptStream(OutputStream outputStream) throws CryptoService.CryptoException;
+ byte[] getDecryptionParameters();
+```
+The `encryptStream` method performs the encryption on the provided OutputStream and returns an OutputStream, most likely
+wrapped in at least one other OutputStream. The `getDecryptionParameters` returns a byte array of anything that will be
+required to perform decryption. The FileDecrypter only has one method:
+```java
+ InputStream decryptStream(InputStream inputStream) throws CryptoService.CryptoException;
+```
+For more help getting started see {% jlink org.apache.accumulo.core.security.crypto.impl.AESCryptoService %}.
+
+## Things to keep in mind
+
+The on disk encryption configured here is only for RFiles and Write Ahead Logs (WALs). The majority of data in Accumulo
+is written to disk with these files but there are a few scenarios that can take place where data will be unencrypted,
+even with the crypto service enabled.
+
+### Sorted WALs
+
+If a tablet server is killed with WALs enabled, Accumulo will create temporary sorted WALs during recovery that are unencrypted.
+These files will only contain recent data that has not been compacted but will be written to the disk unencrypted. Once recovery
+is finished, these unencrypted files will be removed.
+
+### Data in Memory & Logs
+
+For queries, data is decrypted when read from RFiles and cached in memory. This means that data is unencrypted in memory
+while Accumulo is running. Depending on the situation, this also means that some data can be printed to logs. A stacktrace being logged
+during an exception is one example. Accumulo developers have made sure not to expose data protected by authorizations during logging but
+its the additional data that gets encrypted on disk that could be exposed in a log file.
+
+### Bulk Import
+
+There are 2 ways to create RFiles for bulk ingest: with the [RFile API][rfile] and during Map Reduce using [AccumuloOutputFormat].
+The [RFile API][rfile] allows passing in the configuration properties for encryption mentioned above. The [AccumuloOutputFormat] does
+not allow for encryption of RFiles so any data bulk imported through this process will be unencrypted.
+
+### Zookeeper
+
+Accumulo stores a lot of metadata about the cluster in Zookeeper. Keep in mind that this metadata does not get encrypted with On Disk encryption enabled.
+
+## GCM performance
+
+The AESCryptoService uses GCM mode for RFiles. [Java 9 introduced GHASH hardware support used by GCM.](http://openjdk.java.net/jeps/246)
+
+A test was performed on a VM with 4 2.3GHz processors and 16GB of RAM. The test encrypted and decrypted arrays of size 131072 bytes 1000000 times. The results are as follows:
+
+ Java 9 GCM times:
+ Time spent encrypting: 209.210s
+ Time spent decrypting: 276.800s
+ Java 8 GCM times:
+ Time spent encrypting: 2,818.440s
+ Time spent decrypting: 2,883.960s
+
+As you can see, there is a significant performance hit when running without the GHASH CPU instruction. It is advised Java 9 or later be used when enabling encryption.
+
+
+[SSL]: {% durl administration/ssl %}
+[Kerberos]: {% durl administration/kerberos %}
+[design]: {% durl getting-started/design#rfile %}
+[rfile]: {% jurl org.apache.accumulo.core.client.rfile.RFile %}
+[AccumuloOutputFormat]: {% jurl org.apache.accumulo.core.client.mapred.AccumuloOutputFormat %}