You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Yi Liu (JIRA)" <ji...@apache.org> on 2014/05/25 15:01:03 UTC
[jira] [Commented] (HADOOP-10628) Javadoc and few code style improvement for Crypto input and output streams

    [ https://issues.apache.org/jira/browse/HADOOP-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008333#comment-14008333 ] 

Yi Liu commented on HADOOP-10628:
---------------------------------

The additional javadoc and code style comments in HADOOP-10603 are as following.
-------------------------------------------------------------------------------------------
In general, please add blank lines before all block comments.

AESCTRCryptoCodec.java
+public abstract class AESCTRCryptoCodec extends CryptoCodec {
+ /**
+ * For AES, the algorithm block is fixed size of 128 bits.
+ * @see http://en.wikipedia.org/wiki/Advanced_Encryption_Standard
+ */
+ private static final int AES_BLOCK_SIZE = 16;

+ /**
+ * IV is produced by combining initial IV and the counter using addition.
+ * IV length should be the same as
{@link #AES_BLOCK_SIZE}
+ */

The IV is produced by adding the initial IV to the counter. IV length
should be the same as {@link #AES_BLOCK_SIZE}

.

+ @Override
+ public void calculateIV(byte[] initIV, long counter, byte[] IV) {
...
+ ByteBuffer buf = ByteBuffer.wrap(IV);

add a final decl.

CryptoCodec.java

+ /**
+ * Get block size of a block cipher.

Get the block size of a block cipher.

+ * For different algorithms, the block size may be different.
+ * @return int block size

@return the block size

+ * Get a
{@link #org.apache.hadoop.crypto.Encryptor}

.

s/Get a/Get an/

+ * @return Encryptor

@return the Encryptor

+ * @return Decryptor

@return the Decryptor

+ * This interface is only for Counter (CTR) mode. Typically calculating
+ * IV(Initialization Vector) is up to Encryptor or Decryptor, for
+ * example
{@link #javax.crypto.Cipher} will maintain encryption context
+ * internally when do encryption/decryption continuously using its
+ * Cipher#update interface.

This interface is only needed for AES-CTR mode. The IV is generally
calculated by the Encryptor or Decryptor and maintained as internal
state. For example, a {@link #javax.crypto.Cipher}

will maintain its
encryption context internally using the Cipher#update interface.

+ * In Hadoop, multiple nodes may read splits of a file, so decrypting of
+ * file is not continuous, even for encrypting may be not continuous. For
+ * each part, we need to calculate the counter through file position.

Encryption/Decryption is not always on the entire file. For example,
in Hadoop, a node may only decrypt a portion of a file (i.e. a
split). In these situations, the counter is derived from the file
position.

+ * <p/>
+ * Typically IV for a file position is produced by combining initial IV and
+ * the counter using any lossless operation (concatenation, addition, or XOR).
+ * @see http://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_.28CTR.29

The IV can be calculated based on the file position by combining the
initial IV and the counter with a lossless operation (concatenation, addition, or XOR).
@see http://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_.28CTR.29

CryptoInputStream.java

+public class CryptoInputStream extends FilterInputStream implements
+ Seekable, PositionedReadable, ByteBufferReadable, HasFileDescriptor,
+ CanSetDropBehind, CanSetReadahead, HasEnhancedByteBufferAccess {

Add a newline here please.

+ /**
+ * Whether underlying stream supports

s/Whether underlying/Whether the underlying/

+ /**
+ * Padding = pos%(algorithm blocksize); Padding is put into
{@link #inBuffer}
+ * before any other data goes in. The purpose of padding is to put input data
+ * at proper position.

s/put input data/put the input data/

+ @Override
+ public int read(byte[] b, int off, int len) throws IOException {

+ int remaining = outBuffer.remaining();

final int remaining...

+ if (usingByteBufferRead == null) {
+ if (in instanceof ByteBufferReadable) {
+ try { + n = ((ByteBufferReadable) in).read(inBuffer); + usingByteBufferRead = Boolean.TRUE; + } catch (UnsupportedOperationException e) { + usingByteBufferRead = Boolean.FALSE; + }
+ }
+ if (!usingByteBufferRead.booleanValue()) { + n = readFromUnderlyingStream(); + }
+ } else {
+ if (usingByteBufferRead.booleanValue()) { + n = ((ByteBufferReadable) in).read(inBuffer); + } else { + n = readFromUnderlyingStream(); + }
+ }

For the code above, I wonder if we shouldn't maintain a reference to
the actual ByteBuffer once it is known to be ByteBufferReadable. If
the caller switches BBs, then it is possible that this could throw a
UnsupportedOperationException. So the check would be to see if the BB
was the same one that was already known to be BBReadable, and if not,
then check it again.

+ // Read data from underlying stream.
+ private int readFromUnderlyingStream() throws IOException {
+ int toRead = inBuffer.remaining();
+ byte[] tmp = getTmpBuf();
+ int n = in.read(tmp, 0, toRead);

finals could be added to each of these decls.

+ private void decrypt() throws IOException {
+ Preconditions.checkState(inBuffer.position() >= padding);
+ if(inBuffer.position() == padding) { s/if(/if (/ + // There is no real data in inBuffer. + return; + }
+ inBuffer.flip();
+ outBuffer.clear();
+ decryptor.decrypt(inBuffer, outBuffer);
+ inBuffer.clear();
+ outBuffer.flip();
+ if (padding > 0) {
+ /**
+ * The plain text and cipher text have 1:1 mapping, they start at same
+ * position.

have a 1:1 mapping
at the same position.

+ if (decryptor.isContextReset()) {
+ /**
+ * Typically we will not get here. To improve performance in CTR mode,
+ * we rely on the decryptor maintaining context, for example calculating
+ * the counter. Unfortunately, some bad implementations can't maintain
+ * context so we need to re-init after doing decryption.

This code is generally not executed since the decryptor usually
maintains its context (e.g. the counter). However, some
implementations can't maintain context so a re-init is necessary after
each decryption call.

+ private void updateDecryptor() throws IOException {
+ long counter = streamOffset / codec.getAlgorithmBlockSize();

final

+ padding = (byte)(streamOffset % codec.getAlgorithmBlockSize());

s/) (/

+ * Reset the underlying stream offset; and clear {@link #inBuffer}

and
+ *
{@link #outBuffer}. Typically this happens when doing {@link #seek(long)}
+ * or {@link #skip(long)}.

Reset the underlying stream offset, {@link #inBuffer} and {@link #outBuffer}

. This typically happens during
{@link #seek(long)}


or
{@link #skip(long)}

operations.

+ /**
+ * Free the direct buffer manually.

Forcibly free the direct byte buffers.

+ */
+ private void freeBuffers() {
+ sun.misc.Cleaner inBufferCleaner =
+ ((sun.nio.ch.DirectBuffer) inBuffer).cleaner();
+ inBufferCleaner.clean();
+ sun.misc.Cleaner outBufferCleaner =
+ ((sun.nio.ch.DirectBuffer) outBuffer).cleaner();

add final decls.

+ // Positioned read.

// PositionedReadable

+ @Override
+ public int read(long position, byte[] buffer, int offset, int length)
+ throws IOException {
+ checkStream();
+ try {
+ int n = ((PositionedReadable) in).read(position, buffer, offset, length);

final

+ if (n > 0) {
+ /**
+ * Since this operation does not change the current offset of a file,
+ * streamOffset should be not changed and we need to restore the

s/be not/not be/
s/changed and we/changed. We/

+ /**
+ * Decrypt given length of data in buffer: start from offset.
+ * Output is also buffer and start from same offset. Restore the
+ *
{@link #decryptor} and {@link #outBuffer} after decryption.

Decrypt length bytes in buffer starting at offset. Restore the {@link #decryptor}

and
{@link #outBuffer}

after the decryption.

+ private void decrypt(long position, byte[] buffer, int offset, int length)
+ throws IOException {
+
+ byte[] tmp = getTmpBuf();
+ int unread = outBuffer.remaining();

final decls for tmp and unread.

+ if (unread > 0)
{ // Cache outBuffer + outBuffer.get(tmp, 0, unread); + }

+ long curOffset = streamOffset;

Why create curOffset at all? It looks like it is only used to pass to
resetStreamOffset below.

+ resetStreamOffset(position);
+
+ int n = 0;
+ while (n < length) {
+ int toDecrypt = Math.min(length - n, inBuffer.remaining());

final

+ * Since this operation does not change the current offset of a file,
+ * streamOffset should be not changed and we need to restore the decryptor
+ * and outBuffer after decryption.

Since this operation does not change the current offset of the file,
streamOffset should be not changed. We need to restore the decryptor
and outBuffer after decryption.

// Seekable
+ @Override
+ public void seek(long pos) throws IOException {
+ Preconditions.checkArgument(pos >= 0, "Cannot seek to negative offset.");
+ checkStream();
+ try {
+ // If target pos we have already read and decrypt.

This comment is confusing. Could you please reword it?

+ // Skip n bytes
+ @Override
+ public long skip(long n) throws IOException
{ ... + int pos = outBuffer.position() + (int) n; final + outBuffer.position(pos); + return n; + }

else {
+ /**
+ * Subtract outBuffer.remaining() to see how many bytes we need to
+ * skip in underlying stream. We get real skipped bytes number of
+ * underlying stream then add outBuffer.remaining() to get skipped
+ * bytes number from user's view.

Subtract outBuffer.remaining() to see how many bytes we need to
skip in the underlying stream. Add outBuffer.remaining() to the actual
number of skipped bytes in the underlying stream to get the number of
skipped bytes from the user's point of view.

+ // ByteBuffer read.
+ @Override
+ public int read(ByteBuffer buf) throws IOException {
+ checkStream();
+ if (in instanceof ByteBufferReadable) {

It would be good if we didn't have to do this instanceof check here.

+ int unread = outBuffer.remaining();

final

+ if (unread > 0) { // Have unread decrypted data in buffer.
+ int toRead = buf.remaining();

final

+ if (toRead <= unread) {
+ int limit = outBuffer.limit();

final

+ int pos = buf.position();
+ int n = ((ByteBufferReadable) in).read(buf);

final

+ private void decrypt(ByteBuffer buf, int n, int start)
+ throws IOException {
+ int pos = buf.position();
+ int limit = buf.limit();

final

+ @Override
+ public ByteBuffer read(ByteBufferPool bufferPool, int maxLength,
+ EnumSet<ReadOption> opts) throws IOException,
+ UnsupportedOperationException {
...
+ ByteBuffer buffer = ((HasEnhancedByteBufferAccess) in).
+ read(bufferPool, maxLength, opts);

final

+ if (buffer != null) {
+ int n = buffer.remaining();

final

+ if (n > 0) {
+ streamOffset += buffer.remaining(); // Read n bytes
+ int pos = buffer.position();

final

CryptoOutputStream.java:

+ private void encrypt() throws IOException {
...
+ * The plain text and cipher text have 1:1 mapping, they start at same
+ * position.

The plain and cipher texts have a 1:1 mapping. They start at the same position.

+ * We will generally not get here. For CTR mode, to improve
+ * performance, we rely on the encryptor maintaining context, for
+ * example to calculate the counter. But some bad implementations
+ * can't maintain context, and need us to re-init after doing
+ * encryption.

This code is generally not executed since the encryptor usually
maintains its context (e.g. the counter). However, some
implementations can't maintain context so a re-init is necessary after
each encryption call.

+ /**
+ * Update the
{@link #encryptor}

: calculate counter and
{@link #padding}

.
+ */
+ private void updateEncryptor() throws IOException {
+ long counter = streamOffset / codec.getAlgorithmBlockSize();

final

+ padding = (byte)(streamOffset % codec.getAlgorithmBlockSize());

s/) (/

+ * Free the direct buffer manually.

Forcibly free the direct byte buffers.

+ /**
+ * To flush, we need to encrypt the data in buffer and write to underlying
+ * stream, then do the flush.

s/data in buffer/data in the buffer/
s/write to underlying/write to the underlying/

+ public void write(int b) throws IOException {
+ oneByteBuf[0] = (byte)(b & 0xff);

s/) (/

Decryptor.java:

+ /**
+ * Initialize the decryptor, the internal decryption context will be

s/decryptor, the/decryptor. The/

+ /**
+ * Indicate whether decryption context is reset.

s/whether decryption/whether the decryption/

+ * It's useful for some mode like CTR which requires different IV for
+ * different parts of data. Usually decryptor can maintain the context
+ * internally such as calculating IV/counter, then continue a multiple-part
+ * decryption operation without reinit the decryptor using key and the new
+ * IV. For mode like CTR, if context is reset after each decryption, the
+ * decryptor should be reinit before each operation, that's not efficient.

Certain algorithms, like CTR, require a different IV depending on the
position in the stream. Generally, the decryptor maintains any
necessary context for calculating the IV and counter so that no reset
is necessary during the decryption. Resetting before each operation in
CTR is very inefficient.

+ /**
+ * This exposes a direct interface for record decryption with direct byte
+ * buffers.

This presents a direct interface decrypting with direct ByteBuffers.

+ * The decrypt() function need not always consume the buffers provided,
+ * it will need to be called multiple times to decrypt an entire buffer
+ * and the object will hold the decryption context internally.

decrypt() does not always decrypt the entire buffer and may
potentially need to be called multipe times to process an entire
buffer.

+ * Some implementation may need enough space in the destination buffer to
+ * decrypt an entire input.

Some implementations may require sufficient space in the destination
buffer to decrypt the entire input buffer.

+ * The end result will move inBuffer.position() by the bytes-read and
+ * outBuffer.position() by the bytes-written. It should not modify the
+ * inBuffer.limit() or outBuffer.limit() to maintain consistency of operation.

Upon return, inBuffer.position() will be advanced by the number of
bytes read and outBuffer.position() by bytes written. Implementations
should not modify the limit()s on either buffer.

+ * @param inBuffer in direct
{@link ByteBuffer} for reading from. Requires
+ * inBuffer != null and inBuffer.remaining() > 0

@param inBuffer a direct {@link ByteBuffer}

to read from. inBuffer may
not be null and inBuffer.remaining() must be > 0.

+ * @param outBuffer out direct
{@link ByteBuffer} for storing the results
+ * into. Requires outBuffer != null and outBuffer.remaining() > 0

@param outBuffer a direct {@link ByteBuffer}

to write to. outBuffer may
not be null and outBuffer.remaining() must be > 0.

Encryptor.java:

+ /**
+ * Initialize the encryptor, the internal encryption context will be
+ * reset.

Initialize the encryptor and reset the internal encryption context.

+ /**
+ * Indicate whether encryption context is reset.
+ * <p/>
+ * It's useful for some mode like CTR which requires different IV for
+ * different parts of data. Usually encryptor can maintain the context
+ * internally such as calculating IV/counter, then continue a multiple-part
+ * encryption operation without reinit the encryptor using key and the new
+ * IV. For mode like CTR, if context is reset after each encryption, the
+ * encryptor should be reinit before each operation, that's not efficient.

Certain algorithms, like CTR, require a different IV depending on the
position in the stream. Generally, the encryptor maintains any
necessary context for calculating the IV and counter so that no reset
is necessary during the encryption. Resetting before each operation in
CTR is very inefficient.

+ * @return boolean whether context is reset.

@return whether the context is reset.

+ /**
+ * This exposes a direct interface for record encryption with direct byte
+ * buffers.

This presents a direct interface encrypting with direct ByteBuffers.

+ * <p/>
+ * The encrypt() function need not always consume the buffers provided,
+ * it will need to be called multiple times to encrypt an entire buffer
+ * and the object will hold the encryption context internally.

encrypt() does not always encrypt the entire buffer and may
potentially need to be called multipe times to process an entire
buffer.

+ * <p/>
+ * Some implementation may need enough space in the destination buffer to
+ * encrypt an entire input.

Some implementations may require sufficient space in the destination
buffer to encrypt the entire input buffer.

+ * <p/>
+ * The end result will move inBuffer.position() by the bytes-read and
+ * outBuffer.position() by the bytes-written. It should not modify the
+ * inBuffer.limit() or outBuffer.limit() to maintain consistency of operation.

Upon return, inBuffer.position() will be advanced by the number of
bytes read and outBuffer.position() by bytes written. Implementations
should not modify the limit()s on either buffer.

+ * <p/>
+ * @param inBuffer in direct
{@link ByteBuffer} for reading from. Requires
+ * inBuffer != null and inBuffer.remaining() > 0

@param inBuffer a direct {@link ByteBuffer}

to read from. inBuffer may
not be null and inBuffer.remaining() must be > 0.

+ * @param outBuffer out direct
{@link ByteBuffer} for storing the results
+ * into. Requires outBuffer != null and outBuffer.remaining() > 0

@param outBuffer a direct {@link ByteBuffer}

to write to. outBuffer may
not be null and outBuffer.remaining() must be > 0.

JCEAESCTRDecryptor.java:

+ /**
+ * For AES-CTR, will consume all input data and needs enough space in the
+ * destination buffer to decrypt entire input data.

AES-CTR will consume all of the input data. It requires enough space
in the destination buffer to decrypt the entire input buffer.

JCEAESCTREncryptor.java:

+ /**
+ * For AES-CTR, will consume all input data and needs enough space in the
+ * destination buffer to encrypt entire input data.

AES-CTR will consume all of the input data. It requires enough space
in the destination buffer to encrypt the entire input buffer.

core-default.xml:

+<property>
+ <name>hadoop.security.crypto.buffer.size</name>
+ <value>8192</value>
+ <description>
+ The buffer size used in Crypto InputStream and OutputStream, and default
+ value is 8192.

The buffer size used by CryptoInputStream and CryptoOutputStream. [you
don't need to say what the default value is since it's right above it].

CryptoStreamsTestBase.java:

+public abstract class CryptoStreamsTestBase {
+ protected static final Log LOG= LogFactory.getLog(

s/LOG= /LOG = /

+ public void setUp() throws IOException {
+ // Generate data
+ int seed = new Random().nextInt();
+ DataOutputBuffer dataBuf = new DataOutputBuffer();
+ RandomDatum.Generator generator = new RandomDatum.Generator(seed);

final

+ for(int i=0; i < count; ++i) {
s/for(/for (/
s/i=0/i = 0/
+ generator.next();
+ RandomDatum key = generator.getKey();
+ RandomDatum value = generator.getValue();

final

+ private void verify(InputStream in, int bytesToVerify,
+ byte[] expectedBytes) throws IOException {
+ byte[] readBuf = new byte[bytesToVerify];
+ readAll(in, readBuf, 0, bytesToVerify);
+ for (int i=0; i<bytesToVerify; i++) {

s/=/ = /
s/</ < /

+ final int len1 = dataLen/4;

s|/| / |
+ // Read len1 bytes
+ byte [] readData = new byte[len1];

s/byte []/byte[]/

+ @Test(timeout=120000)
+ public void testSeek() throws Exception {

Do you have a test that verifies that seek with an exception unwinds
the seek position correctly? The contract is to not move the position
if an exception occurs.

+ @Test(timeout=120000)
+ public void testCombinedOp() throws Exception {
+ OutputStream out = getOutputStream(defaultBufferSize);
+ writeData(out);
+
+ final int len1 = dataLen/8;
+ final int len2 = dataLen/10;

s|/| / |

+ @Test(timeout=120000)
+ public void testHasEnhancedByteBufferAccess() throws Exception {
+ OutputStream out = getOutputStream(defaultBufferSize);
+ writeData(out);
+
+ InputStream in = getInputStream(defaultBufferSize);
+ final int len1 = dataLen/8;

s|/| / |

> Javadoc and few code style improvement for Crypto input and output streams
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-10628
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10628
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: security
>    Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
>            Reporter: Yi Liu
>            Assignee: Yi Liu
>             Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)
>
>
> There are some additional comments from [~clamb] related to javadoc and few code style on HADOOP-10603, let's fix them in this follow-on JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)