Posted to notifications@fluo.apache.org by keith-turner <gi...@git.apache.org> on 2016/11/09 20:24:18 UTC

[GitHub] incubator-fluo-website pull request #41: Blog post about immutable bytes

GitHub user keith-turner opened a pull request:

    https://github.com/apache/incubator-fluo-website/pull/41

    Blog post about immutable bytes

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/keith-turner/incubator-fluo-website jibbs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-fluo-website/pull/41.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #41
    
----
commit 0df0f830d9f115b019fcad96389ce19cc601b149
Author: Keith Turner <kt...@apache.org>
Date:   2016-11-09T20:21:34Z

    Blog post about immutable bytes

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-fluo-website pull request #41: Blog post about immutable bytes

Posted by ctubbsii <gi...@git.apache.org>.
Github user ctubbsii commented on a diff in the pull request:

    https://github.com/apache/incubator-fluo-website/pull/41#discussion_r87273916
  
    --- Diff: _posts/blog/2016-11-09-immutable-bytes.md ---
    @@ -0,0 +1,180 @@
    +---
    +title: "Java needs an immutable byte string"
    +date: 2016-11-09 11:43:00 +0000
    +author: Keith Turner
    +---
    +
    +## Byte Sequences in Java
    +
    +Working with byte arrays in Java can be painful.  To work around this Fluo created [Bytes] and
    +[BytesBuilder].  Bytes is an immutable wrapper around a byte array.  Bytes has good implementations
    +of `hashCode()`, `equals()`, and `compareTo()`.  These functions and its immutability make it
    +suitable to use as a map key.  If you have ever tried to use a byte array as a map key in Java you
    +quickly realize that you need to write a wrapper class.
    +
    +Fluo should not have to create these classes. Java really should offer something out the box as part
    +of its standard library.  However it does not offer any good class for this common case.
    +
    +## Why not use String?
    +
    +Trying to stuff arbitrary binary data in a String can corrupt the data.  The following little
    +program shows this, it will print `false`.
    +
    +```java
    +    byte bytes1[] = new byte[256];
    +    for(int i = 0; i<255; i++)
    +      bytes1[i] = (byte)i;
    +
    +    byte bytes2[] = new String(bytes1).getBytes();
    +    
    +    System.out.println(Arrays.equals(bytes1, bytes2));
    +```
    +
    +String can be made to work by specifying a character set. The following program will print `true`.
    +However this is error prone and inefficient.  Using this method results in copying between byte arrays
    +and internal string char arrays.
    +
    +```java
    +    byte bytes1[] = new byte[256];
    +    for(int i = 0; i<255; i++)
    +      bytes1[i] = (byte)i;
    +
    +    String str = new String(bytes1, StandardCharsets.ISO_8859_1);
    +    byte bytes2[] = str.getBytes(StandardCharsets.ISO_8859_1);
    +    
    +    System.out.println(Arrays.equals(bytes1, bytes2));
    +```
    +
    +## Why not use ByteBuffer?
    +  
    +A read only ByteBuffer might seem like it would fit the bill of an immutable byte array wrapper.
    +However, the following program shows two ways that ByteBuffer falls short.  ByteBuffers are great
    +for I/O, but it would not be prudent to use them as map keys.
    +
    +```java
    +    byte[] bytes1 = new byte[] {1,2,3,(byte)250};
    +    ByteBuffer bb1 = ByteBuffer.wrap(bytes1).asReadOnlyBuffer();
    +
    +    System.out.println(bb1.hashCode());
    +    bytes1[2]=89;
    +    System.out.println(bb1.hashCode());
    +    bb1.get();
    +    System.out.println(bb1.hashCode());
    +```
    +
    +The program above prints the following, which is less than ideal when using a
    +ByteBuffer as a HashMap key :
    +
    +```
    +747721
    +830367
    +26786
    +```
    +
    +This little program shows two things.  First, the only guarantee we are getting from
    +`asReadOnlyBuffer()` is that `bb1` can not be used to modify `bytes1`.  However, the originator of
    +the read only buffer can still modify the wrapped byte array.   Java's String and Fluo's Bytes avoid
    +this by always copying data into an internal private array that never escapes.
    +
    +The second issue is that `bb1` has a position and calling `bb1.get()` changes this position.
    +Changing the position conceptually changes the contents of the ByteBuffer.  This is why `hashCode()`
    +returns something different after `bb1.get()` is called.  So even though `bb1` does not enable
    +mutating `bytes1`, `bb1` is itself mutable.  One might think that calling `map.put(bb1.duplicate(),
    +aValue)` would avoid issues.  However, any code iterating over a maps keys could mutate the
    +ByteBuffers position.
    +
    +## Why not use Protobufs ByteString?
    +
    +[Protocol Buffers][pb] has a beautiful implementation of an immutable byte array wrapper called
    +[ByteString].  I would encourage its use when possible.  However in Fluo's case its not really
    +appropriate to use for two reasons.  First any library designer should try to minimize what
    +transitive dependencies they force on users.  Internally Fluo does not currently use Protocol
    +Buffers in its implementation, so this would be a new dependency for Fluo users.  The second reason
    +is going to require some background to explain.
    +
    +Technologies like [OSGI] and [Jigsaw] seek to modularize Java libraries and provide dependency
    +isolation.  Dependency isolation allows a user to use a library without having to share a libraries
    +dependencies.  For example, consider the following hypothetical scenario.
    +
    + * Fluo's implementation uses Protobuf version 2.5
    + * Fluo user code uses Protobuf version 1.8
    +
    +Without dependency isolation the user must converge dependencies and make their application and
    +Fluo's implementation use the same version of Protobuf.  Sometimes this works without issue, but
    +sometimes things will break because Protobuf dropped, changed, or added a method.
    +
    +With dependency isolation, Fluo's implementation and Fluo user code can easily use different versions
    +of Protobuf.  This is only true as long as Fluo's API does not use Protobuf.  So this the second
    +reason that Fluo should not use classes from Protobuf in its API.  If Fluo used Protobuf in its API
    +then it forces the user to have to converge dependencies, even if they are using OSGI or Jigsaw. 
    +
    +## What about the copies?
    +
    +As mentioned earlier, an Immutable type requires a defensive copy at creation time.  When we were
    +designing Fluo's API we were worried about this at first.  However a simple truth became apparent.
    +If the API took a mutable type, then all boundary points between the user and Fluo would require
    +defensive copies.  For example assume Fluo's API took byte arrays and consider the following code.
    +
    +```java
    +//A Fluo transaction
    +Transaction tx = ...
    +byte[] row = ...
    +
    +tx.set(row, col1, v1)
    --- End diff --
    
    Semicolons in java code?



[GitHub] incubator-fluo-website pull request #41: Blog post about immutable bytes

Posted by ctubbsii <gi...@git.apache.org>.
Github user ctubbsii commented on a diff in the pull request:

    https://github.com/apache/incubator-fluo-website/pull/41#discussion_r87280336
  
    --- Diff: _posts/blog/2016-11-09-immutable-bytes.md ---
    @@ -0,0 +1,180 @@
    +---
    +title: "Java needs an immutable byte string"
    +date: 2016-11-09 11:43:00 +0000
    +author: Keith Turner
    +---
    +
    +## Byte Sequences in Java
    +
    +Working with byte arrays in Java can be painful.  To work around this Fluo created [Bytes] and
    +[BytesBuilder].  Bytes is an immutable wrapper around a byte array.  Bytes has good implementations
    +of `hashCode()`, `equals()`, and `compareTo()`.  These functions and its immutability make it
    +suitable to use as a map key.  If you have ever tried to use a byte array as a map key in Java you
    +quickly realize that you need to write a wrapper class.
    +
    +Fluo should not have to create these classes. Java really should offer something out the box as part
    +of its standard library.  However it does not offer any good class for this common case.
    +
    +## Why not use String?
    +
    +Trying to stuff arbitrary binary data in a String can corrupt the data.  The following little
    +program shows this, it will print `false`.
    +
    +```java
    +    byte bytes1[] = new byte[256];
    +    for(int i = 0; i<255; i++)
    +      bytes1[i] = (byte)i;
    +
    +    byte bytes2[] = new String(bytes1).getBytes();
    +    
    +    System.out.println(Arrays.equals(bytes1, bytes2));
    +```
    +
    +String can be made to work by specifying a character set. The following program will print `true`.
    +However this is error prone and inefficient.  Using this method results in copying between byte arrays
    +and internal string char arrays.
    +
    +```java
    +    byte bytes1[] = new byte[256];
    +    for(int i = 0; i<255; i++)
    +      bytes1[i] = (byte)i;
    +
    +    String str = new String(bytes1, StandardCharsets.ISO_8859_1);
    +    byte bytes2[] = str.getBytes(StandardCharsets.ISO_8859_1);
    +    
    +    System.out.println(Arrays.equals(bytes1, bytes2));
    +```
    +
    +## Why not use ByteBuffer?
    +  
    +A read only ByteBuffer might seem like it would fit the bill of an immutable byte array wrapper.
    +However, the following program shows two ways that ByteBuffer falls short.  ByteBuffers are great
    +for I/O, but it would not be prudent to use them as map keys.
    +
    +```java
    +    byte[] bytes1 = new byte[] {1,2,3,(byte)250};
    +    ByteBuffer bb1 = ByteBuffer.wrap(bytes1).asReadOnlyBuffer();
    +
    +    System.out.println(bb1.hashCode());
    +    bytes1[2]=89;
    +    System.out.println(bb1.hashCode());
    +    bb1.get();
    +    System.out.println(bb1.hashCode());
    +```
    +
    +The program above prints the following, which is less than ideal when using a
    +ByteBuffer as a HashMap key :
    +
    +```
    +747721
    +830367
    +26786
    +```
    +
    +This little program shows two things.  First, the only guarantee we are getting from
    +`asReadOnlyBuffer()` is that `bb1` can not be used to modify `bytes1`.  However, the originator of
    +the read only buffer can still modify the wrapped byte array.   Java's String and Fluo's Bytes avoid
    +this by always copying data into an internal private array that never escapes.
    +
    +The second issue is that `bb1` has a position and calling `bb1.get()` changes this position.
    +Changing the position conceptually changes the contents of the ByteBuffer.  This is why `hashCode()`
    +returns something different after `bb1.get()` is called.  So even though `bb1` does not enable
    +mutating `bytes1`, `bb1` is itself mutable.  One might think that calling `map.put(bb1.duplicate(),
    +aValue)` would avoid issues.  However, any code iterating over a maps keys could mutate the
    +ByteBuffers position.
    +
    +## Why not use Protobufs ByteString?
    +
    +[Protocol Buffers][pb] has a beautiful implementation of an immutable byte array wrapper called
    +[ByteString].  I would encourage its use when possible.  However in Fluo's case its not really
    +appropriate to use for two reasons.  First any library designer should try to minimize what
    +transitive dependencies they force on users.  Internally Fluo does not currently use Protocol
    +Buffers in its implementation, so this would be a new dependency for Fluo users.  The second reason
    +is going to require some background to explain.
    +
    +Technologies like [OSGI] and [Jigsaw] seek to modularize Java libraries and provide dependency
    +isolation.  Dependency isolation allows a user to use a library without having to share a libraries
    +dependencies.  For example, consider the following hypothetical scenario.
    +
    + * Fluo's implementation uses Protobuf version 2.5
    + * Fluo user code uses Protobuf version 1.8
    +
    +Without dependency isolation the user must converge dependencies and make their application and
    +Fluo's implementation use the same version of Protobuf.  Sometimes this works without issue, but
    +sometimes things will break because Protobuf dropped, changed, or added a method.
    +
    +With dependency isolation, Fluo's implementation and Fluo user code can easily use different versions
    +of Protobuf.  This is only true as long as Fluo's API does not use Protobuf.  So this the second
    +reason that Fluo should not use classes from Protobuf in its API.  If Fluo used Protobuf in its API
    +then it forces the user to have to converge dependencies, even if they are using OSGI or Jigsaw. 
    +
    +## What about the copies?
    +
    +As mentioned earlier, an Immutable type requires a defensive copy at creation time.  When we were
    +designing Fluo's API we were worried about this at first.  However a simple truth became apparent.
    +If the API took a mutable type, then all boundary points between the user and Fluo would require
    +defensive copies.  For example assume Fluo's API took byte arrays and consider the following code.
    +
    +```java
    +//A Fluo transaction
    +Transaction tx = ...
    +byte[] row = ...
    +
    +tx.set(row, col1, v1)
    +tx.set(row, col2, v2)
    +tx.set(row, col3, v3)
    +```
    +
    +Fluo will buffer changes until a transaction is committed.  In the example above since Fluo accepts
    +a mutable row, it would be prudent to do a defensive copy each time set is called above.  
    +
    +In the code below where an immutable byte array wrapper is used, the calls to set do not need to do
    +defensive copy.  So when comparing the two examples, the immutable byte wrapper results in less
    +defensive copies.
    +
    +```java
    +//A Fluo transaction
    +Transaction tx = ...
    +Bytes row = ...
    +
    +tx.set(row, col1, v1)
    +tx.set(row, col2, v2)
    +tx.set(row, col3, v3)
    +```
    +
    +## Improving the situation
    +
    +So far, the following arguments have been presented.
    --- End diff --
    
    End sentence with colon :



[GitHub] incubator-fluo-website pull request #41: Blog post about immutable bytes

Posted by ctubbsii <gi...@git.apache.org>.
Github user ctubbsii commented on a diff in the pull request:

    https://github.com/apache/incubator-fluo-website/pull/41#discussion_r87273763
  
    --- Diff: _posts/blog/2016-11-09-immutable-bytes.md ---
    @@ -0,0 +1,180 @@
    +---
    +title: "Java needs an immutable byte string"
    +date: 2016-11-09 11:43:00 +0000
    +author: Keith Turner
    +---
    +
    +## Byte Sequences in Java
    +
    +Working with byte arrays in Java can be painful.  To work around this Fluo created [Bytes] and
    +[BytesBuilder].  Bytes is an immutable wrapper around a byte array.  Bytes has good implementations
    +of `hashCode()`, `equals()`, and `compareTo()`.  These functions and its immutability make it
    +suitable to use as a map key.  If you have ever tried to use a byte array as a map key in Java you
    +quickly realize that you need to write a wrapper class.
    +
    +Fluo should not have to create these classes. Java really should offer something out the box as part
    +of its standard library.  However it does not offer any good class for this common case.
    +
    +## Why not use String?
    +
    +Trying to stuff arbitrary binary data in a String can corrupt the data.  The following little
    +program shows this, it will print `false`.
    +
    +```java
    +    byte bytes1[] = new byte[256];
    +    for(int i = 0; i<255; i++)
    +      bytes1[i] = (byte)i;
    +
    +    byte bytes2[] = new String(bytes1).getBytes();
    +    
    +    System.out.println(Arrays.equals(bytes1, bytes2));
    +```
    +
    +String can be made to work by specifying a character set. The following program will print `true`.
    +However this is error prone and inefficient.  Using this method results in copying between byte arrays
    +and internal string char arrays.
    +
    +```java
    +    byte bytes1[] = new byte[256];
    +    for(int i = 0; i<255; i++)
    +      bytes1[i] = (byte)i;
    +
    +    String str = new String(bytes1, StandardCharsets.ISO_8859_1);
    +    byte bytes2[] = str.getBytes(StandardCharsets.ISO_8859_1);
    +    
    +    System.out.println(Arrays.equals(bytes1, bytes2));
    +```
    +
    +## Why not use ByteBuffer?
    +  
    +A read only ByteBuffer might seem like it would fit the bill of an immutable byte array wrapper.
    +However, the following program shows two ways that ByteBuffer falls short.  ByteBuffers are great
    +for I/O, but it would not be prudent to use them as map keys.
    +
    +```java
    +    byte[] bytes1 = new byte[] {1,2,3,(byte)250};
    +    ByteBuffer bb1 = ByteBuffer.wrap(bytes1).asReadOnlyBuffer();
    +
    +    System.out.println(bb1.hashCode());
    +    bytes1[2]=89;
    +    System.out.println(bb1.hashCode());
    +    bb1.get();
    +    System.out.println(bb1.hashCode());
    +```
    +
    +The program above prints the following, which is less than ideal when using a
    +ByteBuffer as a HashMap key :
    +
    +```
    +747721
    +830367
    +26786
    +```
    +
    +This little program shows two things.  First, the only guarantee we are getting from
    +`asReadOnlyBuffer()` is that `bb1` can not be used to modify `bytes1`.  However, the originator of
    +the read only buffer can still modify the wrapped byte array.   Java's String and Fluo's Bytes avoid
    +this by always copying data into an internal private array that never escapes.
    +
    +The second issue is that `bb1` has a position and calling `bb1.get()` changes this position.
    +Changing the position conceptually changes the contents of the ByteBuffer.  This is why `hashCode()`
    +returns something different after `bb1.get()` is called.  So even though `bb1` does not enable
    +mutating `bytes1`, `bb1` is itself mutable.  One might think that calling `map.put(bb1.duplicate(),
    +aValue)` would avoid issues.  However, any code iterating over a maps keys could mutate the
    +ByteBuffers position.
    +
    +## Why not use Protobufs ByteString?
    +
    +[Protocol Buffers][pb] has a beautiful implementation of an immutable byte array wrapper called
    +[ByteString].  I would encourage its use when possible.  However in Fluo's case its not really
    +appropriate to use for two reasons.  First any library designer should try to minimize what
    +transitive dependencies they force on users.  Internally Fluo does not currently use Protocol
    +Buffers in its implementation, so this would be a new dependency for Fluo users.  The second reason
    +is going to require some background to explain.
    +
    +Technologies like [OSGI] and [Jigsaw] seek to modularize Java libraries and provide dependency
    +isolation.  Dependency isolation allows a user to use a library without having to share a libraries
    +dependencies.  For example, consider the following hypothetical scenario.
    +
    + * Fluo's implementation uses Protobuf version 2.5
    + * Fluo user code uses Protobuf version 1.8
    +
    +Without dependency isolation the user must converge dependencies and make their application and
    +Fluo's implementation use the same version of Protobuf.  Sometimes this works without issue, but
    +sometimes things will break because Protobuf dropped, changed, or added a method.
    +
    +With dependency isolation, Fluo's implementation and Fluo user code can easily use different versions
    +of Protobuf.  This is only true as long as Fluo's API does not use Protobuf.  So this the second
    --- End diff --
    
    So, this *is* the second



[GitHub] incubator-fluo-website issue #41: Blog post about immutable bytes

Posted by mikewalch <gi...@git.apache.org>.
Github user mikewalch commented on the issue:

    https://github.com/apache/incubator-fluo-website/pull/41
  
    +1



[GitHub] incubator-fluo-website pull request #41: Blog post about immutable bytes

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/incubator-fluo-website/pull/41#discussion_r87470953
  
    --- Diff: _posts/blog/2016-11-09-immutable-bytes.md ---
    @@ -0,0 +1,181 @@
    +---
    +title: "Java needs an immutable byte string"
    +date: 2016-11-09 11:43:00 +0000
    +author: Keith Turner
    +reviewers: Christopher Tubbs
    +---
    +
    --- End diff --
    
    Good feedback.  I took a crack at reorganizing.



[GitHub] incubator-fluo-website pull request #41: Blog post about immutable bytes

Posted by ctubbsii <gi...@git.apache.org>.
Github user ctubbsii commented on a diff in the pull request:

    https://github.com/apache/incubator-fluo-website/pull/41#discussion_r87273093
  
    --- Diff: _posts/blog/2016-11-09-immutable-bytes.md ---
    @@ -0,0 +1,180 @@
    +---
    +title: "Java needs an immutable byte string"
    +date: 2016-11-09 11:43:00 +0000
    +author: Keith Turner
    +---
    +
    +## Byte Sequences in Java
    +
    +Working with byte arrays in Java can be painful.  To work around this Fluo created [Bytes] and
    +[BytesBuilder].  Bytes is an immutable wrapper around a byte array.  Bytes has good implementations
    +of `hashCode()`, `equals()`, and `compareTo()`.  These functions and its immutability make it
    +suitable to use as a map key.  If you have ever tried to use a byte array as a map key in Java you
    +quickly realize that you need to write a wrapper class.
    +
    +Fluo should not have to create these classes. Java really should offer something out the box as part
    +of its standard library.  However it does not offer any good class for this common case.
    +
    +## Why not use String?
    +
    +Trying to stuff arbitrary binary data in a String can corrupt the data.  The following little
    +program shows this, it will print `false`.
    +
    +```java
    +    byte bytes1[] = new byte[256];
    +    for(int i = 0; i<255; i++)
    +      bytes1[i] = (byte)i;
    +
    +    byte bytes2[] = new String(bytes1).getBytes();
    +    
    +    System.out.println(Arrays.equals(bytes1, bytes2));
    +```
    +
    +String can be made to work by specifying a character set. The following program will print `true`.
    +However this is error prone and inefficient.  Using this method results in copying between byte arrays
    --- End diff --
    
    However,



[GitHub] incubator-fluo-website pull request #41: Blog post about immutable bytes

Posted by ctubbsii <gi...@git.apache.org>.
Github user ctubbsii commented on a diff in the pull request:

    https://github.com/apache/incubator-fluo-website/pull/41#discussion_r87272823
  
    --- Diff: _posts/blog/2016-11-09-immutable-bytes.md ---
    @@ -0,0 +1,180 @@
    +---
    +title: "Java needs an immutable byte string"
    +date: 2016-11-09 11:43:00 +0000
    +author: Keith Turner
    +---
    +
    +## Byte Sequences in Java
    +
    +Working with byte arrays in Java can be painful.  To work around this Fluo created [Bytes] and
    +[BytesBuilder].  Bytes is an immutable wrapper around a byte array.  Bytes has good implementations
    +of `hashCode()`, `equals()`, and `compareTo()`.  These functions and its immutability make it
    +suitable to use as a map key.  If you have ever tried to use a byte array as a map key in Java you
    +quickly realize that you need to write a wrapper class.
    +
    +Fluo should not have to create these classes. Java really should offer something out the box as part
    --- End diff --
    
    out *of* the box



[GitHub] incubator-fluo-website pull request #41: Blog post about immutable bytes

Posted by mikewalch <gi...@git.apache.org>.
Github user mikewalch commented on a diff in the pull request:

    https://github.com/apache/incubator-fluo-website/pull/41#discussion_r87404690
  
    --- Diff: _posts/blog/2016-11-09-immutable-bytes.md ---
    @@ -0,0 +1,181 @@
    +---
    +title: "Java needs an immutable byte string"
    +date: 2016-11-09 11:43:00 +0000
    +author: Keith Turner
    +reviewers: Christopher Tubbs
    +---
    +
    --- End diff --
    
    Content is great, but I think it would be easier to grasp (for those new to Fluo) if you followed the structure below.
    
    * Introduce problem. (i.e. Explain why Fluo needs Bytes objects).  
    * Explain existing solutions (String, ByteBuffer, Protobufs) and how they do not solve the problem
    * Describe how our internal solution (Bytes, BytesBuilder) works and solved it.
    * Conclude with how Java should provide something similar to our internal solution.



[GitHub] incubator-fluo-website pull request #41: Blog post about immutable bytes

Posted by ctubbsii <gi...@git.apache.org>.
Github user ctubbsii commented on a diff in the pull request:

    https://github.com/apache/incubator-fluo-website/pull/41#discussion_r87280245
  
    --- Diff: _posts/blog/2016-11-09-immutable-bytes.md ---
    @@ -0,0 +1,180 @@
    +---
    +title: "Java needs an immutable byte string"
    +date: 2016-11-09 11:43:00 +0000
    +author: Keith Turner
    +---
    +
    +## Byte Sequences in Java
    +
    +Working with byte arrays in Java can be painful.  To work around this Fluo created [Bytes] and
    +[BytesBuilder].  Bytes is an immutable wrapper around a byte array.  Bytes has good implementations
    +of `hashCode()`, `equals()`, and `compareTo()`.  These functions and its immutability make it
    +suitable to use as a map key.  If you have ever tried to use a byte array as a map key in Java you
    +quickly realize that you need to write a wrapper class.
    +
    +Fluo should not have to create these classes. Java really should offer something out the box as part
    +of its standard library.  However it does not offer any good class for this common case.
    +
    +## Why not use String?
    +
    +Trying to stuff arbitrary binary data in a String can corrupt the data.  The following little
    +program shows this, it will print `false`.
    +
    +```java
    +    byte bytes1[] = new byte[256];
    +    for(int i = 0; i<255; i++)
    +      bytes1[i] = (byte)i;
    +
    +    byte bytes2[] = new String(bytes1).getBytes();
    +    
    +    System.out.println(Arrays.equals(bytes1, bytes2));
    +```
    +
    +String can be made to work by specifying a character set. The following program will print `true`.
    +However this is error prone and inefficient.  Using this method results in copying between byte arrays
    +and internal string char arrays.
    +
    +```java
    +    byte bytes1[] = new byte[256];
    +    for(int i = 0; i<255; i++)
    +      bytes1[i] = (byte)i;
    +
    +    String str = new String(bytes1, StandardCharsets.ISO_8859_1);
    +    byte bytes2[] = str.getBytes(StandardCharsets.ISO_8859_1);
    +    
    +    System.out.println(Arrays.equals(bytes1, bytes2));
    +```
    +
    +## Why not use ByteBuffer?
    +  
    +A read only ByteBuffer might seem like it would fit the bill of an immutable byte array wrapper.
    +However, the following program shows two ways that ByteBuffer falls short.  ByteBuffers are great
    +for I/O, but it would not be prudent to use them as map keys.
    +
    +```java
    +    byte[] bytes1 = new byte[] {1,2,3,(byte)250};
    +    ByteBuffer bb1 = ByteBuffer.wrap(bytes1).asReadOnlyBuffer();
    +
    +    System.out.println(bb1.hashCode());
    +    bytes1[2]=89;
    +    System.out.println(bb1.hashCode());
    +    bb1.get();
    +    System.out.println(bb1.hashCode());
    +```
    +
    +The program above prints the following, which is less than ideal when using a
    +ByteBuffer as a HashMap key :
    +
    +```
    +747721
    +830367
    +26786
    +```
    +
    +This little program shows two things.  First, the only guarantee we are getting from
    +`asReadOnlyBuffer()` is that `bb1` can not be used to modify `bytes1`.  However, the originator of
    +the read only buffer can still modify the wrapped byte array.   Java's String and Fluo's Bytes avoid
    +this by always copying data into an internal private array that never escapes.
    +
    +The second issue is that `bb1` has a position and calling `bb1.get()` changes this position.
    +Changing the position conceptually changes the contents of the ByteBuffer.  This is why `hashCode()`
    +returns something different after `bb1.get()` is called.  So even though `bb1` does not enable
    +mutating `bytes1`, `bb1` is itself mutable.  One might think that calling `map.put(bb1.duplicate(),
    +aValue)` would avoid issues.  However, any code iterating over a map's keys could mutate the
    +ByteBuffer's position.
    +
    +## Why not use Protobuf's ByteString?
    +
    +[Protocol Buffers][pb] has a beautiful implementation of an immutable byte array wrapper called
    +[ByteString].  I would encourage its use when possible.  However, in Fluo's case it's not really
    +appropriate to use for two reasons.  First, any library designer should try to minimize what
    +transitive dependencies they force on users.  Internally Fluo does not currently use Protocol
    +Buffers in its implementation, so this would be a new dependency for Fluo users.  The second reason
    +is going to require some background to explain.
    +
    +Technologies like [OSGI] and [Jigsaw] seek to modularize Java libraries and provide dependency
    +isolation.  Dependency isolation allows a user to use a library without having to share a library's
    +dependencies.  For example, consider the following hypothetical scenario.
    +
    + * Fluo's implementation uses Protobuf version 2.5
    + * Fluo user code uses Protobuf version 1.8
    +
    +Without dependency isolation the user must converge dependencies and make their application and
    +Fluo's implementation use the same version of Protobuf.  Sometimes this works without issue, but
    +sometimes things will break because Protobuf dropped, changed, or added a method.
    +
    +With dependency isolation, Fluo's implementation and Fluo user code can easily use different versions
    +of Protobuf.  This is only true as long as Fluo's API does not use Protobuf.  So this is the second
    +reason that Fluo should not use classes from Protobuf in its API.  If Fluo used Protobuf in its API
    +then it would force the user to converge dependencies, even if they are using OSGI or Jigsaw.
    +
    +## What about the copies?
    +
    +As mentioned earlier, an immutable type requires a defensive copy at creation time.  When we were
    +designing Fluo's API we were worried about this at first.  However, a simple truth became apparent.
    +If the API took a mutable type, then all boundary points between the user and Fluo would require
    +defensive copies.  For example, assume Fluo's API took byte arrays and consider the following code.
    +
    +```java
    +//A Fluo transaction
    +Transaction tx = ...
    +byte[] row = ...
    +
    +tx.set(row, col1, v1);
    +tx.set(row, col2, v2);
    +tx.set(row, col3, v3);
    +```
    +
    +Fluo will buffer changes until a transaction is committed.  In the example above, since Fluo accepts
    +a mutable row, it would be prudent to make a defensive copy each time set is called.
    +
    +In the code below where an immutable byte array wrapper is used, the calls to set do not need to do
    --- End diff --
    
    Put set as `set` so it's clear you're using the method name, and not the word "set".
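
For readers following this hunk, the defensive-copy concern around `set` can be sketched as below. This is only an illustration under stated assumptions: `BufferingTx` is a hypothetical stand-in that buffers rows until commit, not Fluo's `Transaction`, and the `copyOnSet` flag exists only to contrast the two behaviors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DefensiveCopyDemo {

  // Hypothetical stand-in for a transaction that buffers writes until commit.
  // This is not Fluo's Transaction; it only illustrates the copying concern.
  static class BufferingTx {
    private final List<byte[]> bufferedRows = new ArrayList<>();
    private final boolean copyOnSet;

    BufferingTx(boolean copyOnSet) {
      this.copyOnSet = copyOnSet;
    }

    void set(byte[] row) {
      // With a mutable argument, the only safe choice is to copy on every call.
      bufferedRows.add(copyOnSet ? Arrays.copyOf(row, row.length) : row);
    }

    byte[] bufferedRow(int i) {
      return bufferedRows.get(i);
    }
  }

  public static void main(String[] args) {
    byte[] row = {1, 2, 3};

    BufferingTx unsafe = new BufferingTx(false);
    BufferingTx safe = new BufferingTx(true);
    unsafe.set(row);
    safe.set(row);

    // The caller reuses the array before the transaction commits.
    row[0] = 9;

    // The uncopied buffer sees the caller's change; the copied one does not.
    System.out.println(Arrays.toString(unsafe.bufferedRow(0))); // [9, 2, 3]
    System.out.println(Arrays.toString(safe.bufferedRow(0)));   // [1, 2, 3]
  }
}
```

With an immutable wrapper the caller pays for one copy when the wrapper is created, and every later `set` call can keep the reference as-is.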



[GitHub] incubator-fluo-website pull request #41: Blog post about immutable bytes

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-fluo-website/pull/41



[GitHub] incubator-fluo-website pull request #41: Blog post about immutable bytes

Posted by ctubbsii <gi...@git.apache.org>.
Github user ctubbsii commented on a diff in the pull request:

    https://github.com/apache/incubator-fluo-website/pull/41#discussion_r87272869
  
    --- Diff: _posts/blog/2016-11-09-immutable-bytes.md ---
    @@ -0,0 +1,180 @@
    +---
    +title: "Java needs an immutable byte string"
    +date: 2016-11-09 11:43:00 +0000
    +author: Keith Turner
    +---
    +
    +## Byte Sequences in Java
    +
    +Working with byte arrays in Java can be painful.  To work around this Fluo created [Bytes] and
    +[BytesBuilder].  Bytes is an immutable wrapper around a byte array.  Bytes has good implementations
    +of `hashCode()`, `equals()`, and `compareTo()`.  These functions and its immutability make it
    +suitable to use as a map key.  If you have ever tried to use a byte array as a map key in Java you
    +quickly realize that you need to write a wrapper class.
    +
    +Fluo should not have to create these classes. Java really should offer something out of the box as part
    +of its standard library.  However it does not offer any good class for this common case.
    --- End diff --
    
    However,
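
As context for the map-key claim in this hunk, here is a minimal sketch of why a raw byte array fails as a `HashMap` key and what a wrapper has to provide. `ByteKey` is a hypothetical name invented for this example; it is not Fluo's `Bytes` and omits `compareTo()` and most other functionality.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ByteArrayKeyDemo {

  // Hypothetical minimal wrapper, not Fluo's Bytes: copies on construction and
  // defines equals/hashCode by content.
  static final class ByteKey {
    private final byte[] data;

    ByteKey(byte[] data) {
      // Private copy, so later changes to the caller's array cannot leak in.
      this.data = Arrays.copyOf(data, data.length);
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof ByteKey && Arrays.equals(data, ((ByteKey) o).data);
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(data);
    }
  }

  public static void main(String[] args) {
    Map<byte[], String> raw = new HashMap<>();
    raw.put(new byte[] {1, 2, 3}, "v");
    // Arrays inherit identity-based equals/hashCode, so an equal-content array misses.
    System.out.println(raw.get(new byte[] {1, 2, 3})); // null

    Map<ByteKey, String> wrapped = new HashMap<>();
    wrapped.put(new ByteKey(new byte[] {1, 2, 3}), "v");
    // Content-based equals/hashCode make the lookup succeed.
    System.out.println(wrapped.get(new ByteKey(new byte[] {1, 2, 3}))); // v
  }
}
```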

