Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/06/15 14:15:26 UTC

[GitHub] [accumulo] milleruntime opened a new issue #2165: Explore use of Apache Commons IO byte streams

milleruntime opened a new issue #2165:
URL: https://github.com/apache/accumulo/issues/2165


   The Java classes for byte-array streams, `ByteArrayInputStream` and `ByteArrayOutputStream`, are synchronized, so they may perform poorly in situations that don't require thread-safety guarantees. Explore replacing them with `org.apache.commons.io.input.UnsynchronizedByteArrayInputStream` and `org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream`. Many Accumulo modules already get these classes through the Apache commons-io dependency, so using them may not require an additional dependency.
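
As a quick check of the premise, the sketch below uses reflection to report whether the single-byte `write` and `read` methods of the JDK classes carry the `synchronized` modifier. The class name `SyncCheck` and its helper are made up for illustration, and the result depends on the JDK in use, so treat it as a way to inspect your own runtime rather than a statement about every version.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper, not part of Accumulo or commons-io.
final class SyncCheck {

  // Returns true if the named public method is declared with the synchronized modifier.
  static boolean isSynchronized(Class<?> type, String name, Class<?>... params) {
    try {
      Method m = type.getMethod(name, params);
      return Modifier.isSynchronized(m.getModifiers());
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException("no such public method: " + name, e);
    }
  }

  public static void main(String[] args) {
    System.out.println("ByteArrayOutputStream.write(int) synchronized: "
        + isSynchronized(ByteArrayOutputStream.class, "write", int.class));
    System.out.println("ByteArrayInputStream.read() synchronized: "
        + isSynchronized(ByteArrayInputStream.class, "read"));
  }
}
```

A synchronized method only costs much when the lock is contended or the JIT cannot optimize it away, so this check confirms the premise of the issue without saying anything about the size of the effect.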


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] DomGarguilo commented on issue #2165: Explore use of Apache Commons IO byte streams

Posted by GitBox <gi...@apache.org>.
DomGarguilo commented on issue #2165:
URL: https://github.com/apache/accumulo/issues/2165#issuecomment-895476936


   I set up some crude performance tests to see what the difference is between the normal byte streams and the Unsynchronized variants. From what I can tell, the Unsynchronized byte streams are slower. I set up identical loops in which I created an `UnsynchronizedByteArrayOutputStream`/`ByteArrayOutputStream`, wrapped it in a `DataOutputStream`, then wrote to it (String, int and double each tested separately). I then created an `UnsynchronizedByteArrayInputStream`/`ByteArrayInputStream` wrapped in a `DataInputStream` and read from it, asserting that the sent and received data matched each time. From my data, the Unsynchronized version took ~60ms on average, whereas the regular streams took ~45ms on average. Below is the code I used to test.
   <details closed>
   
   <summary>Code</summary>
   
   ```java
     @Test
     public void testUnsyncEncodeDecode() throws IOException {
       long start = System.currentTimeMillis();
       for (int i = 0; i < 10_000; i++) {
         try (UnsynchronizedByteArrayOutputStream baos = new UnsynchronizedByteArrayOutputStream();
             DataOutputStream dos = new DataOutputStream(baos)) {
           String out = getRandomString();
           dos.writeUTF(out);
           try (
               UnsynchronizedByteArrayInputStream bais =
                   new UnsynchronizedByteArrayInputStream(baos.toByteArray());
               DataInputStream dis = new DataInputStream(bais)) {
             String in = dis.readUTF();
             assertEquals(in, out);
           }
         }
         try (UnsynchronizedByteArrayOutputStream baos = new UnsynchronizedByteArrayOutputStream();
             DataOutputStream dos = new DataOutputStream(baos)) {
           int out = random.nextInt();
           dos.writeInt(out);
           try (
               UnsynchronizedByteArrayInputStream bais =
                   new UnsynchronizedByteArrayInputStream(baos.toByteArray());
               DataInputStream dis = new DataInputStream(bais)) {
             int in = dis.readInt();
             assertEquals(in, out);
           }
         }
         try (UnsynchronizedByteArrayOutputStream baos = new UnsynchronizedByteArrayOutputStream();
             DataOutputStream dos = new DataOutputStream(baos)) {
           double out = random.nextDouble();
           dos.writeDouble(out);
           try (
               UnsynchronizedByteArrayInputStream bais =
                   new UnsynchronizedByteArrayInputStream(baos.toByteArray());
               DataInputStream dis = new DataInputStream(bais)) {
             double in = dis.readDouble();
             assertEquals(in, out, 0.001);
           }
         }
       }
       long finish = System.currentTimeMillis();
       System.out.println("time taken: " + (finish - start));
     }
   
     @Test
     public void testClassicEncodeDecode() throws IOException {
       long start = System.currentTimeMillis();
       for (int i = 0; i < 10_000; i++) {
         try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
             DataOutputStream dos = new DataOutputStream(baos)) {
           String out = getRandomString();
           dos.writeUTF(out);
           try (ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
               DataInputStream dis = new DataInputStream(bais)) {
             String in = dis.readUTF();
             assertEquals(in, out);
           }
         }
         try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
             DataOutputStream dos = new DataOutputStream(baos)) {
           int out = random.nextInt();
           dos.writeInt(out);
           try (ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
               DataInputStream dis = new DataInputStream(bais)) {
             int in = dis.readInt();
             assertEquals(in, out);
           }
         }
         try (ByteArrayOutputStream baos = new ByteArrayOutputStream();
             DataOutputStream dos = new DataOutputStream(baos)) {
           double out = random.nextDouble();
           dos.writeDouble(out);
           try (ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
               DataInputStream dis = new DataInputStream(bais)) {
             double in = dis.readDouble();
             assertEquals(in, out, 0.001);
           }
         }
       }
       long finish = System.currentTimeMillis();
       System.out.println("time taken: " + (finish - start));
     }
   
   ```
   </details>
   
   I'm not sure if this kind of testing is helpful, just thought I would pass along my findings.





[GitHub] [accumulo] DomGarguilo commented on issue #2165: Explore use of Apache Commons IO byte streams

Posted by GitBox <gi...@apache.org>.
DomGarguilo commented on issue #2165:
URL: https://github.com/apache/accumulo/issues/2165#issuecomment-896174440


   I moved the tests to JMH and split the data types into separate benchmarks. One example follows; the other benchmarks swap in the classic streams and the other data types:
   <details closed>
   
   <summary>Test example</summary>
   
   ```java
       @Benchmark
       public void testUnsyncInt() throws IOException {
           try (UnsynchronizedByteArrayOutputStream baos = new UnsynchronizedByteArrayOutputStream();
                DataOutputStream dos = new DataOutputStream(baos)) {
               int out = MyState.random.nextInt();
               dos.writeInt(out);
               try (UnsynchronizedByteArrayInputStream bais = new UnsynchronizedByteArrayInputStream(baos.toByteArray());
                    DataInputStream dis = new DataInputStream(bais)) {
                   assertEquals(dis.readInt(), out);
               }
           }
       }
   ```
   </details>
   
   The results of the tests are as follows:
   
   ```
   Benchmark                      Mode  Cnt  Score    Error  Units
   MyBenchmark.testClassicDouble  avgt   25  0.088 ±  0.001  us/op
   MyBenchmark.testClassicInt     avgt   25  0.108 ±  0.001  us/op
   MyBenchmark.testClassicString  avgt   25  0.347 ±  0.002  us/op
   MyBenchmark.testUnsyncDouble   avgt   25  0.200 ±  0.006  us/op
   MyBenchmark.testUnsyncInt      avgt   25  0.228 ±  0.022  us/op
   MyBenchmark.testUnsyncString   avgt   25  0.419 ±  0.012  us/op
   ```
   I'm not too sure how well these tests map to our use cases. Suggestions for additional tests or tweaks that would better show which of these byte streams is best for us would be great.
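
To put the table in relative terms, the short sketch below divides each Unsynchronized score by its classic counterpart. The class and method names are invented for this illustration; the scores are copied from the results above, and the error margins are ignored.

```java
import java.util.Locale;

// Hypothetical helper for reading the JMH table; scores are in us/op.
final class BenchmarkRatios {

  // Returns the unsync score as a multiple of the classic score (above 1 means slower).
  static double ratio(double unsyncScore, double classicScore) {
    return unsyncScore / classicScore;
  }

  public static void main(String[] args) {
    String[] types = {"Double", "Int", "String"};
    double[] classic = {0.088, 0.108, 0.347};
    double[] unsync = {0.200, 0.228, 0.419};
    for (int i = 0; i < types.length; i++) {
      System.out.printf(Locale.ROOT, "%-6s unsync/classic = %.2f%n",
          types[i], ratio(unsync[i], classic[i]));
    }
  }
}
```

On these numbers the Unsynchronized round trip takes roughly 2.3 times as long for double, 2.1 times for int, and 1.2 times for String.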
   





[GitHub] [accumulo] keith-turner commented on issue #2165: Explore use of Apache Commons IO byte streams

Posted by GitBox <gi...@apache.org>.
keith-turner commented on issue #2165:
URL: https://github.com/apache/accumulo/issues/2165#issuecomment-861669679


   It would be nice to do some perf tests using a DataOutputStream wrapping a ByteArrayOutputStream or an UnsynchronizedByteArrayOutputStream, serializing a few million values of different data types, like key, to verify that it makes a performance difference.
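
One possible shape for such a test is sketched below, using only JDK classes. The names `SerializeLoop`, `writeRecords`, and `timeMillis` are invented for this illustration, and the record layout (a UTF string, an int, and a long) is just a stand-in for real Accumulo data such as a key. The sink is passed as a `Supplier<OutputStream>` so that an unsynchronized stream from commons-io could be supplied instead when that library is on the classpath. A hand-rolled timing loop like this is only a rough guide; a harness such as JMH handles warm-up and dead-code elimination more carefully.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Hypothetical sketch; not taken from the Accumulo code base.
final class SerializeLoop {

  // Writes `count` records to a single stream from `sink` and returns the number of
  // bytes written, so the result of the work is observable to the caller.
  static int writeRecords(Supplier<OutputStream> sink, int count) {
    try (DataOutputStream dos = new DataOutputStream(sink.get())) {
      for (int i = 0; i < count; i++) {
        dos.writeUTF("row" + i); // 2-byte length prefix plus the encoded characters
        dos.writeInt(i);         // 4 bytes
        dos.writeLong(i * 31L);  // 8 bytes
      }
      dos.flush();
      return dos.size();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Runs a few untimed warm-up passes, then times one pass with System.nanoTime().
  static long timeMillis(Supplier<OutputStream> sink, int count) {
    for (int i = 0; i < 3; i++) {
      writeRecords(sink, count);
    }
    long start = System.nanoTime();
    writeRecords(sink, count);
    return (System.nanoTime() - start) / 1_000_000;
  }

  public static void main(String[] args) {
    int count = 2_000_000; // "a few million" records
    System.out.println("ByteArrayOutputStream: "
        + timeMillis(ByteArrayOutputStream::new, count) + " ms");
  }
}
```

Unlike a benchmark that creates a new stream for every value, this loop performs many writes against one stream, so the per-call cost of `write` makes up more of the measured time.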





[GitHub] [accumulo] dlmarion commented on issue #2165: Explore use of Apache Commons IO byte streams

Posted by GitBox <gi...@apache.org>.
dlmarion commented on issue #2165:
URL: https://github.com/apache/accumulo/issues/2165#issuecomment-895487731


   Suggest taking a look at JMH for writing a Java benchmark.
   https://github.com/openjdk/jmh
   

