You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucenenet.apache.org by GitBox <gi...@apache.org> on 2021/03/11 14:58:27 UTC

[GitHub] [lucenenet] NightOwl888 commented on pull request #436: introduce buffered crc32 algorithm to calculate a checksum

NightOwl888 commented on pull request #436:
URL: https://github.com/apache/lucenenet/pull/436#issuecomment-796797535


   Thanks for the PR, and for the great work.
   
   You managed to get the update functionality working with .NET's `HashAlgorithm` class, which is great. Unfortunately, through no fault of yours, the `HashAlgorithm` class creates a significant number of allocations compared to the current implementation based on `IChecksum` even though it outperforms in raw speed. The allocations would certainly more than negate any improvement due to the additional amount of garbage collection.
   
   ```c#
       [MemoryDiagnoser]
       public class BenchmarkCrc32
       {
           private const int RandomSeed = 1234;
           private const int Iterations = 100000;
   
           [Benchmark]
           public void UnbufferedCRC32_RunSequence()
           {
               var Random = new Random(RandomSeed);
               var data = new byte[Random.Next(1024)];
               Random.NextBytes(data);
   
               var c1 = new CRC32();
   
               for (int i = 0; i < Iterations; i++)
               {
   
                   c1.Update(data);
   
                   c1.Reset();
   
                   Random.NextBytes(data);
                   c1.Update(data);
   
                   var _ = c1.Value;
               }
           }
   
           [Benchmark]
           public void BufferedCRC32_RunSequence()
           {
               var Random = new Random(RandomSeed);
               var data = new byte[Random.Next(1024)];
               Random.NextBytes(data);
   
               var c1 = new BufferedChecksum(new CRC32());
   
               for (int i = 0; i < Iterations; i++)
               {
   
                   c1.Update(data);
   
                   c1.Reset();
   
                   Random.NextBytes(data);
                   c1.Update(data);
   
                   var _ = c1.Value;
               }
           }
   
           [Benchmark]
           public void BufferedCrc32Algorithm_RunSequence()
           {
               var Random = new Random(RandomSeed);
               var data = new byte[Random.Next(1024)];
               Random.NextBytes(data);
   
               var c2 = new BufferedCrc32Algorithm();
   
               for (int i = 0; i < Iterations; i++)
               {
                   c2.TransformFinalBlock(data, 0, data.Length);
   
                   c2.Initialize();
   
                   Random.NextBytes(data);
                   c2.Update(data);
   
                   var _ = c2.Value;
               }
           }
       }
   ```
   ``` ini
   
   BenchmarkDotNet=v0.12.0, OS=Windows 10.0.19041
   Intel Core i7-8850H CPU 2.60GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
   .NET Core SDK=5.0.104
     [Host]     : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT
     DefaultJob : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT
   
   
   ```
   |                             Method |     Mean |    Error |   StdDev |      Gen 0 | Gen 1 | Gen 2 |  Allocated |
   |----------------------------------- |---------:|---------:|---------:|-----------:|------:|------:|-----------:|
   |        UnbufferedCRC32_RunSequence | 539.0 ms | 10.54 ms | 16.42 ms |          - |     - |     - |      736 B |
   |          BufferedCRC32_RunSequence | 531.9 ms | 10.57 ms | 21.84 ms |          - |     - |     - |     1056 B |
   | BufferedCrc32Algorithm_RunSequence | 429.8 ms |  8.50 ms | 15.54 ms | 11000.0000 |     - |     - | 52800800 B |
   
   
   I attempted to modify the `Crc32Algorithm` class to reduce allocations, but most of them appear to be coming from the `HashAlgorithm` base class.
   
   So by exploring this, it is now clear that:
   
   1. We cannot make a `BufferedHashAlgorithm` class that works with any `HashAlgorthm` because it doesn't expose APIs to update the algorithm.
   2. The raw speed of `BuffferedCrc32Algorithm` is better than a buffered `CRC32`, but the large number of allocations by the former would cause performance degradation.
   
   So, while this seemed to be sensible when writing up #255, our understanding of the issues with doing this have changed as we are neither gaining in API usability or in performance. In short, there is no benefit to replacing the `IChecksum` interface with `HashAlgorthm` and it is best to leave it marked internal.
   
   Thanks for putting this together so we could explore this option. You definitely deserve all of the credit for closing #255 and we'd welcome your further participation in the project.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org