You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by GitBox <gi...@apache.org> on 2022/01/30 21:17:56 UTC

[GitHub] [avro] tgnm opened a new pull request #1358: AVRO-3223: Support optional codecs in C# library

tgnm opened a new pull request #1358:
URL: https://github.com/apache/avro/pull/1358


   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Avro Jira](https://issues.apache.org/jira/projects/AVRO/issues/AVRO-3223) issues and references them in the PR title.
   
   ### Tests
   
   - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: `FileTests.TestOptionalCodecs`.
   
   ### Commits
   
   - [x] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](https://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [x] In case of new functionality, my PR adds documentation that describes how to use it.
     - All the public functions and the classes in the PR contain Javadoc that explain what it does
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] martin-g commented on a change in pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
martin-g commented on a change in pull request #1358:
URL: https://github.com/apache/avro/pull/1358#discussion_r795450525



##########
File path: lang/csharp/src/apache/main/File/Codec.cs
##########
@@ -43,9 +44,10 @@ public abstract class Codec
         /// <summary>
         /// Decompress data using implemented codec
         /// </summary>
-        /// <param name="compressedData"></param>
+        /// <param name="compressedData">The buffer holding data to decompress.</param>
+        /// <param name="length">The actual length of bytes to decompress from the buffer.</param>
         /// <returns></returns>
-        abstract public byte[] Decompress(byte[] compressedData);
+        abstract public byte[] Decompress(byte[] compressedData, int length);

Review comment:
       The best is to use new commits after a review. Otherwise we have to re-review the whole thing again and again.
   Once the PR is approved we use GitHub's "Squash & Merge" button. This way all the commit messages are preserved for the final commit. So, better use good commit messages! :-)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] zcsizmadia edited a comment on pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
zcsizmadia edited a comment on pull request #1358:
URL: https://github.com/apache/avro/pull/1358#issuecomment-1027178601


   @tgnm What is the advantage using a CodecResolver delagate, instaed of assigning a codec to a name at registration time?
   
   Something like this, instead of RegisterResolver, have a RegisterCodec:
   
   ```
   private static readonly ConcurrentDictionary<string, Codec> _registeredCodecs = new ConcurrentDictionary<string, Codec>();
   
   public static void RegisterCodec(string codecType, Codec codec)
   {
       _registeredCodecs[codecType] = codec;
   }
   
   public static Codec CreateCodecFromString(string codecType)
   {
       // Check registered codecs
       if (_registeredCodecs.TryGetValue(codecType, out Codec codec))
       {
           return codec;
       }
   
       ...
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] zcsizmadia edited a comment on pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
zcsizmadia edited a comment on pull request #1358:
URL: https://github.com/apache/avro/pull/1358#issuecomment-1027178601


   @tgnm What is the advantage using a CodecResolver delagate, instaed of assigning a codec to a name at registration time?
   
   Something like this, instead of RegisterResolver, have a RegisterCodec:
   
   ```
   private static readonly ConcurrentDictionary<string, Codec> _registeredCodecs = new ConcurrentDictionary<string, Codec>();
   
   public static void RegisterCodec(string codecString, Codec codec)
   {
       _codecResolvers[codecString] = codec;
   }
   
   public static Codec CreateCodecFromString(string codecType)
   {
       // Check registered codecs
       if (TryGetValue(codecType, out Codec codec))
       {
           return codec;
       }
   
       ...
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] zcsizmadia commented on pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
zcsizmadia commented on pull request #1358:
URL: https://github.com/apache/avro/pull/1358#issuecomment-1027212294


   I definetely like the idea registering codecs. I would love to have xz, however many of the codecs are not trivial to add to the base Avro library, because of other wierd package dependencies.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] martin-g merged pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
martin-g merged pull request #1358:
URL: https://github.com/apache/avro/pull/1358


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] tgnm commented on pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
tgnm commented on pull request #1358:
URL: https://github.com/apache/avro/pull/1358#issuecomment-1027198862


   @zcsizmadia I guess the main advantage is that the function call uses a factory approach which allows for flexibility when it comes to the lifetime management of the `Codec` instance. I like the idea that I can either return a new instance or reused it across calls (depending on whether or not the Codec implementation supports reusing the instance). This is totally transparent to the Avro library.
   
   These approaches are also different when it comes to registrations in that the function approach iterates over all resolvers until the first match is found. The approach you suggested replaces a previous registration.
   
   From a performance point of view, there's not a lot of meaningful differences simply because I don't expect there to be more than one registration in practice... the dictionary approach is a bit less space efficient but this is negligible.
   
   Btw, I don't think using a concurrent data structure is really justified here. I'd expect consumers to first register any resolvers before they actually start using it to encode/decode data. Certainly don't expect this to be something that occurs dynamically at runtime. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] martin-g commented on a change in pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
martin-g commented on a change in pull request #1358:
URL: https://github.com/apache/avro/pull/1358#discussion_r795460794



##########
File path: lang/csharp/src/apache/main/File/Codec.cs
##########
@@ -43,9 +44,10 @@ public abstract class Codec
         /// <summary>
         /// Decompress data using implemented codec
         /// </summary>
-        /// <param name="compressedData"></param>
+        /// <param name="compressedData">The buffer holding data to decompress.</param>
+        /// <param name="length">The actual length of bytes to decompress from the buffer.</param>
         /// <returns></returns>
-        abstract public byte[] Decompress(byte[] compressedData);
+        abstract public byte[] Decompress(byte[] compressedData, int length);

Review comment:
       > The change I've pushed is simply to overload the method so that you have both method signatures, the one without a length argument and the one with it.
   
   I still think my suggestion to mark the old method as deprecated/Obsolete and to implement it as in https://github.com/apache/avro/pull/1358/files#r795248968 better.
   Currently https://github.com/apache/avro/pull/1358/files#diff-f6b2b3f158e175d309773571f11f7cf979b6ba983e3c00f28b0d7e869910f6b3R310 calls the new method. So, at some point we should be able to remove the old one.
   I don't know whether C# supports `default method in interface` as in Java/Scala. It would be the best if the old method is left as Obsolete only in Codec.cs




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] tgnm commented on pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
tgnm commented on pull request #1358:
URL: https://github.com/apache/avro/pull/1358#issuecomment-1027146538


   @martin-g I've just merged master into the branch to resolve a couple of conflicts. Do you think there's any chance this change can go into the next release? This would be a pretty useful feature to add to the dotnet library as it is missing zstd and snappy support atm.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] tgnm commented on pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
tgnm commented on pull request #1358:
URL: https://github.com/apache/avro/pull/1358#issuecomment-968272954


   I have just rebased master and fixed a few code quality warnings I had missed. Would be great to get more feedback on this PR and see if we can get this merged. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] martin-g commented on a change in pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
martin-g commented on a change in pull request #1358:
URL: https://github.com/apache/avro/pull/1358#discussion_r795426443



##########
File path: lang/csharp/src/apache/main/File/Codec.cs
##########
@@ -43,9 +44,10 @@ public abstract class Codec
         /// <summary>
         /// Decompress data using implemented codec
         /// </summary>
-        /// <param name="compressedData"></param>
+        /// <param name="compressedData">The buffer holding data to decompress.</param>
+        /// <param name="length">The actual length of bytes to decompress from the buffer.</param>
         /// <returns></returns>
-        abstract public byte[] Decompress(byte[] compressedData);
+        abstract public byte[] Decompress(byte[] compressedData, int length);

Review comment:
       Since you force pushed it is hard to see how you improved it.
   Now we have two similar methods in the API and at least to me it is not very clear when to use each one.
   I'll let someone else who has experience with C# to decide whether this PR should be merged.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] martin-g commented on pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
martin-g commented on pull request #1358:
URL: https://github.com/apache/avro/pull/1358#issuecomment-1027229426


   Thank you for the contribution, @tgnm !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] Indomitable commented on pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
Indomitable commented on pull request #1358:
URL: https://github.com/apache/avro/pull/1358#issuecomment-938493645


   I have a local changes that add Snappy codec and it also needs `_blockSize` passed to the `Decompress` method, but why not keep it passed as `long` and then the codec can cast it to `int` if needed. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] tgnm commented on a change in pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
tgnm commented on a change in pull request #1358:
URL: https://github.com/apache/avro/pull/1358#discussion_r795430079



##########
File path: lang/csharp/src/apache/main/File/Codec.cs
##########
@@ -43,9 +44,10 @@ public abstract class Codec
         /// <summary>
         /// Decompress data using implemented codec
         /// </summary>
-        /// <param name="compressedData"></param>
+        /// <param name="compressedData">The buffer holding data to decompress.</param>
+        /// <param name="length">The actual length of bytes to decompress from the buffer.</param>
         /// <returns></returns>
-        abstract public byte[] Decompress(byte[] compressedData);
+        abstract public byte[] Decompress(byte[] compressedData, int length);

Review comment:
       I thought I had read somewhere in the contribution guidelines that a single commit per issue was preferred. However, I've read the guidelines again and I cannot find it. Is the preference to have multiple commits and squash merge at the end?
   
   The change I've pushed is simply to overload the method so that you have both method signatures, the one without a length argument and the one with it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] tgnm commented on pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
tgnm commented on pull request #1358:
URL: https://github.com/apache/avro/pull/1358#issuecomment-938610523


   @Indomitable the reason why I force the abstract class to use `int` is because arrays in .NET don't exceed 2GB in size, unless  `gcAllowVeryLargeObjects` is set. Regardless, the indexer on the array is `int`.
   If we look at more recent .NET APIs, even the new `Span<T>` type doesn't support a 64-bit length (for the reasons described here: https://github.com/dotnet/apireviews/tree/main/2016/11-04-SpanOfT).
   
   Finally, the library itself doesn't support block sizes exceeding 32-bit:
   ```
   _blockSize = _decoder.ReadLong();           // read block size
   if (_blockSize > System.Int32.MaxValue || _blockSize < 0)
   {
           throw new AvroRuntimeException("Block size invalid or too large for this " +
                                                      "implementation: " + _blockSize);
   ```
   
   So having a long length isn't actually possible and would otherwise be misleading for people implementing `Codec`.
   Having said this, I'd like to see the library adopting the new Memory<T> and Span<T> APIs. There's a lot of opportunities for performance improvements and modernising the API of this library. However, this is out of scope for the issue I've raised and PR :)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] zcsizmadia commented on pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
zcsizmadia commented on pull request #1358:
URL: https://github.com/apache/avro/pull/1358#issuecomment-1027178601


   @tgnm What is the advantage using a CodecResolver delagate, instaed of assigning a codec to a name at registration time?
   
   Something like this, instead of RegisterResolver, have a RegisterCodec:
   `
   private static readonly ConcurrentDictionary<string, Codec> _registeredCodecs = new ConcurrentDictionary<string, Codec>();
   
   public static void RegisterCodec(string codecString, Codec codec)
   {
       _codecResolvers[codecString] = codec;
   }
   
   public static Codec CreateCodecFromString(string codecType)
   {
       // Check registered codecs
       if (TryGetValue(codecType, out Codec codec))
       {
           return codec;
       }
   
       ...
   }
   `


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] martin-g commented on pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
martin-g commented on pull request #1358:
URL: https://github.com/apache/avro/pull/1358#issuecomment-1025233276


   @KyleSchoonover @zcsizmadia Any comments ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] martin-g commented on pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
martin-g commented on pull request #1358:
URL: https://github.com/apache/avro/pull/1358#issuecomment-1025233385


   Closed/reopened the PR to trigger the CI checks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] tgnm commented on a change in pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
tgnm commented on a change in pull request #1358:
URL: https://github.com/apache/avro/pull/1358#discussion_r795529985



##########
File path: lang/csharp/src/apache/main/File/Codec.cs
##########
@@ -43,9 +44,10 @@ public abstract class Codec
         /// <summary>
         /// Decompress data using implemented codec
         /// </summary>
-        /// <param name="compressedData"></param>
+        /// <param name="compressedData">The buffer holding data to decompress.</param>
+        /// <param name="length">The actual length of bytes to decompress from the buffer.</param>
         /// <returns></returns>
-        abstract public byte[] Decompress(byte[] compressedData);
+        abstract public byte[] Decompress(byte[] compressedData, int length);

Review comment:
       Right, I assumed you were concerned about library consumers who could potentially still be using `Codec` as it's marked `public` so I avoided making changes that could break it in any way (including marking as obsolete as that can break builds if they treat warnings as errors).
   I have pushed the changes now so that the old method is `Obsolete`. Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] martin-g commented on a change in pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
martin-g commented on a change in pull request #1358:
URL: https://github.com/apache/avro/pull/1358#discussion_r795248968



##########
File path: lang/csharp/src/apache/main/File/Codec.cs
##########
@@ -43,9 +44,10 @@ public abstract class Codec
         /// <summary>
         /// Decompress data using implemented codec
         /// </summary>
-        /// <param name="compressedData"></param>
+        /// <param name="compressedData">The buffer holding data to decompress.</param>
+        /// <param name="length">The actual length of bytes to decompress from the buffer.</param>
         /// <returns></returns>
-        abstract public byte[] Decompress(byte[] compressedData);
+        abstract public byte[] Decompress(byte[] compressedData, int length);

Review comment:
       Can we preserve the old method for backward compatibility ?
   E.g. 
   ```C#
   public byte[] Decompress(byte[] compressedData) {
      return Decompress(compressedData.length);
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] martin-g closed pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
martin-g closed pull request #1358:
URL: https://github.com/apache/avro/pull/1358


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] zcsizmadia edited a comment on pull request #1358: AVRO-3223: Support optional codecs in C# library

Posted by GitBox <gi...@apache.org>.
zcsizmadia edited a comment on pull request #1358:
URL: https://github.com/apache/avro/pull/1358#issuecomment-1027178601






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org