Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/11/22 21:03:46 UTC

[GitHub] [accumulo] trietopsoft opened a new issue #2366: Support remaining Hadoop compression codecs

trietopsoft opened a new issue #2366:
URL: https://github.com/apache/accumulo/issues/2366


   **Is your feature request related to a problem? Please describe.**
   Some compression codecs defined in Hadoop cannot be used with Accumulo.
   
   **Describe the solution you'd like**
   The ability to choose any compression codec defined by Hadoop. The codecs that still lack support are BZip2 and LZ4.
   
   **Describe alternatives you've considered**
   Define a pluggable interface for compression. This would require a much larger design and implementation effort.
   
   **Additional context**
   ZStandard was already added as a codec; the remaining codecs should also be an option. Historical Jira link: https://issues.apache.org/jira/browse/ACCUMULO-351
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #2366: Support remaining Hadoop compression codecs

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2366:
URL: https://github.com/apache/accumulo/issues/2366#issuecomment-976103926


   The original issue was marked as "Won't Fix" because the benefits were deemed negligible at the time. It seems trivial to support, since the actual codecs are in the Hadoop libraries and we'd just have to wrap them in a standard way, so I'm not necessarily opposed. However, I'd prefer that we not wrap anything ourselves and instead offload the implementation entirely to an SPI, with the user providing an implementation on the class path. That way we would not have to stay in sync with whatever implementation classes are available in a given version of Hadoop.
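
As a rough sketch of what the SPI idea above could look like, assuming a hypothetical `CodecProvider` interface discovered with the JDK's `java.util.ServiceLoader`. None of these names exist in Accumulo or Hadoop; they are invented for illustration, and the gzip provider uses only `java.util.zip` so the example runs without Hadoop on the class path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CodecSpiSketch {

  /** Hypothetical SPI; not an existing Accumulo or Hadoop interface. */
  public interface CodecProvider {
    /** Short name a user would put in configuration, e.g. "gz". */
    String name();

    OutputStream compress(OutputStream out) throws IOException;

    InputStream decompress(InputStream in) throws IOException;
  }

  /** Example provider backed by the JDK's gzip streams (stand-in for a Hadoop codec wrapper). */
  public static final class GzProvider implements CodecProvider {
    @Override
    public String name() {
      return "gz";
    }

    @Override
    public OutputStream compress(OutputStream out) throws IOException {
      return new GZIPOutputStream(out);
    }

    @Override
    public InputStream decompress(InputStream in) throws IOException {
      return new GZIPInputStream(in);
    }
  }

  /** Indexes whatever providers it is given by name. */
  public static final class CodecRegistry {
    private final Map<String, CodecProvider> byName = new HashMap<>();

    public CodecRegistry(Iterable<? extends CodecProvider> providers) {
      for (CodecProvider p : providers) {
        byName.put(p.name(), p);
      }
    }

    /** Discovers providers registered under META-INF/services on the class path. */
    public static CodecRegistry fromClasspath() {
      return new CodecRegistry(ServiceLoader.load(CodecProvider.class));
    }

    public CodecProvider get(String name) {
      CodecProvider p = byName.get(name);
      if (p == null) {
        throw new IllegalArgumentException("No codec provider named " + name);
      }
      return p;
    }
  }

  /** Compresses and then decompresses data with the given provider. */
  public static byte[] roundTrip(CodecProvider codec, byte[] data) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (OutputStream out = codec.compress(buf)) {
      out.write(data);
    }
    try (InputStream in = codec.decompress(new ByteArrayInputStream(buf.toByteArray()))) {
      return in.readAllBytes();
    }
  }

  public static void main(String[] args) throws IOException {
    // Providers are passed explicitly here; fromClasspath() would discover them instead.
    CodecRegistry registry = new CodecRegistry(List.of(new GzProvider()));
    byte[] original = "accumulo".getBytes(StandardCharsets.UTF_8);
    byte[] restored = roundTrip(registry.get("gz"), original);
    System.out.println(Arrays.equals(original, restored));
  }
}
```

With a design along these lines, supporting a new codec would mean shipping a jar that contains a `CodecProvider` implementation and a `META-INF/services` entry, rather than changing or forking the core project.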





[GitHub] [accumulo] milleruntime closed issue #2366: Support remaining Hadoop compression codecs

Posted by GitBox <gi...@apache.org>.
milleruntime closed issue #2366:
URL: https://github.com/apache/accumulo/issues/2366


   





[GitHub] [accumulo] ctubbsii commented on issue #2366: Support remaining Hadoop compression codecs

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #2366:
URL: https://github.com/apache/accumulo/issues/2366#issuecomment-976909806


   @trietopsoft No, I'm not opposed. I tagged @milleruntime to request a review, because I believe he was the one who added ZStd and would be most familiar with the wrapping code you added. I'm in favor of the additional support. I just think that in the future, if Hadoop adds new codecs, we could provide support in a more modular/pluggable way rather than requiring a fork, so the bar is lowered for extending Accumulo like this.





[GitHub] [accumulo] trietopsoft commented on issue #2366: Support remaining Hadoop compression codecs

Posted by GitBox <gi...@apache.org>.
trietopsoft commented on issue #2366:
URL: https://github.com/apache/accumulo/issues/2366#issuecomment-976639818


   @ctubbsii thank you for your consideration. An existing 1.x customer uses a forked version of Accumulo to support lz4, bzip2, and zstd. They would like all codecs from the Hadoop ecosystem to be available within the project. Accelerators are used in various stages of the pipeline to provide performance benefits over gz etc. Please let us know if you are going to reject this PR, as we would then have to keep maintaining the fork for 2.x.

