Posted to issues@hbase.apache.org by "Andrew Kyle Purtell (Jira)" <ji...@apache.org> on 2023/02/14 03:06:00 UTC

[jira] [Comment Edited] (HBASE-27637) Zero length value would cause value compressor read nothing and not advance the position of the InputStream

    [ https://issues.apache.org/jira/browse/HBASE-27637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688251#comment-17688251 ] 

Andrew Kyle Purtell edited comment on HBASE-27637 at 2/14/23 3:05 AM:
----------------------------------------------------------------------

bq. it turned out that, if the value length is 0, then the compressed length will be 4, but while reading, we will read nothing so we will not read the 4 bytes

Ah. Value compression should do nothing if the value length is zero; this is the code bug.

It's been a while since I've looked at this code. If we unconditionally use the compressor to "write" 0 bytes, the compression codec still emits overhead: the Hadoop compression stream header and the compression bitstream header. All of that should be skipped, so that the value size we write on disk is 0 and truly no value data follows the length. I see you have already taken this issue, [~zhangduo]. Let me know if you'd rather I patch it, as this is my code that is not doing the correct thing here.
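
To make the intended behavior concrete, here is a minimal sketch of the "skip the codec for empty values" idea. It is not the HBase code and not the eventual patch: the class and method names (ZeroLengthValueSketch, writeValue, readValue, roundTrip) are hypothetical, plain java.util.zip gzip stands in for the Hadoop codec, and a 4-byte length prefix stands in for whatever length encoding the WAL actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch; none of these names are HBase APIs.
public class ZeroLengthValueSketch {

  // Writes a length-prefixed value, bypassing the codec for empty values.
  static void writeValue(DataOutputStream out, byte[] value) throws IOException {
    if (value.length == 0) {
      out.writeInt(0); // length 0 on disk, and no codec bytes follow it
      return;
    }
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
      gz.write(value);
    }
    out.writeInt(buf.size()); // compressed length
    buf.writeTo(out);         // compressed payload
  }

  // Reads one value; a zero length means there is nothing to decompress.
  static byte[] readValue(DataInputStream in) throws IOException {
    int compressedLength = in.readInt();
    if (compressedLength == 0) {
      return new byte[0];
    }
    byte[] compressed = new byte[compressedLength];
    in.readFully(compressed); // always consume exactly what the writer emitted
    ByteArrayOutputStream plain = new ByteArrayOutputStream();
    try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      byte[] chunk = new byte[256];
      for (int n; (n = gz.read(chunk)) != -1; ) {
        plain.write(chunk, 0, n);
      }
    }
    return plain.toByteArray();
  }

  // Writes two values back to back, then reads both from the same stream.
  public static String[] roundTrip(String first, String second) {
    try {
      ByteArrayOutputStream sink = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(sink);
      writeValue(out, first.getBytes(StandardCharsets.UTF_8));
      writeValue(out, second.getBytes(StandardCharsets.UTF_8));
      out.flush();
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(sink.toByteArray()));
      String a = new String(readValue(in), StandardCharsets.UTF_8);
      String b = new String(readValue(in), StandardCharsets.UTF_8);
      return new String[] { a, b };
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String[] values = roundTrip("", "next value");
    System.out.println("first length: " + values[0].length());
    System.out.println("second: " + values[1]);
  }
}
```

The point of the sketch is that writer and reader agree: when the value is empty, both sides touch only the length field, so the value that follows is read from the correct position.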


was (Author: apurtell):
bq. it turned out that, if the value length is 0, then the compressed length will be 4, but while reading, we will read nothing so we will not read the 4 bytes

Ah. Value compression should do nothing if the value length is zero; this is the code bug.

It's been a while since I've looked at this code. If we unconditionally use the compressor to "write" 0 bytes, the compression codec still emits overhead: the Hadoop compression stream header and the compression bitstream header. All of that should be skipped. I see you have already taken this issue, [~zhangduo]. Let me know if you'd rather I patch it, as this is my code that is not doing the correct thing here.

> Zero length value would cause value compressor read nothing and not advance the position of the InputStream
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-27637
>                 URL: https://issues.apache.org/jira/browse/HBASE-27637
>             Project: HBase
>          Issue Type: Bug
>          Components: dataloss, wal
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Critical
>
> This is a code snippet from the discussion of HBASE-27073:
> {code}
>   public static void main(String[] args) throws Exception {
>     // Compression context configured with the GZ algorithm
>     CompressionContext ctx =
>       new CompressionContext(LRUDictionary.class, false, false, true, Compression.Algorithm.GZ);
>     ValueCompressor compressor = ctx.getValueCompressor();
>     // Compress a zero-length value
>     byte[] compressed = compressor.compress(new byte[0], 0, 0);
>     System.out.println("compressed length: " + compressed.length);
>     ByteArrayInputStream bis = new ByteArrayInputStream(compressed);
>     // Decompress into a zero-length output buffer
>     int read = compressor.decompress(bis, compressed.length, new byte[0], 0, 0);
>     System.out.println("read length: " + read);
>     // Number of bytes consumed from the input stream
>     System.out.println("position: " + (compressed.length - bis.available()));
>   }
> {code}
> And the output is
> {noformat}
> compressed length: 20
> read length: 0
> position: 0
> {noformat}
> So it turns out that, when compressing, an empty array still generates some output bytes, but when reading, we skip reading anything if we find the output length is zero. As a result, the next time we read from the stream, we start at the wrong position...
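
The misalignment described in the issue can be illustrated without any HBase classes. The sketch below is only an approximation under stated assumptions: plain java.util.zip gzip stands in for the value compressor, and the class name EmptyValueOverheadSketch, the MARKER byte, and both helper methods are hypothetical. It shows that compressing an empty array still produces bytes, and that a reader which consumes nothing for a zero-length value then sees those leftover codec bytes where it expects the next record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

// Illustration only: java.util.zip stands in for the HBase value compressor.
public class EmptyValueOverheadSketch {

  static final int MARKER = 0x7e; // hypothetical first byte of the next record

  // Gzip-compresses the input; even an empty input yields header and trailer bytes.
  public static byte[] gzip(byte[] input) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
      gz.write(input);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }

  // Simulates a reader that consumes nothing for a zero-length value,
  // then returns the first byte it would treat as the next record.
  public static int nextByteAfterSkippedRead() {
    byte[] overhead = gzip(new byte[0]);
    ByteArrayOutputStream stream = new ByteArrayOutputStream();
    stream.write(overhead, 0, overhead.length); // codec bytes for the empty value
    stream.write(MARKER);                       // start of the next record
    ByteArrayInputStream in = new ByteArrayInputStream(stream.toByteArray());
    // Buggy path: the uncompressed length is 0, so no bytes are consumed here.
    return in.read();
  }

  public static void main(String[] args) {
    System.out.println("bytes emitted for empty value: " + gzip(new byte[0]).length);
    System.out.println("next byte is marker: " + (nextByteAfterSkippedRead() == MARKER));
  }
}
```

In this stand-in, the byte the reader sees is the first byte of the gzip header rather than the marker, which mirrors the "position: 0" result in the output above.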



--
This message was sent by Atlassian Jira
(v8.20.10#820010)