Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/28 00:46:49 UTC

[GitHub] [spark] HeartSaVioR commented on pull request #30162: [SPARK-33263][SS] Configurable StateStore compression codec

HeartSaVioR commented on pull request #30162:
URL: https://github.com/apache/spark/pull/30162#issuecomment-717625880

At the very least, we shouldn't rely on the configuration while reading the file - this isn't the same as other cases, e.g. event log compression. For event log compression, the file name carries a suffix identifying the compression codec, so regardless of the configuration, the reader can decompress the file correctly.
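To illustrate the suffix approach, here is a minimal Python sketch, not Spark code. The codecs come from the Python standard library as stand-ins, and the names `CODECS`, `write_log`, `read_log` and `demo_suffix_roundtrip` are all hypothetical. The point is only that the reader picks the codec from the file name and never consults the configuration.

```python
import bz2
import lzma
import os
import tempfile
import zlib

# Stand-in codecs from the standard library (assumption: not Spark's codecs).
CODECS = {
    ".zlib": (zlib.compress, zlib.decompress),
    ".bz2": (bz2.compress, bz2.decompress),
    ".xz": (lzma.compress, lzma.decompress),
}


def write_log(directory, base_name, payload, configured_suffix):
    """Compress with the configured codec and record it in the file name."""
    compress, _ = CODECS[configured_suffix]
    path = os.path.join(directory, base_name + configured_suffix)
    with open(path, "wb") as f:
        f.write(compress(payload))
    return path


def read_log(path):
    """Choose the codec from the suffix alone; no configuration is read."""
    suffix = os.path.splitext(path)[1]
    _, decompress = CODECS[suffix]
    with open(path, "rb") as f:
        return decompress(f.read())


def demo_suffix_roundtrip():
    """Write one file per codec, then read them all back without any config."""
    with tempfile.TemporaryDirectory() as d:
        paths = [
            write_log(d, f"app-{i}", b"event data", suffix)
            for i, suffix in enumerate(CODECS)
        ]
        return [read_log(p) for p in paths]
```

Because the codec travels with the file name, files written under different configured values stay readable after the configuration changes.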

One thing that complicates this is that, if I remember correctly, the HDFS state store sometimes needs to know the exact file name so it can read the file without listing the directory. That prevents us from adding a suffix for the compression codec.
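A small Python sketch of why a suffix gets in the way of reading by exact name. The `<version>.delta` naming scheme and both function names are assumptions made for illustration, not taken from the Spark source.

```python
def delta_file_name(version):
    """Hypothetical fixed naming scheme: the reader can build the name directly."""
    return f"{version}.delta"


def candidate_names_with_suffix(version, known_suffixes):
    """With a codec suffix the name is no longer unique, so the reader would
    have to probe every candidate (or list the directory) to find the file."""
    return [f"{version}.delta{suffix}" for suffix in known_suffixes]
```

With a fixed name, one lookup is enough; with a suffix, the reader needs one existence check per supported codec, or a directory listing.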

That said, you'll need to either 1) make it a configuration but prevent the value from being changed after the query starts (like we do for state store formats), or 2) write the information to a separate metadata file and let the state store read it. Even in case 2) you'll probably want to prevent the compression codec from changing over the lifetime of the query - if you allow arbitrary changes of the compression codec across batches, this information would have to be written and referenced for each batch, which would add non-trivial overhead.
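The two options can be sketched together in Python as below. This is an illustration under stated assumptions, not the Spark implementation: the metadata file name, the `<version>.delta` naming, the stand-in codecs, and every function name are hypothetical. The codec is recorded once when the query first starts, a conflicting value on restart is rejected, and the reader takes the codec from the metadata rather than from the configuration.

```python
import json
import lzma
import os
import tempfile
import zlib

# Stand-in codecs from the standard library (assumption: not Spark's codecs).
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "xz": (lzma.compress, lzma.decompress),
}
METADATA_FILE = "codec-metadata.json"  # hypothetical file name


def resolve_codec(checkpoint_dir, configured_codec):
    """Pin the codec on first start; reject a different value on restart."""
    path = os.path.join(checkpoint_dir, METADATA_FILE)
    if os.path.exists(path):
        with open(path) as f:
            recorded = json.load(f)["codec"]
        if recorded != configured_codec:
            raise ValueError(
                f"codec was pinned to {recorded!r}, got {configured_codec!r}"
            )
        return recorded
    with open(path, "w") as f:
        json.dump({"codec": configured_codec}, f)
    return configured_codec


def write_state(checkpoint_dir, version, payload, codec):
    """Write one state file under a fixed name, with no codec suffix."""
    compress, _ = CODECS[codec]
    with open(os.path.join(checkpoint_dir, f"{version}.delta"), "wb") as f:
        f.write(compress(payload))


def read_state(checkpoint_dir, version):
    """Read the codec from the metadata file, never from the configuration."""
    with open(os.path.join(checkpoint_dir, METADATA_FILE)) as f:
        codec = json.load(f)["codec"]
    _, decompress = CODECS[codec]
    with open(os.path.join(checkpoint_dir, f"{version}.delta"), "rb") as f:
        return decompress(f.read())


def demo_pinned_codec():
    """Start with zlib, try to restart with xz, then read the state back."""
    with tempfile.TemporaryDirectory() as d:
        codec = resolve_codec(d, "zlib")
        write_state(d, 1, b"state", codec)
        try:
            resolve_codec(d, "xz")
            rejected = False
        except ValueError:
            rejected = True
        return read_state(d, 1), rejected
```

Raising on a mismatch is a choice made for this sketch; letting the recorded value silently override the configured one would serve the same goal. Either way, a single metadata record per query is enough, and no per-batch bookkeeping is needed.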
