Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 13:43:04 UTC

[GitHub] [beam] damccorm opened a new issue, #19828: Files managed by beam should have associated AVPs such as content-type and content-encoding instead of merely mimeType

damccorm opened a new issue, #19828:
URL: https://github.com/apache/beam/issues/19828

   From customer:
   
    
   > We've updated our Dataflow templates to read and write with gzip compression. I noticed that when a .gz file is written, the object's metadata defaults to "application/octet-stream" for Content-Type because it doesn't know what the content is. I would like each file to have text/plain for content-type and gzip for content-encoding. We may also add other metadata key/value pairs. I can't find a way to programmatically set these and other metadata values per object within Dataflow. I'm using TextIO right now and just calling .withCompression. I didn't see any other functions to achieve this or any Dataflow doc on it. Am I missing something?
   >  
   
   The MIME type of the output file can be set by supplying your own WritableByteChannelFactory to TextIO, one that reports your desired MIME type[0].
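
A minimal sketch of that workaround follows. To keep it compilable without Beam on the classpath, `ChannelFactory` is a local stand-in (an assumption, not Beam's class) that mirrors what I understand to be the three methods of Beam's `FileBasedSink.WritableByteChannelFactory`; the class and helper names are hypothetical. In a real pipeline the factory would implement the Beam interface instead and, as far as I can tell, be passed via `TextIO.write().withWritableByteChannelFactory(...)`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipTextFactorySketch {

  /** Local stand-in (assumption) for the shape of Beam's WritableByteChannelFactory. */
  interface ChannelFactory {
    WritableByteChannel create(WritableByteChannel channel) throws IOException;
    String getMimeType();
    String getSuggestedFilenameSuffix();
  }

  /** Gzips the bytes it writes, but still reports text/plain as the MIME type. */
  static final class GzipTextFactory implements ChannelFactory {
    @Override
    public WritableByteChannel create(WritableByteChannel channel) throws IOException {
      // Wrap the raw output channel so everything written to it is gzip-compressed.
      return Channels.newChannel(new GZIPOutputStream(Channels.newOutputStream(channel)));
    }

    @Override
    public String getMimeType() {
      return "text/plain"; // instead of application/octet-stream
    }

    @Override
    public String getSuggestedFilenameSuffix() {
      return ".gz";
    }
  }

  /** Writes text through the factory into memory and returns the resulting bytes. */
  static byte[] writeThrough(ChannelFactory factory, String text) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (WritableByteChannel out = factory.create(Channels.newChannel(sink))) {
      ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
      while (buf.hasRemaining()) {
        out.write(buf);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // The channel is closed at this point, so the gzip trailer has been flushed.
    return sink.toByteArray();
  }

  /** Decompresses gzip bytes back to a UTF-8 string. */
  static String gunzip(byte[] data) {
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    GzipTextFactory factory = new GzipTextFactory();
    byte[] compressed = writeThrough(factory, "hello\n");
    System.out.println(factory.getMimeType() + " " + compressed.length + " bytes");
    System.out.print(gunzip(compressed));
  }
}
```

This only addresses the Content-Type half of the request. It does not set a content-encoding on the written object, which is the gap described further down in this issue.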
   
   The default WritableByteChannelFactory for TextIO reports the MIME type "text/plain", but when "withCompression" is used, the reported MIME type becomes "application/octet-stream"[1][2].
   
   Unfortunately, FileSystems.create does not support setting a content-encoding on the output channel[3]. I will ensure that this specific point is captured in the feature request, though at this point it becomes an upstream change to Beam rather than a change to Dataflow.
   
   [0] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L1175
   
   [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSink.java#L874
   
   [2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/MimeTypes.java
   
   [3] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java#L224
   
   Imported from Jira [BEAM-8180](https://issues.apache.org/jira/browse/BEAM-8180). Original Jira may contain additional context.
   Reported by: cjac.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org