Posted to issues@spark.apache.org by "George Pongracz (JIRA)" <ji...@apache.org> on 2017/08/18 00:48:01 UTC

[jira] [Comment Edited] (SPARK-21702) Structured Streaming S3A SSE Encryption Not Visible through AWS S3 GUI when PartitionBy Used

    [ https://issues.apache.org/jira/browse/SPARK-21702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131537#comment-16131537 ] 

George Pongracz edited comment on SPARK-21702 at 8/18/17 12:47 AM:
-------------------------------------------------------------------

*Update:*

When the data-bearing files (files that contain the data payload from the stream) written to S3 are viewed through the AWS S3 GUI and selected using their left-hand check-box, the properties section reports their encryption as "-".

All related non-data-bearing files, when selected the same way, report their encryption as "AES-256" in the properties section.

Clicking through the name of a single data-bearing file, which brings up a dedicated overview screen for that file, shows it as having AES-256 encryption.

This labelling of encryption is inconsistent and causes confusion: on first inspection a data file appears unencrypted, whilst on deeper inspection via click-through it reports as encrypted.

I think this lowers the severity of this issue, and I can close it if it is deemed a non-issue. However, it would be good if all the files written presented their encryption consistently and correctly, whether data-bearing or not.

I must say I lost a bit of time believing I had not enabled encryption, and tried to debug it until I stumbled upon what I have described in this update.

Obviously, this only happens when PartitionBy is used.
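To check encryption without relying on the console's labels, one can ask S3 directly: the HeadObject response includes a `ServerSideEncryption` field (e.g. "AES256") when SSE was applied to an object. The sketch below assumes boto3 (the AWS SDK for Python) is used; the function names `list_keys` and `sse_report` are hypothetical, and the code only needs an object exposing boto3's `get_paginator` and `head_object` methods.

```python
from typing import Any, Iterator, List, Optional, Tuple


def list_keys(s3_client: Any, bucket: str, prefix: str) -> Iterator[str]:
    """Yield every object key under prefix, following list_objects_v2 pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # An empty page has no "Contents" entry at all.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def sse_report(s3_client: Any, bucket: str, prefix: str) -> List[Tuple[str, Optional[str]]]:
    """Return (key, algorithm) pairs; algorithm is None when HeadObject reports no SSE."""
    report: List[Tuple[str, Optional[str]]] = []
    for key in list_keys(s3_client, bucket, prefix):
        head = s3_client.head_object(Bucket=bucket, Key=key)
        # HeadObject carries "ServerSideEncryption" only for server-side-encrypted objects.
        report.append((key, head.get("ServerSideEncryption")))
    return report
```

With boto3 installed and credentials configured, pass `boto3.client("s3")` as `s3_client` together with the sink's bucket and output prefix; any data file under a partition directory (e.g. `date=.../part-...`) that comes back with `None` would confirm the reported behaviour, independent of what the console list view shows.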


> Structured Streaming S3A SSE Encryption Not Visible through AWS S3 GUI when PartitionBy Used
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21702
>                 URL: https://issues.apache.org/jira/browse/SPARK-21702
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.2.0
>         Environment: Hadoop 2.7.3: AWS SDK 1.7.4
> Hadoop 2.8.1: AWS SDK 1.10.6
>            Reporter: George Pongracz
>            Priority: Minor
>              Labels: security
>
> Settings:
>       .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
>       .config("spark.hadoop.fs.s3a.server-side-encryption-algorithm", "AES256")
> When writing to an S3 sink from Structured Streaming, the files are encrypted using AES-256.
> When a "PartitionBy" is introduced, the output data files are unencrypted.
> All other supporting files and metadata are encrypted.
> I suspect the write to the temporary location is encrypted, but the move/rename is not applying the SSE.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
