You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "sweisdb (via GitHub)" <gi...@apache.org> on 2024/03/06 00:22:39 UTC

[PR] [Work in Progress] Experimenting to move TransportCipher to GCM based on Google Tink [spark]

sweisdb opened a new pull request, #45394:
URL: https://github.com/apache/spark/pull/45394

   ### What changes were proposed in this pull request?
   The high level issue is that Apache Spark's RPC encryption is using unauthenticated CTR. We want to switch to GCM.
   
   The complication is Spark currently uses Apache Commons' CryptoInputStream and CryptoOutputStream, which do not support GCM. GCM also increases the ciphertext size by prepending an IV and adding an authentication tag.
   
   So far, this is requiring a larger rewrite to work with Spark's ByteBufs and FileRegions. This diff is currently a Work in Progress
   
   
   ### Why are the changes needed?
   
   AES CTR mode with no authentication is vulnerable to someone modifying ciphertexts. That would allow them to change RPC call contents.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Existing unit tests
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [Work in Progress] Experimenting to move TransportCipher to GCM based on Google Tink [spark]

Posted by "sweisdb (via GitHub)" <gi...@apache.org>.
sweisdb commented on PR #45394:
URL: https://github.com/apache/spark/pull/45394#issuecomment-1980063354

   @mridulm At its core, using AES-CTR mode without authentication is insecure because someone can change RPC contents by simply XORing the ciphertext. This can be demonstrated by modifying traffic between a master and worker node. 
   
   It would need to use an authenticated mode of encryption to address the problem, e.g. AES-GCM mode.
   
   I would much rather ditch all the TransportCipher and AuthEngine code and just use TLS. I don't know if that is feasible, so we may need a 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [Work in Progress] Experimenting to move TransportCipher to GCM based on Google Tink [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on PR #45394:
URL: https://github.com/apache/spark/pull/45394#issuecomment-1979944365

   It is not clear to me why we should be making this change, what the benefits are and what the current limitations are.
   Note that Spark 4.0 support TLS - so if this is still required in that context.
   
   If this ends up being a nontrivial and user facing change, would be good to go through [an SPIP](https://spark.apache.org/improvement-proposals.html)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [Work in Progress] Experimenting to move TransportCipher to GCM based on Google Tink [spark]

Posted by "sweisdb (via GitHub)" <gi...@apache.org>.
sweisdb commented on PR #45394:
URL: https://github.com/apache/spark/pull/45394#issuecomment-1986452977

   Closing this for now as we decide how to proceed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [Work in Progress] Experimenting to move TransportCipher to GCM based on Google Tink [spark]

Posted by "sweisdb (via GitHub)" <gi...@apache.org>.
sweisdb closed pull request #45394: [Work in Progress] Experimenting to move TransportCipher to GCM based on Google Tink
URL: https://github.com/apache/spark/pull/45394


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org