You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Imran Rashid (JIRA)" <ji...@apache.org> on 2019/05/20 16:20:00 UTC

[jira] [Created] (SPARK-27780) Shuffle server & client should be versioned to enable smoother upgrade

Imran Rashid created SPARK-27780:
------------------------------------

             Summary: Shuffle server & client should be versioned to enable smoother upgrade
                 Key: SPARK-27780
                 URL: https://issues.apache.org/jira/browse/SPARK-27780
             Project: Spark
          Issue Type: New Feature
          Components: Shuffle, Spark Core
    Affects Versions: 3.0.0
            Reporter: Imran Rashid


The external shuffle service is often upgraded at a different time than spark itself.  However, this causes problems when the protocol changes between the shuffle service and the spark runtime -- this forces users to upgrade everything simultaneously.

We should add versioning to the shuffle client & server, so they know what messages the other will support.  This would allow better handling of mixed versions, from better error msgs to allowing some mismatched versions (with reduced capabilities).

This originally came up in a discussion here: https://github.com/apache/spark/pull/24565#issuecomment-493496466

There are a few ways we could do the versioning which we still need to discuss:

1) Version specified by config.  This allows for mixed versions across the cluster and rolling upgrades.  It also will let a spark 3.0 client talk to a 2.4 shuffle service.  But, may be a nuisance for users to get this right.

2) Auto-detection during registration with local shuffle service.  This makes the versioning easy for the end user, and can even handle a 2.4 shuffle service though it does not support the new versioning.  However, it will not handle a rolling upgrade correctly -- if the local shuffle service has been upgraded, but other nodes in the cluster have not, it will get the version wrong.

3) Exchange versions per-connection.  When a connection is opened, the server & client could first exchange messages with their versions, so they know how to continue communication after that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org