You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2020/06/23 23:03:07 UTC

[GitHub] [accumulo] keith-turner opened a new issue #1642: Rename accumulo master package

keith-turner opened a new issue #1642:
URL: https://github.com/apache/accumulo/issues/1642


   After a name is selected, would need to rename the master package. There are some user facing plugins that are under that package.  Need to make a decision about what to do with these. The stability of these plugins over time has been very poor because they use lots of internal types.  The plugins could be moved to the SPI package which has build automation that prevents usage of internal types which makes the plugin interfaces much more stable.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] brianloss commented on issue #1642: Rename accumulo master package

Posted by GitBox <gi...@apache.org>.
brianloss commented on issue #1642:
URL: https://github.com/apache/accumulo/issues/1642#issuecomment-765682788


   I've been looking into this issue and see at least a couple problematic areas:
   1. [org.apache.accumulo.server.master.balancer.TabletBalancer](https://github.com/apache/accumulo/blob/main/server/base/src/main/java/org/apache/accumulo/server/master/balancer/TabletBalancer.java) and its class hierarchy. The balancers are configured using site and/or zookeeper properties. We could handle renaming those properties much like we're doing in #1640. However, users also have implemented custom balancers, and renaming the package would break custom balancers. This is not just because of the package holding TabletBalancer (and the built-in balancers) but because the method signature for `getAssignments` and `balance` also includes thrift classes in the org.apache.accumulo.core.master.thrift package. While none of this falls under the [public API](https://accumulo.apache.org/api/), it would still be disruptive to have 2.1 cause breaking code changes for users. In an offline discussion with @keith-turner, he suggested re-creating the balancers in the [spi package](https:/
 /github.com/apache/accumulo/blob/main/core/src/main/java/org/apache/accumulo/core/spi/package-info.java) and ensuring the interfaces use types that are in the public API. I believe the idea is to deprecate all masters and add a new SPI version in 2.1. Then for 3.0, remove all of the deprecated master code.
   2. Thrift interfaces in the master package would cause a binary incompatibility for older client code. For example, [TableOperationsImpl](https://github.com/apache/accumulo/blob/main/core/src/main/java/org/apache/accumulo/core/clientImpl/TableOperationsImpl.java) contacts [MasterClientService](https://github.com/apache/accumulo/blob/main/core/src/main/thrift/master.thrift#L216). If we rename that service to ManagerClientService, then 2.0 client code would no longer be able to talk to a 2.1 manager. While there's no requirement to support that, I'm wondering if it's better to make some of the changes in 3.0 given that the first bullet point above also means keeping some thrift-generated classes where they are until 3.0.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #1642: Rename accumulo master package

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #1642:
URL: https://github.com/apache/accumulo/issues/1642#issuecomment-670340859


   The name agreed to in a vote on the mailing list was 'manager'.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #1642: Rename accumulo master package

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #1642:
URL: https://github.com/apache/accumulo/issues/1642#issuecomment-767079633


   > Does it have to be an all or nothing? Would it make sense to rename as much as we can now (everything that doesn't impact the balancers) and then complete the renaming in 3.0?
   
   No, it doesn't have to be all or nothing. It might make sense to rename as much as we can now... the value is that it may highlight more potential breakage points than the balancer so there aren't surprises with the final renames for 3.0. The downside is that it's extra work... and and increased risk of other breakage if we miss something.
   
   > Or, given that an SPI will probably have objects for the tablet server status and migrations that wrap the existing thrift objects, maybe we could do something similar with renamed thrift objects for the existing balancers. That is, leave the existing thrift classes (that are used by balancers) alone and then generate the new ones in the new package for use by the tablet servers and SPI. The master would then copy data from the new thrift objects that are returned by the tservers into old package versions and send those to the old balancer (if in use). That would allow the current balancers to be deprecated but still used. Given that the thrift objects have public fields, we'd have to copy all of the data so there'd be a performance cost, but we'd only have to do that work if the deprecated balancer is in use.
   
   I think that's a reasonable solution if we really need to support compatibility with the older non-public balancer API. But, rather than assume we need to support that compatibility, I'd like to take a moment to wonder about the benefits of doing so, and whether the maintenance cost of the migration path is worth it:
   
   Are users who require custom balancers the kind of users who would be willing to tolerate performance hits? Or, are they the kind of users who would prefer to migrate to the stable SPI, rather than stay on the non-public API? I suspect they are much more likely to be in the latter group. And, if that's true, then it would be extra maintenance and development work to retain compatibility with an internal non-public API that should see very little use.
   
   It's probably not much work to support the compatibility, so it's *probably* worth it to stay compatible, but I think it's worth at least asking the question.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] brianloss commented on issue #1642: Rename accumulo master package

Posted by GitBox <gi...@apache.org>.
brianloss commented on issue #1642:
URL: https://github.com/apache/accumulo/issues/1642#issuecomment-766828727


   > I'd be fine waiting until 3.0 to do the package rename, if we want to have the balancer stuff go through a deprecation cycle, as though it were public API.
   
   Does it have to be an all or nothing? Would it make sense to rename as much as we can now (everything that doesn't impact the balancers) and then complete the renaming in 3.0? Or, given that an SPI will probably have objects for the tablet server status and migrations that wrap the existing thrift objects, maybe we could do something similar with renamed thrift objects for the existing balancers. That is, leave the existing thrift classes (that are used by balancers) alone and then generate the new ones in the new package for use by the tablet servers and SPI. The master would then copy data from the new thrift objects that are returned by the tservers into old package versions and send those to the old balancer (if in use). That would allow the current balancers to be deprecated but still used. Given that the thrift objects have public fields, we'd have to copy all of the data so there'd be a performance cost, but we'd only have to do that work if the deprecated balancer is in us
 e.
   
   > In either case, the road block seems to be the lack of a balancer SPI that avoids internal and thrift-specific types.
   
   Agreed. I added #1880 to separate that work and make it easier to review independent of any changes from this issue.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #1642: Rename accumulo master package

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #1642:
URL: https://github.com/apache/accumulo/issues/1642#issuecomment-765696679


   FWIW, 2.0 can't talk to 2.1 anyway, because we've already bumped the wire version in d9ed7b411c9ca686c89870a363ca199042057d2d
   
   I'd be fine waiting until 3.0 to do the package rename, if we want to have the balancer stuff go through a deprecation cycle, as though it were public API. I'd also be fine ripping off the bandage quickly, and relying on the fact that it's not public API and has always been subject to change. In either case, the road block seems to be the lack of a balancer SPI that avoids internal and thrift-specific types.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] brianloss closed issue #1642: Rename accumulo master package

Posted by GitBox <gi...@apache.org>.
brianloss closed issue #1642:
URL: https://github.com/apache/accumulo/issues/1642


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] brianloss commented on issue #1642: Rename accumulo master package

Posted by GitBox <gi...@apache.org>.
brianloss commented on issue #1642:
URL: https://github.com/apache/accumulo/issues/1642#issuecomment-766828727


   > I'd be fine waiting until 3.0 to do the package rename, if we want to have the balancer stuff go through a deprecation cycle, as though it were public API.
   
   Does it have to be an all or nothing? Would it make sense to rename as much as we can now (everything that doesn't impact the balancers) and then complete the renaming in 3.0? Or, given that an SPI will probably have objects for the tablet server status and migrations that wrap the existing thrift objects, maybe we could do something similar with renamed thrift objects for the existing balancers. That is, leave the existing thrift classes (that are used by balancers) alone and then generate the new ones in the new package for use by the tablet servers and SPI. The master would then copy data from the new thrift objects that are returned by the tservers into old package versions and send those to the old balancer (if in use). That would allow the current balancers to be deprecated but still used. Given that the thrift objects have public fields, we'd have to copy all of the data so there'd be a performance cost, but we'd only have to do that work if the deprecated balancer is in us
 e.
   
   > In either case, the road block seems to be the lack of a balancer SPI that avoids internal and thrift-specific types.
   
   Agreed. I added #1880 to separate that work and make it easier to review independent of any changes from this issue.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii edited a comment on issue #1642: Rename accumulo master package

Posted by GitBox <gi...@apache.org>.
ctubbsii edited a comment on issue #1642:
URL: https://github.com/apache/accumulo/issues/1642#issuecomment-765696679


   FWIW, 2.0 can't talk to 2.1 anyway, because we've already bumped the wire version in d9ed7b411c9ca686c89870a363ca199042057d2d (EDIT: actually, that might not affect clients; in any case, we definitely don't guarantee wire compatibility with clients across minor releases.)
   
   I'd be fine waiting until 3.0 to do the package rename, if we want to have the balancer stuff go through a deprecation cycle, as though it were public API. I'd also be fine ripping off the bandage quickly, and relying on the fact that it's not public API and has always been subject to change. In either case, the road block seems to be the lack of a balancer SPI that avoids internal and thrift-specific types.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] ctubbsii commented on issue #1642: Rename accumulo master package

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on issue #1642:
URL: https://github.com/apache/accumulo/issues/1642#issuecomment-767079633


   > Does it have to be an all or nothing? Would it make sense to rename as much as we can now (everything that doesn't impact the balancers) and then complete the renaming in 3.0?
   
   No, it doesn't have to be all or nothing. It might make sense to rename as much as we can now... the value is that it may highlight more potential breakage points than the balancer so there aren't surprises with the final renames for 3.0. The downside is that it's extra work... and and increased risk of other breakage if we miss something.
   
   > Or, given that an SPI will probably have objects for the tablet server status and migrations that wrap the existing thrift objects, maybe we could do something similar with renamed thrift objects for the existing balancers. That is, leave the existing thrift classes (that are used by balancers) alone and then generate the new ones in the new package for use by the tablet servers and SPI. The master would then copy data from the new thrift objects that are returned by the tservers into old package versions and send those to the old balancer (if in use). That would allow the current balancers to be deprecated but still used. Given that the thrift objects have public fields, we'd have to copy all of the data so there'd be a performance cost, but we'd only have to do that work if the deprecated balancer is in use.
   
   I think that's a reasonable solution if we really need to support compatibility with the older non-public balancer API. But, rather than assume we need to support that compatibility, I'd like to take a moment to wonder about the benefits of doing so, and whether the maintenance cost of the migration path is worth it:
   
   Are users who require custom balancers the kind of users who would be willing to tolerate performance hits? Or, are they the kind of users who would prefer to migrate to the stable SPI, rather than stay on the non-public API? I suspect they are much more likely to be in the latter group. And, if that's true, then it would be extra maintenance and development work to retain compatibility with an internal non-public API that should see very little use.
   
   It's probably not much work to support the compatibility, so it's *probably* worth it to stay compatible, but I think it's worth at least asking the question.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org