Posted to dev@drill.apache.org by Timothy Farkas <tf...@mapr.com> on 2017/09/21 22:46:51 UTC

Backwards Compatibility Policy

Hi All,

I recently made a change to the option system that changed the fields contained in OptionValues and hence the format of the option information we store in ZooKeeper. As a result, the new format is not backward compatible with old system options stored in ZooKeeper. There are two ways to resolve the issue: require old data to be purged from ZooKeeper when upgrading the cluster, or attempt backward compatibility by modifying the deserializer for OptionValue. So my question is: what is our stance on backward compatibility?

Thanks,
Tim

Re: Backwards Compatibility Policy

Posted by Timothy Farkas <tf...@mapr.com>.
Hi Paul,

That sounds like a good plan. I'll implement the points that you listed here.

Thanks,
Tim

________________________________
From: Paul Rogers <pr...@mapr.com>
Sent: Friday, September 22, 2017 10:43:25 AM
To: dev@drill.apache.org
Subject: Re: Backwards Compatibility Policy

Hi Tim,

Thanks for the fix!

Here’s my random thought on this, FWIW.

We don’t have to emulate the old, broken behavior — that is what you have generously fixed and we want to keep.

The problem we *do* want to solve is to make the transition seamless for users. Let’s focus on ZK.

The most important data in ZK is the option value itself. All other data is derived and should never have been stored in ZK in the first place. Consider: only system options are stored in ZK, so there is no need to store the scope in which the option is set, since only one scope is ever stored in ZK.

Consider other information: data type and metadata. We need a data type to know how to interpret the value. But all other metadata can be obtained from the new metadata structure you created.

Even the data type does not absolutely have to be stored in ZK: we know the type from the metadata. (I am ignoring here the corner case of someone changing the type of an existing option from one release to the next; we don’t handle that anyway.)

So, when retrieving the old data, we can throw away everything except the value. That is, we parse the old JSON, extract the value into the new structure, and discard the rest. We then write new values into ZK using the new structure (which, I would hope, has a form optimized for external storage as we discussed above.)

Although you are focusing on ZK, Drill has a “persistence” abstraction: data can also be stored in local files, by default in /tmp/drill. So, you can use an old Drill to create a test set of options persisted to a file. Save that as a test input. Then, in the latest version, create a test for your old-to-new converter using the saved test data.

Doing this removes ZK from the picture, making unit testing far easier.

The issue of table schema is a whole other topic which we can discuss later. I still wonder, does anyone actually depend on the current schema? If not, the point is moot and we’re in good shape already.

Thanks,

- Paul

> On Sep 22, 2017, at 12:12 AM, Timothy Farkas <tf...@mapr.com> wrote:
>
> Hi Paul,
>
> I've implemented the ZK fix, but I'm uncertain about how to make the options table backward compatible. The main issue with the options table is that formerly there was a "type" column. The "type" column meant different things in different parts of the code, and its meaning was inconsistent overall. Now the "type" column has been renamed to "accessibleScopes" and has been assigned a consistent meaning throughout the code base. As part of that change, the possible values for "accessibleScopes" have changed, and how those values are assigned has changed as well. Making it truly backward compatible would require emulating the previous bad behavior, and I'm not sure how useful that would be. It would be possible to revert the column name to "type", but I think that would have limited usefulness: the value stored for "type" would differ in almost every case from what it was before, so any scripts that depended closely on it would break anyway. What are your thoughts on how to handle this?
>
> Thanks,
> Tim
>
>
> ________________________________
> From: Paul Rogers <pr...@mapr.com>
> Sent: Thursday, September 21, 2017 5:50:26 PM
> To: dev@drill.apache.org
> Subject: Re: Backwards Compatibility Policy
>
> Hi Tim,
>
> Unfortunately, Drill has no version compatibility guidelines. We break compatibility all the time — often by accident. As Drill matures, we should consider defining such a policy.
>
> Experience with other systems suggests that we should provide an automatic way to handle the inevitable evolution of APIs, data formats and the like. This is done for user convenience, and (in commercial products) to avoid support calls.
>
> Would have been great if we had a version number in ZK. Would solve problems not just with options, but with storage plugins when we change their classes (and hence the JSON stored in ZK.) But, we don’t…
>
> Vitalii recently added versioning for the Parquet metadata file to avoid the need for users to delete all the metadata each time we make a change. Would be great if we could follow that example for other areas.
>
> In the meantime, perhaps you can implement a way to read the old format ZK but write the new format.
>
> I believe you also changed the layout of the system table for options. These were long-needed improvements. Still, I wonder if anyone has a script that depends on the old format? Do we need a way to support the old format, while offering a new table with the new format? Jyothsna recently did that as part of her work on options; I wonder if something like that is needed here also? In fact, you may be able to simply alter the table that she added: it hasn’t seen the light of a Drill release yet.
>
> Thanks,
>
> - Paul
>
>> On Sep 21, 2017, at 4:16 PM, Timothy Farkas <tf...@mapr.com> wrote:
>>
>> Makes sense Abhishek, I'll work on making it backwards compatible then.
>>
>> Thanks,
>> Tim
>>
>> ________________________________
>> From: Abhishek Girish <ag...@apache.org>
>> Sent: Thursday, September 21, 2017 3:58:55 PM
>> To: dev@drill.apache.org
>> Subject: Re: Backwards Compatibility Policy
>>
>> Hey Tim,
>>
>> Requiring users to purge Drill's ZK data is not advisable and we might not
>> want to go that route. We need to have a seamless upgrade path - for
>> instance modifying values found to be in an older format to the new one,
>> without explicit user interaction.
>>
>> Regards,
>> Abhishek
>>
>> On Thu, Sep 21, 2017 at 3:46 PM, Timothy Farkas <tf...@mapr.com> wrote:
>>
>>> Hi All,
>>>
>>> I recently made a change to the option system that changed the fields
>>> contained in OptionValues and hence the format of the option information we
>>> store in ZooKeeper. As a result, the new format is not backward compatible
>>> with old system options stored in ZooKeeper. There are two ways to resolve
>>> the issue: require old data to be purged from ZooKeeper when upgrading the
>>> cluster, or attempt backward compatibility by modifying the deserializer
>>> for OptionValue. So my question is: what is our stance on backward
>>> compatibility?
>>>
>>> Thanks,
>>> Tim
>>>
>


Re: Backwards Compatibility Policy

Posted by Paul Rogers <pr...@mapr.com>.
Hi Tim,

Thanks for the fix!

Here’s my random thought on this, FWIW.

We don’t have to emulate the old, broken behavior — that is what you have generously fixed and we want to keep.

The problem we *do* want to solve is to make the transition seamless for users. Let’s focus on ZK.

The most important data in ZK is the option value itself. All other data is derived and should never have been stored in ZK in the first place. Consider: only system options are stored in ZK, so there is no need to store the scope in which the option is set, since only one scope is ever stored in ZK.

Consider other information: data type and metadata. We need a data type to know how to interpret the value. But all other metadata can be obtained from the new metadata structure you created.

Even the data type does not absolutely have to be stored in ZK: we know the type from the metadata. (I am ignoring here the corner case of someone changing the type of an existing option from one release to the next; we don’t handle that anyway.)

So, when retrieving the old data, we can throw away everything except the value. That is, we parse the old JSON, extract the value into the new structure, and discard the rest. We then write new values into ZK using the new structure (which, I would hope, has a form optimized for external storage as we discussed above.)
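
[Editor's sketch, not part of the original message.] The "keep the value, discard the rest" idea can be illustrated roughly as follows. This is a language-neutral sketch in Python rather than Drill's Java deserializer, and every field name in it (bool_val, num_val, string_val, float_val, kind, type, name, value) as well as the option name are assumptions made for illustration; the real stored format may differ.

```python
import json

# Hypothetical legacy field names that might hold the value; assumed for
# illustration only, not taken from the actual stored format.
LEGACY_VALUE_FIELDS = ("bool_val", "num_val", "string_val", "float_val")


def convert_legacy_option(raw_json: str) -> dict:
    """Parse a legacy record and keep only the option name and its value."""
    old = json.loads(raw_json)
    for field in LEGACY_VALUE_FIELDS:
        # Compare against None so that legitimate values such as False or 0
        # are not mistaken for "missing".
        if old.get(field) is not None:
            return {"name": old["name"], "value": old[field]}
    raise ValueError(f"no value found in legacy option: {old.get('name')}")


# A made-up legacy record: scope ("type") and data type ("kind") are dropped.
legacy = (
    '{"name": "example.some_option", "kind": "LONG", '
    '"type": "SYSTEM", "num_val": 1000}'
)
print(json.dumps(convert_legacy_option(legacy)))
```

Everything other than the name and the value is ignored on read, so the converter does not need to understand the old scope or type semantics at all.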

Although you are focusing on ZK, Drill has a “persistence” abstraction: data can also be stored in local files, by default in /tmp/drill. So, you can use an old Drill to create a test set of options persisted to a file. Save that as a test input. Then, in the latest version, create a test for your old-to-new converter using the saved test data.

Doing this removes ZK from the picture, making unit testing far easier.
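
[Editor's sketch, not part of the original message.] A file-based test of an old-to-new converter might look roughly like the following. It is written in Python for brevity, not as Drill test code. The one-file-per-option layout, the file names, and the legacy field names are all hypothetical, and the fixture is generated on the fly only to keep the sketch self-contained; in practice it would be a saved copy of files written by an old release.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical legacy value fields, assumed for illustration only.
VALUE_FIELDS = ("bool_val", "num_val", "string_val", "float_val")


def convert_legacy_option(old: dict) -> dict:
    """Keep only the name and value of an already-parsed legacy record."""
    for field in VALUE_FIELDS:
        if old.get(field) is not None:
            return {"name": old["name"], "value": old[field]}
    raise ValueError("legacy option has no value")


def load_legacy_dir(directory: Path) -> dict:
    """Convert every saved legacy file in a directory to {name: value}."""
    converted = {}
    for path in sorted(directory.glob("*.json")):
        old = json.loads(path.read_text(encoding="utf-8"))
        record = convert_legacy_option(old)
        converted[record["name"]] = record["value"]
    return converted


def run_fixture_check() -> dict:
    """Write a stand-in fixture to a temp directory and convert it."""
    samples = [
        {"name": "example.flag", "kind": "BOOLEAN", "type": "SYSTEM", "bool_val": False},
        {"name": "example.limit", "kind": "LONG", "type": "SYSTEM", "num_val": 42},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        fixture = Path(tmp)
        for sample in samples:
            target = fixture / (sample["name"] + ".json")
            target.write_text(json.dumps(sample), encoding="utf-8")
        return load_legacy_dir(fixture)


result = run_fixture_check()
assert result == {"example.flag": False, "example.limit": 42}
```

No ZK connection or cluster is involved: the converter is exercised purely against files on local disk.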

The issue of table schema is a whole other topic which we can discuss later. I still wonder, does anyone actually depend on the current schema? If not, the point is moot and we’re in good shape already.

Thanks,

- Paul



Re: Backwards Compatibility Policy

Posted by Timothy Farkas <tf...@mapr.com>.
Hi Paul,

I've implemented the ZK fix, but I'm uncertain about how to make the options table backward compatible. The main issue with the options table is that formerly there was a "type" column. The "type" column meant different things in different parts of the code, and its meaning was inconsistent overall. Now the "type" column has been renamed to "accessibleScopes" and has been assigned a consistent meaning throughout the code base. As part of that change, the possible values for "accessibleScopes" have changed, and how those values are assigned has changed as well. Making it truly backward compatible would require emulating the previous bad behavior, and I'm not sure how useful that would be. It would be possible to revert the column name to "type", but I think that would have limited usefulness: the value stored for "type" would differ in almost every case from what it was before, so any scripts that depended closely on it would break anyway. What are your thoughts on how to handle this?

Thanks,
Tim




Re: Backwards Compatibility Policy

Posted by Paul Rogers <pr...@mapr.com>.
Hi Tim,

Unfortunately, Drill has no version compatibility guidelines. We break compatibility all the time — often by accident. As Drill matures, we should consider defining such a policy.

Experience with other systems suggests that we should provide an automatic way to handle the inevitable evolution of APIs, data formats and the like. This is done for user convenience, and (in commercial products) to avoid support calls.

Would have been great if we had a version number in ZK. Would solve problems not just with options, but with storage plugins when we change their classes (and hence the JSON stored in ZK.) But, we don’t…

Vitalii recently added versioning for the Parquet metadata file to avoid the need for users to delete all the metadata each time we make a change. Would be great if we could follow that example for other areas.
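
[Editor's sketch, not part of the original message.] As an illustration of what a version number in the stored data could buy, here is a minimal Python sketch. It is not how Drill or the Parquet metadata file actually does it. The "version" field, the version numbers, and the legacy field names are all assumptions; records without a version field are treated as the legacy format.

```python
import json

LEGACY_VERSION = 1   # hypothetical: records written before versioning existed
CURRENT_VERSION = 2  # hypothetical: the new, slimmer format

# Hypothetical legacy value fields, assumed for illustration only.
VALUE_FIELDS = ("bool_val", "num_val", "string_val", "float_val")


def read_legacy(data: dict) -> dict:
    """Extract the name and value from a legacy record, ignoring the rest."""
    for field in VALUE_FIELDS:
        if data.get(field) is not None:
            return {"name": data["name"], "value": data[field]}
    raise ValueError("legacy option has no value")


def read_current(data: dict) -> dict:
    """Read a record that is already in the current format."""
    return {"name": data["name"], "value": data["value"]}


# One reader per known format version.
READERS = {LEGACY_VERSION: read_legacy, CURRENT_VERSION: read_current}


def read_option(raw: str) -> dict:
    """Read any known format; a missing version field means legacy."""
    data = json.loads(raw)
    version = data.get("version", LEGACY_VERSION)
    reader = READERS.get(version)
    if reader is None:
        raise ValueError(f"unsupported option format version: {version}")
    return reader(data)


def write_option(name: str, value) -> str:
    """Always write the current format, tagged with its version."""
    return json.dumps({"version": CURRENT_VERSION, "name": name, "value": value})


def upgrade(raw: str) -> str:
    """Read whatever format is stored and rewrite it in the current one."""
    return write_option(**read_option(raw))
```

With a version tag in place, a future format change only needs one more entry in the reader table, and an unknown version fails loudly instead of being misread.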

In the meantime, perhaps you can implement a way to read the old format ZK but write the new format.

I believe you also changed the layout of the system table for options. These were long-needed improvements. Still, I wonder if anyone has a script that depends on the old format? Do we need a way to support the old format, while offering a new table with the new format? Jyothsna recently did that as part of her work on options; I wonder if something like that is needed here also? In fact, you may be able to simply alter the table that she added: it hasn’t seen the light of a Drill release yet.

Thanks,

- Paul



Re: Backwards Compatibility Policy

Posted by Timothy Farkas <tf...@mapr.com>.
Makes sense Abhishek, I'll work on making it backwards compatible then.

Thanks,
Tim


Re: Backwards Compatibility Policy

Posted by Abhishek Girish <ag...@apache.org>.
Hey Tim,

Requiring users to purge Drill's ZK data is not advisable and we might not
want to go that route. We need to have a seamless upgrade path - for
instance modifying values found to be in an older format to the new one,
without explicit user interaction.

Regards,
Abhishek
