Posted to dev@beam.apache.org by Etienne Chauchot <ec...@apache.org> on 2020/03/06 10:26:30 UTC

Re: A new reworked Elasticsearch 7+ IO module

Hi all,

It's been 3 weeks since I opened the survey on which ES versions users run.

The survey received very few responses: only 9 responses for now 
(multiple versions possible of course). The responses are the following:

ES2: 0 clients, ES5: 1, ES6: 5, ES7: 8

This points toward dropping ES2 support, but for now the sample is still 
not very representative.

I'm cross-posting to user@ to let you know that I'm closing the survey 
within 1 or 2 weeks. So please respond if you're using ESIO.

Best

Etienne

On 13/02/2020 12:37, Etienne Chauchot wrote:
>
> Hi Cham, thanks for your comments !
>
> I just sent an email to the user ML with a survey link to count ES usage 
> per version:
>
> https://lists.apache.org/thread.html/rc8185afb8af86a2a032909c13f569e18bd89e75a5839894d5b5d4082%40%3Cuser.beam.apache.org%3E
>
> Best
>
> Etienne
>
> On 10/02/2020 19:46, Chamikara Jayalath wrote:
>>
>>
>> On Thu, Feb 6, 2020 at 8:13 AM Etienne Chauchot <echauchot@apache.org 
>> <ma...@apache.org>> wrote:
>>
>>     Hi,
>>
>>     please see my comments inline
>>
>>     On 06/02/2020 16:24, Alexey Romanenko wrote:
>>>     Please, see my comments inline.
>>>
>>>>     On 6 Feb 2020, at 10:50, Etienne Chauchot <echauchot@apache.org
>>>>     <ma...@apache.org>> wrote:
>>>>>
>>>>>>>             1. regarding version support: ES v2 has not been
>>>>>>>             maintained by Elastic since 2018/02, so we plan to
>>>>>>>             remove it from the IO. In the past we already
>>>>>>>             retired versions (like Spark 1.6 for instance).
>>>>>>>
>>>>>
>>>>>         My only concern here is that there might be users who use
>>>>>         the existing module who might not be able to easily
>>>>>         upgrade the Beam version if we remove it. But given that
>>>>>         V2 is 5 versions behind the latest release this might be OK.
>>>>>
>>>>>
>>>>>     It seems we have a consensus on this.
>>>>>     I think there should be another general discussion on the
>>>>>     long-term support of our preferred tool IO modules.
>>>>
>>>>     => yes, consensus, let's drop ESV2
>>>>
>>>     We had (and still have) a similar problem with KafkaIO when
>>>     supporting different versions of Kafka, especially the very old
>>>     version 0.9. We raised this question on user@ and it appears that
>>>     there are users who, for some reason, still use old Kafka versions.
>>>     So, before dropping support for any ES version, I’d suggest asking
>>>     on user@ to see whether anyone will be affected by this.
>>     Yes, we can do a survey among users, but the question is: should we
>>     support an ES version that is no longer supported by Elastic
>>     themselves?
>>
>>
>> +1 for asking in the user list. I guess this is more about whether 
>> users need this specific version that we hope to drop support for. 
>> Whether we need to support unsupported versions is a more generic 
>> question that should prob. be addressed in the dev list. (and I 
>> personally don't think we should unless there's a large enough user 
>> base for a given version).
>>
>>>>>>>             2. regarding the user: the aim is to unlock some new
>>>>>>>             features (listed by Ludovic) and give the user more
>>>>>>>             flexibility on their requests. That requires using
>>>>>>>             the high level Java ES client in place of the low
>>>>>>>             level REST client (that was used because it is the
>>>>>>>             only one compatible with all ES versions). We plan
>>>>>>>             to replace the API (JSON document in and out) by
>>>>>>>             more complete standard ES objects that contain the
>>>>>>>             request logic (insert/update, doc routing etc...)
>>>>>>>             and the data. There are already IOs like SpannerIO
>>>>>>>             that use similar objects in input PCollection rather
>>>>>>>             than pure POJOs.
>>>>>>>
>>>>>
>>>>>         Won't this be a breaking change for all users? IMO using
>>>>>         POJOs in PCollections is safer, since otherwise we have to
>>>>>         worry about changes to the underlying client library API.
>>>>>         The exception would be when the underlying client library
>>>>>         offers a backwards compatibility guarantee that we can rely
>>>>>         on for the foreseeable future (for example, BQ TableRow).
>>>>>
>>>>>
>>>>>     Agreed but actually, there will be POJOs in order to abstract
>>>>>     Elasticsearch's version support. The following third point
>>>>>     explains this.
>>>>
>>>>     => indeed it will be a breaking change, hence this email to get
>>>>     a consensus on that. Also I think our wrappers of ES request
>>>>     objects will be as backward compatible as the underlying objects
>>>>
>>>     I just want to remind everyone that, according to what we agreed
>>>     some time ago on dev@ (at least for IOs), all breaking user API
>>>     changes have to be added along with a deprecation of the old API,
>>>     which can be removed after 3 consecutive Beam releases. That way,
>>>     users will have time to move to the new API smoothly.
>>
>>     We are rather discussing the target architecture of the new module
>>     here, but the deprecation process is important to recall, I
>>     agree. When I say the DTOs are backward compatible above, I mean
>>     across the per-version sub-modules inside the new module. Anyway,
>>     sure, for some time both modules (the old REST-based one that
>>     supports v2-7 and the new one that supports v5-7) will coexist,
>>     and the old one will receive the deprecation annotations.
>>
>>
>> +1 for supporting both versions for at least three minor versions to 
>> give users time to migrate. Also, we should try to produce a warning 
>> for users who use the deprecated versions.
>>
>> Thanks,
>> Cham
>>
>>     Best
>>
>>     Etienne
>>
>>>
>>>

Re: A new reworked Elasticsearch 7+ IO module

Posted by Etienne Chauchot <ec...@apache.org>.
Hi all,

The survey regarding Elasticsearch support in Beam is now closed.

Here are the results after 38 days:

Number of users per version:

ESv2: 0
ESv5: 1
ESv6: 5
ESv7: 8

So, the new version of ElasticsearchIO after the refactoring discussed 
in this thread will no longer support Elasticsearch v2.

Regards

Etienne Chauchot.



Re: A new reworked Elasticsearch 7+ IO module

Posted by Jean-Baptiste Onofre <jb...@nanthrax.net>.
Hi,

I think WARN makes sense and is the safest approach. It lets users be notified and, if needed, either update or roll back to a previous Beam IO version.

Regards
JB

> On 6 March 2020 at 18:49, Kenneth Knowles <ke...@apache.org> wrote:
> 
> Since the user provides backendVersion, here are some possible levels of things to add in expand() based on that (these are extra niceties beyond the agreed number of releases to remove)
> 
>  - WARN for backendVersion < n
>  - reject for backendVersion < n with opt-in pipeline option to keep it working one more version (gets their attention and indicates urgency)
>  - reject completely
> 
> Kenn
> 


Re: A new reworked Elasticsearch 7+ IO module

Posted by Etienne Chauchot <ec...@apache.org>.
Hi Kenn,

The user does not specify the targeted backendVersion (at least in the 
current version of the IO); it is transparent to them: the IO detects the 
version with a REST call and adapts its behavior. But anyway, I agree, 
we need to log at least a WARN if the detected version is 2. As the new IO 
will not be compatible with ESv2 (because the ES classes differ too much to 
share a common code base), the only option in the new IO is, IMHO, to 
reject completely if the version is 2.

Best

Etienne
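
For illustration only, the version check discussed in this message could be
sketched in Java as below. The sketch assumes the major version is read from
the "number" field in the JSON body returned by the Elasticsearch root
endpoint, and it maps the three levels Kenn lists (quoted below) onto a small
policy enum. BackendVersionPolicy, Policy, Action, parseMajorVersion and
decide are hypothetical names, not part of ElasticsearchIO, and the regex
merely stands in for a real JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Beam's ElasticsearchIO. */
final class BackendVersionPolicy {

  /** The three escalation levels listed in the thread. */
  enum Policy { WARN_ONLY, REJECT_UNLESS_OPT_IN, REJECT }

  /** What the IO would do once the cluster version is known. */
  enum Action { ACCEPT, WARN, REJECT }

  // Captures the major part of "number":"7.5.0" in the root endpoint response.
  // A regex keeps the sketch dependency-free; real code would parse the JSON.
  private static final Pattern VERSION_NUMBER =
      Pattern.compile("\"number\"\\s*:\\s*\"(\\d+)\\.");

  private BackendVersionPolicy() {}

  /** Extracts the major version from the JSON body of a root endpoint call. */
  static int parseMajorVersion(String rootResponseJson) {
    Matcher m = VERSION_NUMBER.matcher(rootResponseJson);
    if (!m.find()) {
      throw new IllegalArgumentException("No version number in response");
    }
    return Integer.parseInt(m.group(1));
  }

  /**
   * Decides how to treat a detected major version.
   *
   * @param major detected major version of the cluster
   * @param minSupported lowest major version that is still fully supported
   * @param policy escalation level applied below minSupported
   * @param optIn whether the user set the (hypothetical) opt-in option
   */
  static Action decide(int major, int minSupported, Policy policy, boolean optIn) {
    if (major >= minSupported) {
      return Action.ACCEPT;
    }
    switch (policy) {
      case WARN_ONLY:
        return Action.WARN;
      case REJECT_UNLESS_OPT_IN:
        // Opting in keeps the pipeline working, but still with a warning.
        return optIn ? Action.WARN : Action.REJECT;
      default:
        return Action.REJECT;
    }
  }
}
```

Under these assumptions, the old module would use Policy.WARN_ONLY with
minSupported = 5 and only log a warning for ES v2, while the new module would
use Policy.REJECT and fail for the same cluster.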

On 06/03/2020 18:49, Kenneth Knowles wrote:
> Since the user provides backendVersion, here are some possible levels 
> of things to add in expand() based on that (these are extra niceties 
> beyond the agreed number of releases to remove)
>
>  - WARN for backendVersion < n
>  - reject for backendVersion < n with opt-in pipeline option to keep 
> it working one more version (gets their attention and indicates urgency)
>  - reject completely
>
> Kenn
>
> On Fri, Mar 6, 2020 at 2:26 AM Etienne Chauchot <echauchot@apache.org>
> wrote:
>
>     Hi all,
>
>     it's been 3 weeks since the survey on ES versions the users use.
>
>     The survey received very few responses: only 9 responses for now
>     (multiple versions possible of course). The responses are the
>     following:
>
>     ES2: 0 clients, ES5: 1, ES6: 5, ES7: 8
>
>     It tends to go toward a drop of ES2 support but for now it is
>     still not very representative.
>
>     I'm cross-posting to @users to let you know that I'm closing the
>     survey within 1 or 2 weeks. So please respond if you're using ESIO.
>
>     Best
>
>     Etienne
>
>     On 13/02/2020 12:37, Etienne Chauchot wrote:
>>
>>     Hi Cham, thanks for your comments !
>>
>>     I just sent an email to user ML with a survey link to count ES
>>     uses per version:
>>
>>     https://lists.apache.org/thread.html/rc8185afb8af86a2a032909c13f569e18bd89e75a5839894d5b5d4082%40%3Cuser.beam.apache.org%3E
>>
>>     Best
>>
>>     Etienne
>>
>>     On 10/02/2020 19:46, Chamikara Jayalath wrote:
>>>
>>>
>>>     On Thu, Feb 6, 2020 at 8:13 AM Etienne Chauchot
>>>     <echauchot@apache.org> wrote:
>>>
>>>         Hi,
>>>
>>>         please see my comments inline
>>>
>>>         On 06/02/2020 16:24, Alexey Romanenko wrote:
>>>>         Please, see my comments inline.
>>>>
>>>>>         On 6 Feb 2020, at 10:50, Etienne Chauchot
>>>>>         <echauchot@apache.org> wrote:
>>>>>>
>>>>>>>>                 1. regarding version support: ES v2 is no more
>>>>>>>>                 maintained by Elastic since 2018/02 so we plan
>>>>>>>>                 to remove it from the IO. In the past we
>>>>>>>>                 already retired versions (like spark 1.6 for
>>>>>>>>                 instance).
>>>>>>>>
>>>>>>
>>>>>>             My only concern here is that there might be users who
>>>>>>             use the existing module who might not be able to
>>>>>>             easily upgrade the Beam version if we remove it. But
>>>>>>             given that V2 is 5 versions behind the latest release
>>>>>>             this might be OK.
>>>>>>
>>>>>>
>>>>>>         It seems we have a consensus on this.
>>>>>>         I think there should be another general discussion on the
>>>>>>         long term support of our preferred tool IO modules.
>>>>>
>>>>>         => yes, consensus, let's drop ESV2
>>>>>
>>>>         We had (and still have) a similar problem with KafkaIO to
>>>>         support different versions of Kafka, especially very old
>>>>         version 0.9. We raised this question on user@ and it
>>>>         appears that there are users who for some reasons still use
>>>>         old Kafka versions. So, before dropping a support of any ES
>>>>         versions, I’d suggest to ask it user@ and see if any people
>>>>         will be affected by this.
>>>         Yes we can do a survey among users but the question is,
>>>         should we support an ES version that is no more supported by
>>>         Elastic themselves ?
>>>
>>>
>>>     +1 for asking in the user list. I guess this is more about
>>>     whether users need this specific version that we hope to drop
>>>     support for. Whether we need to support unsupported versions is
>>>     a more generic question that should prob. be addressed in the
>>>     dev list. (and I personally don't think we should unless there's
>>>     a large enough user base for a given version).
>>>
>>>>>>>>                 2. regarding the user: the aim is to unlock
>>>>>>>>                 some new features (listed by Ludovic) and give
>>>>>>>>                 the user more flexibility on his request. For
>>>>>>>>                 that, it requires to use high level java ES
>>>>>>>>                 client in place of the low level REST client
>>>>>>>>                 (that was used because it is the only one
>>>>>>>>                 compatible with all ES versions). We plan to
>>>>>>>>                 replace the API (json document in and out) by
>>>>>>>>                 more complete standard ES objects that contain
>>>>>>>>                 the request logic (insert/update, doc routing
>>>>>>>>                 etc...) and the data. There are already IOs
>>>>>>>>                 like SpannerIO that use similar objects in
>>>>>>>>                 input PCollection rather than pure POJOs.
>>>>>>>>
>>>>>>
>>>>>>             Won't this be a breaking change for all users ? IMO
>>>>>>             using POJOs in PCollections is safer since we have to
>>>>>>             worry about changes to the underlying client library
>>>>>>             API. Exception would be when underlying client
>>>>>>             library offers a backwards compatibility guarantee
>>>>>>             that we can rely on for the foreseeable future (for
>>>>>>             example, BQ TableRow).
>>>>>>
>>>>>>
>>>>>>         Agreed but actually, there will be POJOs in order to
>>>>>>         abstract Elasticsearch's version support. The following
>>>>>>         third point explains this.
>>>>>
>>>>>         => indeed it will be a breaking change, hence this email
>>>>>         to get a consensus on that. Also, I think our wrappers of
>>>>>         ES request objects will offer the same backward
>>>>>         compatibility as the underlying objects.
>>>>>
>>>>         I just want to remind that according to what we agreed some
>>>>         time ago on dev@ (at least, for IOs), all breaking user API
>>>>         changes have to be added along with deprecation of old API
>>>>         that could be removed after 3 consecutive Beam releases. In
>>>>         this case, users will have a time to move to new API smoothly.
>>>
>>>         We are more discussing the target architecture of the new
>>>         module here but the process of deprecation is important to
>>>         recall, I agree. When I say DTOs backward compatible above I
>>>         mean between per-version sub-modules inside the new module.
>>>         Anyway, sure, for some time, both modules (the old
>>>         REST-based that supports v2-7 and the new that supports
>>>         v5-7) will cohabit and the old one will receive the
>>>         deprecation annotations.
>>>
>>>
>>>     +1 for supporting both versions for at least three minor
>>>     versions to give users time to migrate. Also, we should try to
>>>     produce a warning for users who use the deprecated versions.
>>>
>>>     Thanks,
>>>     Cham
>>>
>>>         Best
>>>
>>>         Etienne
>>>
>>>>
>>>>

Re: A new reworked Elasticsearch 7+ IO module

Posted by Kenneth Knowles <ke...@apache.org>.
Since the user provides backendVersion, here are some possible levels of
things to add in expand() based on that (these are extra niceties beyond
the agreed number of releases to remove)

 - WARN for backendVersion < n
 - reject for backendVersion < n with opt-in pipeline option to keep it
working one more version (gets their attention and indicates urgency)
 - reject completely
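
As an illustration of the three levels listed above, here is a minimal
Java sketch. All names are hypothetical and nothing here is existing Beam
API: in a real transform the check would sit in expand(), the warning
would go through the module's logger, and the opt-in flag would come from
a pipeline option.

```java
// Hypothetical sketch of the three escalation levels; names are illustrative only.
final class DeprecationPolicy {
  /** The three levels from the list above, in increasing strictness. */
  enum Level { WARN, REJECT_UNLESS_OPT_IN, REJECT }

  /** What happened for an accepted version. */
  enum Outcome { OK, WARNED }

  /**
   * Applies the chosen level to a backend version.
   *
   * @param backendVersion major version of the cluster
   * @param n first major version that is not deprecated
   * @param level escalation level currently in force
   * @param optIn whether the user explicitly asked to keep the old version working
   */
  static Outcome check(int backendVersion, int n, Level level, boolean optIn) {
    if (backendVersion >= n) {
      return Outcome.OK;
    }
    String message = "backendVersion " + backendVersion + " is deprecated (minimum is " + n + ")";
    switch (level) {
      case WARN:
        System.err.println("WARN: " + message);
        return Outcome.WARNED;
      case REJECT_UNLESS_OPT_IN:
        if (optIn) {
          // The opt-in buys one more release, but the user is still warned.
          System.err.println("WARN: " + message + "; opt-in is temporary");
          return Outcome.WARNED;
        }
        throw new IllegalStateException(message + "; set the opt-in option to continue");
      default:
        throw new IllegalStateException(message);
    }
  }

  public static void main(String[] args) {
    System.out.println(check(7, 5, Level.REJECT, false));
    System.out.println(check(2, 5, Level.WARN, false));
  }
}
```

Moving from one level to the next across releases would only require
changing the Level value, which keeps the escalation explicit.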

Kenn

On Fri, Mar 6, 2020 at 2:26 AM Etienne Chauchot <ec...@apache.org>
wrote:

> Hi all,
>
> it's been 3 weeks since the survey on ES versions the users use.
>
> The survey received very few responses: only 9 responses for now (multiple
> versions possible of course). The responses are the following:
>
> ES2: 0 clients, ES5: 1, ES6: 5, ES7: 8
>
> It tends to go toward a drop of ES2 support but for now it is still not
> very representative.
>
> I'm cross-posting to @users to let you know that I'm closing the survey
> within 1 or 2 weeks. So please respond if you're using ESIO.
>
> Best
>
> Etienne
> On 13/02/2020 12:37, Etienne Chauchot wrote:
>
> Hi Cham, thanks for your comments !
>
> I just sent an email to user ML with a survey link to count ES uses per
> version:
>
>
> https://lists.apache.org/thread.html/rc8185afb8af86a2a032909c13f569e18bd89e75a5839894d5b5d4082%40%3Cuser.beam.apache.org%3E
>
> Best
>
> Etienne
> On 10/02/2020 19:46, Chamikara Jayalath wrote:
>
>
>
> On Thu, Feb 6, 2020 at 8:13 AM Etienne Chauchot <ec...@apache.org>
> wrote:
>
>> Hi,
>>
>> please see my comments inline
>> On 06/02/2020 16:24, Alexey Romanenko wrote:
>>
>> Please, see my comments inline.
>>
>> On 6 Feb 2020, at 10:50, Etienne Chauchot <ec...@apache.org> wrote:
>>
>> 1. regarding version support: ES v2 is no more maintained by Elastic
>>>> since 2018/02 so we plan to remove it from the IO. In the past we already
>>>> retired versions (like spark 1.6 for instance).
>>>>
>>>>
>>> My only concern here is that there might be users who use the existing
>>> module who might not be able to easily upgrade the Beam version if we
>>> remove it. But given that V2 is 5 versions behind the latest release this
>>> might be OK.
>>>
>>
>> It seems we have a consensus on this.
>> I think there should be another general discussion on the long term
>> support of our preferred tool IO modules.
>>
>> => yes, consensus, let's drop ESV2
>>
>> We had (and still have) a similar problem with KafkaIO to support
>> different versions of Kafka, especially very old version 0.9. We raised
>> this question on user@ and it appears that there are users who for some
>> reasons still use old Kafka versions. So, before dropping a support of any
>> ES versions, I’d suggest to ask it user@ and see if any people will be
>> affected by this.
>>
>> Yes we can do a survey among users but the question is, should we support
>> an ES version that is no more supported by Elastic themselves ?
>>
>
> +1 for asking in the user list. I guess this is more about whether users
> need this specific version that we hope to drop support for. Whether we
> need to support unsupported versions is a more generic question that should
> prob. be addressed in the dev list. (and I personally don't think we should
> unless there's a large enough user base for a given version).
>
> 2. regarding the user: the aim is to unlock some new features (listed by
>>>> Ludovic) and give the user more flexibility on his request. For that, it
>>>> requires to use high level java ES client in place of the low level REST
>>>> client (that was used because it is the only one compatible with all ES
>>>> versions). We plan to replace the API (json document in and out) by more
>>>> complete standard ES objects that contain the request logic (insert/update,
>>>> doc routing etc...) and the data. There are already IOs like SpannerIO that
>>>> use similar objects in input PCollection rather than pure POJOs.
>>>>
>>>>
>>> Won't this be a breaking change for all users ? IMO using POJOs in
>>> PCollections is safer since we have to worry about changes to the
>>> underlying client library API. Exception would be when underlying client
>>> library offers a backwards compatibility guarantee that we can rely on for
>>> the foreseeable future (for example, BQ TableRow).
>>>
>>
>> Agreed but actually, there will be POJOs in order to abstract
>> Elasticsearch's version support. The following third point explains this.
>>
>> => indeed it will be a breaking change, hence this email to get a
>> consensus on that. Also, I think our wrappers of ES request objects will
>> offer the same backward compatibility as the underlying objects.
>>
>> I just want to remind that according to what we agreed some time ago on
>> dev@ (at least, for IOs), all breaking user API changes have to be added
>> along with deprecation of old API that could be removed after 3 consecutive
>> Beam releases. In this case, users will have a time to move to new API
>> smoothly.
>>
>> We are more discussing the target architecture of the new module here but
>> the process of deprecation is important to recall, I agree. When I say DTOs
>> backward compatible above I mean between per-version sub-modules inside the
>> new module. Anyway, sure, for some time, both modules (the old REST-based
>> that supports v2-7 and the new that supports v5-7) will cohabit and the old
>> one will receive the deprecation annotations.
>>
>
> +1 for supporting both versions for at least three minor versions to give
> users time to migrate. Also, we should try to produce a warning for users
> who use the deprecated versions.
>
> Thanks,
> Cham
>
>
>> Best
>>
>> Etienne
>>
>>
>>
>>
