
Posted to dev@beam.apache.org by Etienne Chauchot <ec...@gmail.com> on 2017/06/23 09:51:35 UTC

[DISCUSS] support different versions of backends in an IO

Hi guys,

I'm working on Elasticsearch 5.x support for the Beam Elasticsearch IO (it 
only supports Elasticsearch 2.x right now). I would like your opinion on 
some points related to maintenance.

In this ES case, a large part of the IO code is common to ES v2.x and 
ES v5.x. Still, there are some differences:

- initialization of unit tests (the embedded test framework changed)

- minor differences in one message format

- new features that would allow improving the split, or that are worth 
leveraging (ES pipelines)


=> Question: what do you think is the best way to architect the IO to 
reduce maintenance?


1. We could have an elasticsearchio-common package and two packages, one 
specific to each version of the backend. But I find separate packages 
confusing for users, and more complex for us to maintain.

2. I'm more in favor of detecting the version at IO initialization time 
and then, in the parts that differ, doing a simple if (version == x). 
But it will make code paths more complex. Note that, for example, the 
es-hadoop project (ES connectors for big data engines) chose this approach.
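A minimal sketch of option 2, assuming the version string (e.g. "5.4.1") is read from the cluster once at initialization. The names (VersionSwitchSketch, BackendVersion, numberOfSplits) are hypothetical, not ElasticsearchIO's API, and the assumption that 2.x splits one per shard while 5.x can use sliced scroll for finer splits is only for illustration:

```java
/**
 * Sketch of option 2: detect the backend version once, when the IO is
 * initialized, then branch only where behavior differs.
 * All names here are hypothetical; this is not ElasticsearchIO's actual API.
 */
public final class VersionSwitchSketch {

  /** Backend major versions this hypothetical IO supports. */
  enum BackendVersion { V2, V5 }

  /** Maps a version string such as "5.4.1" to a supported major version. */
  static BackendVersion detect(String versionNumber) {
    int dot = versionNumber.indexOf('.');
    String major = (dot < 0 ? versionNumber : versionNumber.substring(0, dot)).trim();
    switch (major) {
      case "2":
        return BackendVersion.V2;
      case "5":
        return BackendVersion.V5;
      default:
        throw new IllegalArgumentException("Unsupported backend version: " + versionNumber);
    }
  }

  /** Assumption: sliced scroll is available on 5.x only. */
  static boolean supportsSlicedScroll(BackendVersion version) {
    return version == BackendVersion.V5;
  }

  /**
   * The "if (version == x)" branch. Assumed behavior: 2.x yields one split per
   * shard, while 5.x can honor the desired number of splits via sliced scroll.
   */
  static int numberOfSplits(BackendVersion version, int shardCount, int desiredSplits) {
    return supportsSlicedScroll(version) ? desiredSplits : shardCount;
  }

  public static void main(String[] args) {
    // In a real IO the version string would come from the cluster at initialization.
    BackendVersion version = detect("5.4.1");
    System.out.println(version + " -> " + numberOfSplits(version, 5, 12) + " splits");
  }
}
```

The version check stays in one place (detect), and each differing code path asks a capability question rather than comparing version numbers inline, which keeps the "more complex code paths" concern somewhat contained.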


Another thing related to unit tests: they are in fact closer to 
integration tests, as they use an embedded backend server. I did it that 
way because I wanted to unit test things like split, which require a real 
instance.
=> What is the recommended way of testing on both supported versions, 
knowing that both the test code and the test dependencies are different?
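One common pattern, sketched here under assumptions: write the version-independent test logic once against a small interface, and let each version supply only how its embedded server is set up. In a real build this would likely be an abstract JUnit test class in a shared module with one concrete subclass per version module; plain Java and an in-memory fake (EmbeddedBackend and FakeBackend are hypothetical) keep the sketch self-contained:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: write the version-independent test logic once against a small
 * interface, and let each version supply how its embedded server is set up.
 * EmbeddedBackend and FakeBackend are hypothetical stand-ins.
 */
public final class SharedTestsSketch {

  /** What the shared tests need from an embedded backend, whatever its version. */
  interface EmbeddedBackend {
    void start();
    void index(String document);
    long count();
    void stop();
  }

  /** Shared test logic: written once, run against each version's backend. */
  static void checkWriteThenCount(EmbeddedBackend backend) {
    backend.start();
    try {
      backend.index("{\"id\":1}");
      backend.index("{\"id\":2}");
      long count = backend.count();
      if (count != 2) {
        throw new AssertionError("expected 2 documents, got " + count);
      }
    } finally {
      backend.stop();
    }
  }

  /**
   * In-memory fake standing in for a version-specific embedded node; in a real
   * build each version module would start its own embedded ES here.
   */
  static final class FakeBackend implements EmbeddedBackend {
    private final List<String> documents = new ArrayList<>();
    final String version;

    FakeBackend(String version) {
      this.version = version;
    }

    @Override public void start() { documents.clear(); }
    @Override public void index(String document) { documents.add(document); }
    @Override public long count() { return documents.size(); }
    @Override public void stop() { documents.clear(); }
  }

  public static void main(String[] args) {
    for (String version : new String[] {"2.x", "5.x"}) {
      checkWriteThenCount(new FakeBackend(version));
      System.out.println("shared tests passed on " + version);
    }
  }
}
```

Only the backend setup (and its test dependencies) differs per version; the assertions live in one place.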

For integration tests (they are mainly used for load testing), the test 
code and the test dependencies are the same for both versions because 
there is no embedded ES. So they only need to be run twice, once against 
each version of the backend.
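Since the integration-test code is identical across versions, only the target cluster changes between runs. A minimal sketch, assuming the target is passed as system properties; the property names es.host and es.port are hypothetical:

```java
import java.util.Properties;

/**
 * Sketch: the integration tests stay the same; only the cluster they point at
 * changes between runs. Property names are hypothetical.
 */
public final class IntegrationTargetSketch {

  /** Address of the cluster the integration tests run against. */
  static final class Target {
    final String host;
    final int port;

    Target(String host, int port) {
      this.host = host;
      this.port = port;
    }

    String url() {
      return "http://" + host + ":" + port;
    }
  }

  /** Reads the target, e.g. from -Des.host=es5-host -Des.port=9200, with local defaults. */
  static Target fromProperties(Properties props) {
    String host = props.getProperty("es.host", "localhost");
    int port = Integer.parseInt(props.getProperty("es.port", "9200"));
    return new Target(host, port);
  }

  public static void main(String[] args) {
    Target target = fromProperties(System.getProperties());
    System.out.println("Running integration tests against " + target.url());
  }
}
```

The same suite would then be launched twice, once with the properties pointing at a 2.x cluster and once at a 5.x cluster.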


What do you think?

PS: sorry for the long email :)

Best!
Etienne

Re: [DISCUSS] support different versions of backends in an IO

Posted by Chamikara Jayalath <ch...@apache.org>.

This will probably be a common question from IO transform authors as Beam
matures. Perhaps we should add a section on this to the IO authoring guide
[1][2]?

Thanks,
Cham

[1] https://beam.apache.org/documentation/io/authoring-overview/
[2] https://issues.apache.org/jira/browse/BEAM-1025

On Fri, Jun 23, 2017 at 2:57 AM Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Hi,
>
> It's something we already discussed in the past (for Kafka by instance).
>
> For Kafka, we were able to use a single IO with spring-el to detect the
> version.
> That's certainly the preferred approach, but it would not be possible in
> all cases.
>
> I would suggest, if first approach doesn't work:
>
> * In term of Maven modules:
>
> - sdk/java/io/elasticsearch/common that could contain shared code + itests
> - sdk/java/io/elasticsearch/2.x (artifactId elasticsearch-2.x), specific
> code +
> utests
> - sdk/java/io/elasticsearch/5.x (artifactId elasticsearch-5.x), specific
> code +
> utests
>
> Regards
> JB
>
> On 06/23/2017 11:51 AM, Etienne Chauchot wrote:
> > Hi guys,
> >
> > I'm working on Elasticsearch 5.x support for Beam IO (it only supports
> > Elasticsearch 2.x right now). I wanted to have your opinion on some
> points
> > related to maintenance.
> >
> > In this ES case a big part of the code of the IO is common between ES
> v2.x and
> > ES v5.x. Still, there are some differences:
> >
> > - initialization of UT (change in embedded test framework)
> >
> > - Minor differences in one message format
> >
> > - New feature that will allow improving the split or new feature that is
> worth
> > leveraging (ES pipelines)
> >
> >
> > => Question is: what do you think is the best way to architecture the IO
> to
> > reduce maintenance
> >
> >
> > 1. We could have an elasticsearchio-common package and two packages that
> are
> > specific to each version of the backend. But I find it confusing for the
> users
> > to have separate packages and more complex to maintain for us.
> >
> > 2. I'm more in favor of detecting the version at IO initialization time
> and
> > then, in the parts that are different do a simple if (version == x). But
> it will
> > make code paths more complex. Note that for example project es-hadoop (ES
> > connectors for big data engines) chose this way.
> >
> >
> > Another thing related to unit tests: in fact they are more close to
> integration
> > tests as they use an embedded backend server. I did it that way because
> I wanted
> > to unit test things like split that require a real instance.
> > => What is the recommended way of testing on both supported versions
> knowing
> > that both the test code and the test dependencies are different?
> >
> > For integration tests (they are mainly used as load testing), the test
> code and
> > the test dependencies are the same between versions because there is no
> embedded
> > ES. So, it will be only needed to run them twice against 2 versions of
> the backend.
> >
> >
> > What do you think?
> >
> > PS: sorry for the long email :)
> >
> > Best!
> > Etienne
> >
> >
> >
> >
> >
> >
> >
> >
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: [DISCUSS] support different versions of backends in an IO

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.

Hi,

It's something we already discussed in the past (for Kafka, for instance).

For Kafka, we were able to use a single IO, using spring-el to detect the version. 
That's certainly the preferred approach, but it may not be possible in all cases.
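The single-IO idea rests on checking, at runtime, which API the client library on the classpath actually offers. A minimal sketch using plain Java reflection rather than Spring EL (a deliberately swapped-in technique; this is not KafkaIO's code), with String standing in for a version-dependent client class:

```java
import java.lang.reflect.Method;

/**
 * Sketch of detecting client capabilities at runtime with plain reflection,
 * so one artifact can work with several client library versions.
 * (KafkaIO used Spring EL for a similar purpose; this is not that code.)
 */
public final class CapabilityDetectionSketch {

  /** True if the class exposes a public method with this name and parameter types. */
  static boolean hasMethod(Class<?> type, String name, Class<?>... parameterTypes) {
    try {
      type.getMethod(name, parameterTypes);
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  /** Invokes a public no-arg method if the running library has it, else returns the fallback. */
  static Object invokeIfPresent(Object target, String name, Object fallback) {
    Method method;
    try {
      method = target.getClass().getMethod(name);
    } catch (NoSuchMethodException e) {
      return fallback; // method absent in this library version
    }
    try {
      return method.invoke(target);
    } catch (ReflectiveOperationException e) {
      // The method exists but the call failed: surface it rather than hide it.
      throw new IllegalStateException("Call to " + name + " failed", e);
    }
  }

  public static void main(String[] args) {
    // String stands in for a client class whose API differs between versions.
    System.out.println(hasMethod(String.class, "substring", int.class, int.class));
    System.out.println(invokeIfPresent("hello", "length", -1));
  }
}
```

When such checks become numerous or the differences are in test dependencies rather than in a few method calls, the module split below becomes the more maintainable option.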

If the first approach doesn't work, I would suggest:

* In terms of Maven modules:

- sdk/java/io/elasticsearch/common: shared code + integration tests
- sdk/java/io/elasticsearch/2.x (artifactId elasticsearch-2.x): specific code + 
unit tests
- sdk/java/io/elasticsearch/5.x (artifactId elasticsearch-5.x): specific code + 
unit tests
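How code might be divided across these modules, sketched under assumptions: shared code in common depends only on an interface, and each version module provides an implementation, so users pick the artifact matching their cluster. All names are hypothetical, and the three modules are collapsed into one file so it compiles on its own. The 5.x difference shown is routing bulk writes through an ingest pipeline via the bulk endpoint's pipeline parameter, which 2.x does not offer:

```java
/**
 * Sketch of the module split: shared code in "common" depends only on an
 * interface, and each version module provides an implementation. The three
 * modules are collapsed into one file here so it compiles on its own; all
 * names are hypothetical.
 */
public final class ModuleLayoutSketch {

  // ---- would live in sdk/java/io/elasticsearch/common ----

  /** Version-specific behavior the shared code needs. */
  interface BackendAdapter {
    int majorVersion();

    /** Bulk endpoint, optionally routed through an ingest pipeline (null for none). */
    String bulkEndpoint(String pipeline);
  }

  /** Shared code: knows nothing about versions beyond the adapter. */
  static String describeWrite(BackendAdapter adapter, String pipeline) {
    return "ES " + adapter.majorVersion() + ".x, POST " + adapter.bulkEndpoint(pipeline);
  }

  // ---- would live in sdk/java/io/elasticsearch/2.x ----

  static final class Es2Adapter implements BackendAdapter {
    @Override public int majorVersion() { return 2; }

    @Override public String bulkEndpoint(String pipeline) {
      if (pipeline != null) {
        throw new UnsupportedOperationException("Ingest pipelines require Elasticsearch 5.x");
      }
      return "/_bulk";
    }
  }

  // ---- would live in sdk/java/io/elasticsearch/5.x ----

  static final class Es5Adapter implements BackendAdapter {
    @Override public int majorVersion() { return 5; }

    @Override public String bulkEndpoint(String pipeline) {
      return pipeline == null ? "/_bulk" : "/_bulk?pipeline=" + pipeline;
    }
  }

  public static void main(String[] args) {
    // A user would depend on only one version artifact; both are shown here.
    System.out.println(describeWrite(new Es2Adapter(), null));
    System.out.println(describeWrite(new Es5Adapter(), "my-pipeline"));
  }
}
```

Compared with a single IO that branches on the version, this trades one artifact for cleaner per-version code and per-version test dependencies.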

Regards
JB

On 06/23/2017 11:51 AM, Etienne Chauchot wrote:
> [...]

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com