Posted to dev@bahir.apache.org by Nacho Garcia Fernandez <na...@gmail.com> on 2017/11/03 10:45:30 UTC

Re: New Flink connector

Can somebody out there please reply to my last question? Thanks in advance :D

On 27 October 2017 at 14:23, Nacho Garcia Fernandez
<nachogarcia91@gmail.com> wrote:

> Hello all.
>
> I'm a little bit stuck with one issue that I hope you can help me with.
>
> I'm developing a flink-connector-kudu extension that allows reading from
> Kudu and writing to Flink and Kudu. This connector addresses the issue
> [BAHIR-99] and is a full re-implementation of
> https://github.com/apache/bahir-flink/pull/17.
>
> I'm struggling with testing: how should tests be handled when the data
> store (Kudu) does not provide an embedded driver?
>
> Kudu does not yet provide an embedded Java-based driver, so I need a built
> Kudu to test against; otherwise I cannot test this connector end to end
> (e2e) against a "real" data store.
>
> Because of that, I see the following possibilities for this scenario:
>
> * Create mocks for the Kudu classes (KuduSession, KuduTable, KuduClient,
> etc.).
>
> * Use Kudu's MiniKuduCluster utility to instantiate a local cluster: this
> is not possible because it needs a real build of Kudu on the local machine.
>
> * Update .travis.yml to install a Kudu server: this would fix the problem
> for CI, but tests would still fail locally. Moreover, building Kudu takes a
> long time (more than 20 minutes), which is not feasible for CI.
>
> * Ignore testing: not an option :)
>
>
> In the case of Kudu, I saw that connectors for other distributed analytics
> platforms (e.g. Spark) are implemented directly in the Kudu repo
> (https://github.com/apache/kudu/tree/master/java/kudu-spark) instead of
> in bahir-spark. I think this is good because when you run the tests you
> have a real build of Kudu to test against.
>
> What is the best place (Kudu vs. Bahir) for this connector, taking the
> above-mentioned issues into consideration?
>
> If the answer is bahir-flink, how should I proceed with my tests? :)
>
> Thanks in advance.
>
>
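
As a rough illustration of the first option in the list above (mocking), one
way to keep the connector unit-testable without a running Kudu is to put a
small interface between the sink logic and the Kudu client, and substitute an
in-memory fake in tests. This is only a sketch under assumptions: RowWriter,
InMemoryRowWriter and KuduSinkLogic are hypothetical names invented for this
example, not types from Kudu, Flink or Bahir, and a real implementation would
wrap KuduClient/KuduSession behind the same interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical seam: the sink depends on this interface, not on the Kudu client.
interface RowWriter {
    void write(String table, List<Object> row);
    void close();
}

// In-memory fake used in unit tests instead of a real Kudu-backed writer.
final class InMemoryRowWriter implements RowWriter {
    private final List<List<Object>> rows = new ArrayList<>();
    private boolean closed;

    @Override
    public void write(String table, List<Object> row) {
        if (closed) {
            throw new IllegalStateException("writer is closed");
        }
        rows.add(new ArrayList<>(row)); // defensive copy of the caller's row
    }

    @Override
    public void close() {
        closed = true;
    }

    List<List<Object>> rows() {
        return Collections.unmodifiableList(rows);
    }

    boolean isClosed() {
        return closed;
    }
}

// Hypothetical sink logic under test; it only ever sees the RowWriter interface.
final class KuduSinkLogic {
    private final RowWriter writer;
    private final String table;

    KuduSinkLogic(RowWriter writer, String table) {
        this.writer = writer;
        this.table = table;
    }

    void invoke(List<Object> row) {
        if (row == null || row.isEmpty()) {
            throw new IllegalArgumentException("row must not be empty");
        }
        writer.write(table, row);
    }

    void close() {
        writer.close();
    }
}

final class KuduSinkLogicDemo {
    public static void main(String[] args) {
        InMemoryRowWriter fake = new InMemoryRowWriter();
        KuduSinkLogic sink = new KuduSinkLogic(fake, "events");
        sink.invoke(Arrays.<Object>asList(1, "first"));
        sink.close();
        System.out.println(fake.rows());
    }
}
```

A fake like this only checks the connector's own logic; it says nothing about
how a real Kudu cluster behaves, so it complements an end-to-end test rather
than replacing it.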

Re: New Flink connector

Posted by Ted Yu <yu...@gmail.com>.

+1 on Robert's proposal.

On Sat, Nov 18, 2017 at 6:21 AM, Robert Metzger <rm...@apache.org> wrote:

> I'm really sorry that I haven't responded yet.
>
> Regarding the question "should the code be in the bahir-flink repo, or the
> kudu repo?":
> I have the feeling that the Kudu repo might actually be the better spot,
> because Flume, MapReduce, Hive, Spark, and others are already there. It
> seems that the Kudu project is accepting such contributions and is also
> willing to maintain them.
>
> If the Kudu project rejects the contribution for now, I would suggest
> providing a script that builds the flink-kudu connector, starts Kudu (maybe
> from a Docker image?) and then runs a few test jobs.

Re: New Flink connector

Posted by Robert Metzger <rm...@apache.org>.

I'm really sorry that I haven't responded yet.

Regarding the question "should the code be in the bahir-flink repo, or the
kudu repo?":
I have the feeling that the Kudu repo might actually be the better spot,
because Flume, MapReduce, Hive, Spark, and others are already there. It
seems that the Kudu project is accepting such contributions and is also
willing to maintain them.

If the Kudu project rejects the contribution for now, I would suggest
providing a script that builds the flink-kudu connector, starts Kudu (maybe
from a Docker image?) and then runs a few test jobs.
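
The script idea above (build the connector, start Kudu from a Docker image,
run a few test jobs) might look roughly like the following sketch. Everything
specific here is an assumption made for illustration: the image name passed
on the command line, the container name kudu-it, the Maven module name
flink-connector-kudu, and the kudu.master system property are hypothetical,
and the sketch assumes a single container that exposes the Kudu master on its
default RPC port 7051.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Sketch of a build / start-Kudu / test / clean-up driver. Names are assumptions.
final class KuduIntegrationPlan {

    // Build the (hypothetical) connector module and the modules it depends on.
    static List<String> build(String module) {
        return Arrays.asList("mvn", "-pl", module, "-am", "package", "-DskipTests");
    }

    // Start a Kudu container in the background, publishing the master RPC port.
    static List<String> startKudu(String image, String container) {
        return Arrays.asList("docker", "run", "-d", "--name", container,
                "-p", "7051:7051", image);
    }

    // Run the module's tests; kudu.master is a hypothetical property the tests would read.
    static List<String> runTests(String module) {
        return Arrays.asList("mvn", "-pl", module, "verify",
                "-Dkudu.master=localhost:7051");
    }

    // Remove the container, whether or not it is still running.
    static List<String> stopKudu(String container) {
        return Arrays.asList("docker", "rm", "-f", container);
    }

    // Echo the command, run it with inherited stdio, and return its exit code.
    static int exec(List<String> command) throws IOException, InterruptedException {
        System.out.println("+ " + String.join(" ", command));
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    static int run(String image, String module, String container)
            throws IOException, InterruptedException {
        if (exec(build(module)) != 0) {
            return 1;
        }
        if (exec(startKudu(image, container)) != 0) {
            return 1;
        }
        try {
            return exec(runTests(module));
        } finally {
            exec(stopKudu(container)); // always clean up the container
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: KuduIntegrationPlan <kudu-docker-image>");
            System.exit(2);
        }
        System.exit(run(args[0], "flink-connector-kudu", "kudu-it"));
    }
}
```

The sketch does not wait for Kudu to become ready before starting the tests;
a real script would probably need to poll the master port or retry the first
connection.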

