You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kudu.apache.org by Clemens Valiente <Cl...@trivago.com> on 2017/12/27 14:01:41 UTC

Possible Race Condition in kudu-mapreduce

Hi,

I hope everyone had a nice long holiday weekend!


I have a question regarding the kudu-mapreduce package, in particular this line in the KuduTableInputFormat where getSplits() shuts down our Kudu client:

https://github.com/cloudera/kudu/blob/master/java/kudu-mapreduce/src/main/java/org/apache/kudu/mapreduce/KuduTableInputFormat.java#L164

Is there a reason for shutting down the client here?

This does not work with Hive:

In FetchInputFormatSplit, Hive uses the same InputFormat for fetching the splits and getting the recordReader (in our case, it is the KuduTableInputFormat.TableRecordReader).

If Hive then tries to initialize that record reader, it runs into an error here:

https://github.com/cloudera/kudu/blob/master/java/kudu-mapreduce/src/main/java/org/apache/kudu/mapreduce/KuduTableInputFormat.java#L397

since the TableRecordReader uses the same client of the KuduTableInputFormat that was already shut down by getSplits()


Since the client is already shut down by the close() method on the KuduTableInputFormat, I don't see a need to do the same in getSplits()

If there are no objections or a reason for keeping the shutdown call there, I would open a ticket and submit a patch for this.


Cheers

Clemens


Re: Possible Race Condition in kudu-mapreduce

Posted by Clemens Valiente <Cl...@trivago.com>.
We'll open one KuduClient per accessed Table (for the getSplits) and one Kuduclient per accessed Tablets per Table.

In the MR world we would usually spawn one Mapper per inputsplit and therefore one separate KuduClient anyway, this change would not increase the number of open clients a lot.

To be honest I think having the getSplits() and getRecordreader() called in one and the same container might be a special case anyway, I'll have to check that again tomorrow.

Anyway, I'll submit a change request sometime this week hopefully.


Cheers

Clemens

________________________________
From: Jean-Daniel Cryans <jd...@apache.org>
Sent: 28 December 2017 00:04:53
To: dev@kudu.apache.org
Subject: Re: Possible Race Condition in kudu-mapreduce

Hey,

No worries.

TBH I haven't been in the MR world for a while, so if you think it makes
more sense like that and that we won't be opening tons of KuduClients then
I'm all for it.

Thanks,

J-D

On Wed, Dec 27, 2017 at 2:47 PM, Clemens Valiente <
Clemens.Valiente@trivago.com> wrote:

> Hi J-D,
>
> thanks for the answer. I missed that part in the comment, teaches me to
> check more closely in the future. Now I understand the reason.
>
> It might probably be a better idea to instantiate a separate KuduClient
> for each RecordReader since they are not really designed to share the same
> client anyway.
>
> Thoughts?
>
>
> Clemens
>
> ________________________________
> From: Jean-Daniel Cryans <jd...@apache.org>
> Sent: 27 December 2017 23:37:54
> To: dev@kudu.apache.org
> Subject: Re: Possible Race Condition in kudu-mapreduce
>
> Hi Clemens,
>
> The reason is here
> https://github.com/apache/kudu/blob/master/java/kudu-
> mapreduce/src/main/java/org/apache/kudu/mapreduce/
> KuduTableInputFormat.java#L72
>
> I'd still be worried about that. You can look at what HBase does for
> inspiration in fixing this:
> https://github.com/apache/hbase/blob/master/hbase-
> mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/
> TableInputFormatBase.java#L238
>
> Thanks,
>
> J-D
>
> On Wed, Dec 27, 2017 at 6:01 AM, Clemens Valiente <
> Clemens.Valiente@trivago.com> wrote:
>
> > Hi,
> >
> > I hope everyone had a nice long holiday weekend!
> >
> >
> > I have a question regarding the kudu-mapreduce package, in particular
> this
> > line in the KuduTableInputFormat where getSplits() shuts down our Kudu
> > client:
> >
> > https://github.com/cloudera/kudu/blob/master/java/kudu-
> > mapreduce/src/main/java/org/apache/kudu/mapreduce/
> > KuduTableInputFormat.java#L164
> >
> > Is there a reason for shutting down the client here?
> >
> > This does not work with Hive:
> >
> > In FetchInputFormatSplit, Hive uses the same InputFormat for fetching the
> > splits and getting the recordReader (in our case, it is the
> > KuduTableInputFormat.TableRecordReader).
> >
> > If Hive then tries to initialize that record reader, it runs into an
> error
> > here:
> >
> > https://github.com/cloudera/kudu/blob/master/java/kudu-
> > mapreduce/src/main/java/org/apache/kudu/mapreduce/
> > KuduTableInputFormat.java#L397
> >
> > since the TableRecordReader uses the same client of the
> > KuduTableInputFormat that was already shut down by getSplits()
> >
> >
> > Since the client is already shut down by the close() method on the
> > KuduTableInputFormat, I don't see a need to do the same in getSplits()
> >
> > If there are no objections or a reason for keeping the shutdown call
> > there, I would open a ticket and submit a patch for this.
> >
> >
> > Cheers
> >
> > Clemens
> >
> >
>

Re: Possible Race Condition in kudu-mapreduce

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Hey,

No worries.

TBH I haven't been in the MR world for a while, so if you think it makes
more sense like that and that we won't be opening tons of KuduClients then
I'm all for it.

Thanks,

J-D

On Wed, Dec 27, 2017 at 2:47 PM, Clemens Valiente <
Clemens.Valiente@trivago.com> wrote:

> Hi J-D,
>
> thanks for the answer. I missed that part in the comment, teaches me to
> check more closely in the future. Now I understand the reason.
>
> It might probably be a better idea to instantiate a separate KuduClient
> for each RecordReader since they are not really designed to share the same
> client anyway.
>
> Thoughts?
>
>
> Clemens
>
> ________________________________
> From: Jean-Daniel Cryans <jd...@apache.org>
> Sent: 27 December 2017 23:37:54
> To: dev@kudu.apache.org
> Subject: Re: Possible Race Condition in kudu-mapreduce
>
> Hi Clemens,
>
> The reason is here
> https://github.com/apache/kudu/blob/master/java/kudu-
> mapreduce/src/main/java/org/apache/kudu/mapreduce/
> KuduTableInputFormat.java#L72
>
> I'd still be worried about that. You can look at what HBase does for
> inspiration in fixing this:
> https://github.com/apache/hbase/blob/master/hbase-
> mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/
> TableInputFormatBase.java#L238
>
> Thanks,
>
> J-D
>
> On Wed, Dec 27, 2017 at 6:01 AM, Clemens Valiente <
> Clemens.Valiente@trivago.com> wrote:
>
> > Hi,
> >
> > I hope everyone had a nice long holiday weekend!
> >
> >
> > I have a question regarding the kudu-mapreduce package, in particular
> this
> > line in the KuduTableInputFormat where getSplits() shuts down our Kudu
> > client:
> >
> > https://github.com/cloudera/kudu/blob/master/java/kudu-
> > mapreduce/src/main/java/org/apache/kudu/mapreduce/
> > KuduTableInputFormat.java#L164
> >
> > Is there a reason for shutting down the client here?
> >
> > This does not work with Hive:
> >
> > In FetchInputFormatSplit, Hive uses the same InputFormat for fetching the
> > splits and getting the recordReader (in our case, it is the
> > KuduTableInputFormat.TableRecordReader).
> >
> > If Hive then tries to initialize that record reader, it runs into an
> error
> > here:
> >
> > https://github.com/cloudera/kudu/blob/master/java/kudu-
> > mapreduce/src/main/java/org/apache/kudu/mapreduce/
> > KuduTableInputFormat.java#L397
> >
> > since the TableRecordReader uses the same client of the
> > KuduTableInputFormat that was already shut down by getSplits()
> >
> >
> > Since the client is already shut down by the close() method on the
> > KuduTableInputFormat, I don't see a need to do the same in getSplits()
> >
> > If there are no objections or a reason for keeping the shutdown call
> > there, I would open a ticket and submit a patch for this.
> >
> >
> > Cheers
> >
> > Clemens
> >
> >
>

Re: Possible Race Condition in kudu-mapreduce

Posted by Clemens Valiente <Cl...@trivago.com>.
Hi J-D,

thanks for the answer. I missed that part in the comment, teaches me to check more closely in the future. Now I understand the reason.

It might probably be a better idea to instantiate a separate KuduClient for each RecordReader since they are not really designed to share the same client anyway.

Thoughts?


Clemens

________________________________
From: Jean-Daniel Cryans <jd...@apache.org>
Sent: 27 December 2017 23:37:54
To: dev@kudu.apache.org
Subject: Re: Possible Race Condition in kudu-mapreduce

Hi Clemens,

The reason is here
https://github.com/apache/kudu/blob/master/java/kudu-mapreduce/src/main/java/org/apache/kudu/mapreduce/KuduTableInputFormat.java#L72

I'd still be worried about that. You can look at what HBase does for
inspiration in fixing this:
https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java#L238

Thanks,

J-D

On Wed, Dec 27, 2017 at 6:01 AM, Clemens Valiente <
Clemens.Valiente@trivago.com> wrote:

> Hi,
>
> I hope everyone had a nice long holiday weekend!
>
>
> I have a question regarding the kudu-mapreduce package, in particular this
> line in the KuduTableInputFormat where getSplits() shuts down our Kudu
> client:
>
> https://github.com/cloudera/kudu/blob/master/java/kudu-
> mapreduce/src/main/java/org/apache/kudu/mapreduce/
> KuduTableInputFormat.java#L164
>
> Is there a reason for shutting down the client here?
>
> This does not work with Hive:
>
> In FetchInputFormatSplit, Hive uses the same InputFormat for fetching the
> splits and getting the recordReader (in our case, it is the
> KuduTableInputFormat.TableRecordReader).
>
> If Hive then tries to initialize that record reader, it runs into an error
> here:
>
> https://github.com/cloudera/kudu/blob/master/java/kudu-
> mapreduce/src/main/java/org/apache/kudu/mapreduce/
> KuduTableInputFormat.java#L397
>
> since the TableRecordReader uses the same client of the
> KuduTableInputFormat that was already shut down by getSplits()
>
>
> Since the client is already shut down by the close() method on the
> KuduTableInputFormat, I don't see a need to do the same in getSplits()
>
> If there are no objections or a reason for keeping the shutdown call
> there, I would open a ticket and submit a patch for this.
>
>
> Cheers
>
> Clemens
>
>

Re: Possible Race Condition in kudu-mapreduce

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Hi Clemens,

The reason is here
https://github.com/apache/kudu/blob/master/java/kudu-mapreduce/src/main/java/org/apache/kudu/mapreduce/KuduTableInputFormat.java#L72

I'd still be worried about that. You can look at what HBase does for
inspiration in fixing this:
https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java#L238

Thanks,

J-D

On Wed, Dec 27, 2017 at 6:01 AM, Clemens Valiente <
Clemens.Valiente@trivago.com> wrote:

> Hi,
>
> I hope everyone had a nice long holiday weekend!
>
>
> I have a question regarding the kudu-mapreduce package, in particular this
> line in the KuduTableInputFormat where getSplits() shuts down our Kudu
> client:
>
> https://github.com/cloudera/kudu/blob/master/java/kudu-
> mapreduce/src/main/java/org/apache/kudu/mapreduce/
> KuduTableInputFormat.java#L164
>
> Is there a reason for shutting down the client here?
>
> This does not work with Hive:
>
> In FetchInputFormatSplit, Hive uses the same InputFormat for fetching the
> splits and getting the recordReader (in our case, it is the
> KuduTableInputFormat.TableRecordReader).
>
> If Hive then tries to initialize that record reader, it runs into an error
> here:
>
> https://github.com/cloudera/kudu/blob/master/java/kudu-
> mapreduce/src/main/java/org/apache/kudu/mapreduce/
> KuduTableInputFormat.java#L397
>
> since the TableRecordReader uses the same client of the
> KuduTableInputFormat that was already shut down by getSplits()
>
>
> Since the client is already shut down by the close() method on the
> KuduTableInputFormat, I don't see a need to do the same in getSplits()
>
> If there are no objections or a reason for keeping the shutdown call
> there, I would open a ticket and submit a patch for this.
>
>
> Cheers
>
> Clemens
>
>