Posted to dev@bigtop.apache.org by Konstantin Boudnik <co...@apache.org> on 2016/01/22 21:27:02 UTC

Spark dependency issues (BIGTOP-2154)

Hey all.

I hate to pollute the list with this extra traffic, but there's quite an ugly
issue that we are facing because of the infamous Hive-Spark collusion. I am
talking about https://issues.apache.org/jira/browse/BIGTOP-2154

I have proposed a patch to work around it and get 1.1 out the door. I
have also opened BIGTOP-2268 so we don't forget to fix it properly in the next
release.

I'd appreciate someone else taking a look and sharing feedback, so we can
finally unblock RC1.

Thanks,
  Cos

Re: Spark dependency issues (BIGTOP-2154)

Posted by Konstantin Boudnik <co...@apache.org>.

On Sun, Jan 24, 2016 at 08:03PM, Evans Ye wrote:
> Oh, yes. I confused myself... It should be /usr/lib. Sorry!
>
> As for the Hive thing, I mean that if we can't have a properly working Spark
> env w/o datanucleus, we can only accept it and put them together, even though
> I don't use Hive. This is a Spark issue that we can't fix on our side.
>
> Merging spark-datanucleus into spark-core seems reasonable given the current
> context, but maybe in the future Spark will have better-decoupled code?
> I'm much more comfortable applying a workaround and expecting that they can
> be decoupled in the future.

The workaround is in for now. But I won't hold my breath waiting for better
system organization from Spark or, for that matter, any other project in the
space. Some of them are using a shader to slam together tons of libs and
deliver the result as a binary artifact ;) Historically, Bigtop did this neat
and somewhat clean grooming for Hadoop. While there's room for improvement, it
is perhaps quite well organized. Hopefully, the same can be done for other
components in the stack, if there's a volunteer for it, of course.

Cos

Re: Spark dependency issues (BIGTOP-2154)

Posted by Evans Ye <ev...@apache.org>.

Oh, yes. I confused myself... It should be /usr/lib. Sorry!

As for the Hive thing, I mean that if we can't have a properly working Spark
env w/o datanucleus, we can only accept it and put them together, even though
I don't use Hive. This is a Spark issue that we can't fix on our side.

Merging spark-datanucleus into spark-core seems reasonable given the current
context, but maybe in the future Spark will have better-decoupled code?
I'm much more comfortable applying a workaround and expecting that they can
be decoupled in the future.


Re: Spark dependency issues (BIGTOP-2154)

Posted by Konstantin Boudnik <co...@apache.org>.

On Sun, Jan 24, 2016 at 09:22AM, Evans Ye wrote:
> I actually feel OK about having the dependency, if that's what Spark wants.

This isn't what Spark wants; this is what spark-hive integration needs. But my
point is a bit different: let's suppose we all agree we need the Hive support
in Spark (of which I am not convinced, but I don't care enough to make an
argument). Now, without the datanucleus libs spark-shell is broken and won't
work. Perhaps other clients using spark-submit won't work either (again, I
neither know nor care enough about Spark to figure it out). Shall we simply get
rid of the datanucleus package and install these libs everywhere now? Cause,
you know, having a package just for the sake of having the package doesn't
sound very optimal to me. See where I am going with it?

> BTW, during the test I found that the Spark installation dir is under
> /usr/lib/spark instead of /var/lib/spark.
> I expect we should put all the component libs under /var/lib for
> consistency. Is that correct?

No, I believe /usr/lib/<component> is what we've been doing for a long time. 
Look at /usr/lib/hadoop or hbase or anything else for that matter.

Cos

Re: Spark dependency issues (BIGTOP-2154)

Posted by Evans Ye <ev...@apache.org>.

I actually feel OK about having the dependency, if that's what Spark wants.

BTW, during the test I found that the Spark installation dir is under
/usr/lib/spark instead of /var/lib/spark.
I expect we should put all the component libs under /var/lib for
consistency. Is that correct?

