Posted to dev@spark.apache.org by Sean Owen <sr...@gmail.com> on 2019/12/15 16:08:28 UTC

Do we need to finally update Guava?

See for example:

https://github.com/apache/spark/pull/25932#issuecomment-565822573
https://issues.apache.org/jira/browse/SPARK-23897

This is a dicey dependency that we have been reluctant to update as a)
Hadoop used an old version and b) Guava versions are incompatible
after a few releases.

But Hadoop is going all the way from 11 to 27 in Hadoop 3.2.1. Time to
match that? I haven't assessed how much internal change it requires.
If it's a lot, well, that makes it hard, as we need to stay compatible
with Hadoop 2 / Guava 11-14. But then that causes a problem updating
past Hadoop 3.2.0.
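
To make "incompatible after a few releases" concrete, here is a hypothetical snippet (not from the Spark codebase) of the kind of break involved: Objects.toStringHelper exists in the Guava 11-14 range but was removed in later releases (MoreObjects.toStringHelper is the replacement), so a class compiled against an old Guava blows up at runtime once a newer Guava is on the classpath.

// Hypothetical illustration only; compiles against Guava 14.
import com.google.common.base.Objects

class Endpoint(host: String, port: Int) {
  override def toString: String =
    Objects.toStringHelper(this)   // removed in newer Guava -> NoSuchMethodError at runtime
      .add("host", host)
      .add("port", port)
      .toString
}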



Re: Do we need to finally update Guava?

Posted by Sean Owen <sr...@gmail.com>.
Yeah that won't be the last problem I bet. Here's a proposal for just
directly reducing exposure to Guava in Spark itself though:
https://github.com/apache/spark/pull/26911
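
The flavor of change there is roughly the following (hypothetical before/after, not lifted from the PR): swap Guava utility calls for JDK or Scala standard-library equivalents wherever that's painless, so fewer code paths care which Guava version is on the classpath.

// Before (Guava):
//   import com.google.common.base.Preconditions
//   Preconditions.checkArgument(numSlots > 0, "numSlots must be positive")
//   Preconditions.checkNotNull(conf, "conf")
// After (no Guava needed):
def init(conf: AnyRef, numSlots: Int): Unit = {
  require(numSlots > 0, "numSlots must be positive")   // Scala's built-in require
  java.util.Objects.requireNonNull(conf, "conf")
}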

On Mon, Dec 16, 2019 at 11:36 AM Marcelo Vanzin <va...@cloudera.com> wrote:
>
> Great that Hadoop has done it (which, btw, probably means that Spark
> won't work with that version of Hadoop yet), but Hive also depends on
> Guava, and last time I tried, even Hive 3.x did not work with Guava
> 27.
>
> (Newer Hadoop versions also have a new artifact that shades a lot of
> dependencies, which would be great for Spark. But since Spark uses
> some test artifacts from Hadoop, that may be a bit tricky, since I
> don't believe those are shaded.)
>
> On Sun, Dec 15, 2019 at 8:08 AM Sean Owen <sr...@gmail.com> wrote:
> >
> > See for example:
> >
> > https://github.com/apache/spark/pull/25932#issuecomment-565822573
> > https://issues.apache.org/jira/browse/SPARK-23897
> >
> > This is a dicey dependency that we have been reluctant to update as a)
> > Hadoop used an old version and b) Guava versions are incompatible
> > after a few releases.
> >
> > But Hadoop is going all the way from 11 to 27 in Hadoop 3.2.1. Time to
> > match that? I haven't assessed how much internal change it requires.
> > If it's a lot, well, that makes it hard, as we need to stay compatible
> > with Hadoop 2 / Guava 11-14. But then that causes a problem updating
> > past Hadoop 3.2.0.
> >
>
>
> --
> Marcelo



Re: Do we need to finally update Guava?

Posted by Sean Owen <sr...@gmail.com>.
PS you are correct; with Guava 27 and my recent changes, and Hadoop
3.2.1 + Hive 2.3, I still see ...

*** RUN ABORTED ***
  java.lang.IllegalAccessError: tried to access method
com.google.common.collect.Iterators.emptyIterator()Lcom/google/common/collect/UnmodifiableIterator;
from class org.apache.hadoop.hive.ql.exec.FetchOperator
  at org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:108)
  at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:87)
...

So, hm, we can make Spark work with old and new Guava (see PR) but
this seems like it will cause a problem updating to / running on
Hadoop 3.2.1+, regardless.
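
For reference, the failing call is essentially the one sketched below (Hive 2.3's FetchOperator does the equivalent in Java). It compiles fine against an old Guava, where Iterators.emptyIterator() is public, but the method is not public in Guava 27, so already-compiled callers fail the JVM's link-time access check -- an IllegalAccessError rather than a compile error.

// Sketch of the incompatibility behind the trace above (illustrative, compiled against an old Guava).
import com.google.common.collect.{Iterators, UnmodifiableIterator}

object Repro {
  // Public in old Guava; no longer public in Guava 27 -> IllegalAccessError for old bytecode.
  val empty: UnmodifiableIterator[String] = Iterators.emptyIterator[String]()
}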

On Mon, Dec 16, 2019 at 11:36 AM Marcelo Vanzin <va...@cloudera.com> wrote:
>
> Great that Hadoop has done it (which, btw, probably means that Spark
> won't work with that version of Hadoop yet), but Hive also depends on
> Guava, and last time I tried, even Hive 3.x did not work with Guava
> 27.
>
> (Newer Hadoop versions also have a new artifact that shades a lot of
> dependencies, which would be great for Spark. But since Spark uses
> some test artifacts from Hadoop, that may be a bit tricky, since I
> don't believe those are shaded.)
>
> On Sun, Dec 15, 2019 at 8:08 AM Sean Owen <sr...@gmail.com> wrote:
> >
> > See for example:
> >
> > https://github.com/apache/spark/pull/25932#issuecomment-565822573
> > https://issues.apache.org/jira/browse/SPARK-23897
> >
> > This is a dicey dependency that we have been reluctant to update as a)
> > Hadoop used an old version and b) Guava versions are incompatible
> > after a few releases.
> >
> > But Hadoop is going all the way from 11 to 27 in Hadoop 3.2.1. Time to
> > match that? I haven't assessed how much internal change it requires.
> > If it's a lot, well, that makes it hard, as we need to stay compatible
> > with Hadoop 2 / Guava 11-14. But then that causes a problem updating
> > past Hadoop 3.2.0.
> >
>
>
> --
> Marcelo



Re: Do we need to finally update Guava?

Posted by Marcelo Vanzin <va...@cloudera.com.INVALID>.
Great that Hadoop has done it (which, btw, probably means that Spark
won't work with that version of Hadoop yet), but Hive also depends on
Guava, and last time I tried, even Hive 3.x did not work with Guava
27.

(Newer Hadoop versions also have a new artifact that shades a lot of
dependencies, which would be great for Spark. But since Spark uses
some test artifacts from Hadoop, that may be a bit tricky, since I
don't believe those are shaded.)
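
For reference, the shaded artifacts are hadoop-client-api and hadoop-client-runtime. Expressed in sbt terms the idea would look roughly like the sketch below (illustrative coordinates only; Spark's real build is Maven, and whether the unshaded test artifacts can be kept off the classpath is exactly the open question):

// Illustrative dependency sketch, assuming Hadoop 3.2.1.
libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-client-api"     % "3.2.1",           // shaded compile-time API
  "org.apache.hadoop" % "hadoop-client-runtime" % "3.2.1" % Runtime, // shaded/relocated third-party deps
  // Test-only helpers like the minicluster are, as noted above, presumably not shaded.
  "org.apache.hadoop" % "hadoop-minicluster"    % "3.2.1" % Test
)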

On Sun, Dec 15, 2019 at 8:08 AM Sean Owen <sr...@gmail.com> wrote:
>
> See for example:
>
> https://github.com/apache/spark/pull/25932#issuecomment-565822573
> https://issues.apache.org/jira/browse/SPARK-23897
>
> This is a dicey dependency that we have been reluctant to update as a)
> Hadoop used an old version and b) Guava versions are incompatible
> after a few releases.
>
> But Hadoop is going all the way from 11 to 27 in Hadoop 3.2.1. Time to
> match that? I haven't assessed how much internal change it requires.
> If it's a lot, well, that makes it hard, as we need to stay compatible
> with Hadoop 2 / Guava 11-14. But then that causes a problem updating
> past Hadoop 3.2.0.
>


-- 
Marcelo
