Posted to user@beam.apache.org by Evan Galpin <eg...@apache.org> on 2023/04/20 23:52:18 UTC

[java] Trouble with gradle and using ParquetIO

Hi all,

I'm trying to make use of ParquetIO.  Based on what's documented in maven
central, I'm including the artifact in "compileOnly" mode (or in maven
parlance, 'provided' scope).  I can successfully compile my pipeline, but
when I run it I (intuitively?) am met with a ClassNotFound exception for
ParquetIO.

Is 'compileOnly' still the desired way to include ParquetIO as a pipeline
dependency?

Thanks,
Evan

Re: [java] Trouble with gradle and using ParquetIO

Posted by Alexey Romanenko <ar...@gmail.com>.
No, I don’t think so. IIRC, the Hadoop dependencies were intentionally made “provided” to support different Hadoop versions, leaving it up to the user to decide which version to use.
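
For a Gradle build, this could look roughly like the sketch below. It is only an illustration under assumptions: the Beam version 2.46.0 is taken from the links in this thread and may not match your project, and 3.2.4 is simply the hadoop-common version reported to work elsewhere in this thread, not a recommendation. The variable names beamVersion and hadoopVersion are placeholders.

```groovy
// build.gradle (Groovy DSL), illustrative sketch only
plugins {
    id 'java'
}

repositories {
    mavenCentral()
}

// Assumed versions; adjust both to your own environment.
def beamVersion = '2.46.0'
def hadoopVersion = '3.2.4'

dependencies {
    implementation "org.apache.beam:beam-sdks-java-core:${beamVersion}"
    implementation "org.apache.beam:beam-sdks-java-io-parquet:${beamVersion}"

    // Per the reply above, the Hadoop dependencies are "provided" on the
    // Beam side, so they are not expected to arrive transitively and the
    // pipeline declares the Hadoop version it wants explicitly.
    implementation "org.apache.hadoop:hadoop-common:${hadoopVersion}"

    // ... plus a runner and whatever else the pipeline needs
}
```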

—
Alexey

> On 26 Apr 2023, at 05:51, Evan Galpin <eg...@apache.org> wrote:
> 
> The root cause was actually "java.lang.ClassNotFoundException: org.apache.hadoop.io.Writable", which I eventually fixed by including hadoop-common as a dep for my pipeline (below).  Should hadoop-common be listed as a dep of ParquetIO in the Beam repo itself?
> 
> implementation "org.apache.hadoop:hadoop-common:3.2.4"
> 
> On Fri, Apr 21, 2023 at 10:38 AM Evan Galpin <egalpin@apache.org> wrote:
>> Oops, I was looking at the "bootleg" mvnrepository search engine, which shows `compileOnly` in the copy-pastable dependency installation prompts[1].  When I received the "ClassNotFound" error, my thought was that the dep should be installed in "implementation" mode.  When I tried that, I got other, stranger errors when running my pipeline: "java.lang.NoClassDefFoundError: Could not initialize class org.apache.beam.sdk.coders.CoderRegistry".
>> 
>> My deps are like so:
>>     implementation "org.apache.beam:beam-sdks-java-core:${beamVersion}"
>>     implementation "org.apache.beam:beam-sdks-java-io-parquet:${beamVersion}"
>>     ...
>> 
>> Not sure why the CoderRegistry error comes up at runtime when both of the above deps are included. 
>> 
>> [1] https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-io-parquet/2.46.0
>> 
>> On Fri, Apr 21, 2023 at 2:34 AM Alexey Romanenko <aromanenko.dev@gmail.com> wrote:
>>> Just curious, where was it documented like this?
>>> 
>>> I briefly checked it on Maven Central [1] and the provided code snippet for Gradle uses “implementation” scope.
>>> 
>>> —
>>> Alexey
>>> 
>>> [1] https://search.maven.org/artifact/org.apache.beam/beam-sdks-java-io-parquet/2.46.0/jar
>>> 
>>> > On 21 Apr 2023, at 01:52, Evan Galpin <egalpin@apache.org> wrote:
>>> > 
>>> > Hi all,
>>> > 
>>> > I'm trying to make use of ParquetIO.  Based on what's documented in maven central, I'm including the artifact in "compileOnly" mode (or in maven parlance, 'provided' scope).  I can successfully compile my pipeline, but when I run it I (intuitively?) am met with a ClassNotFound exception for ParquetIO.
>>> > 
>>> > Is 'compileOnly' still the desired way to include ParquetIO as a pipeline dependency? 
>>> > 
>>> > Thanks,
>>> > Evan
>>> 


Re: [java] Trouble with gradle and using ParquetIO

Posted by Evan Galpin <eg...@apache.org>.
The root cause was actually "java.lang.ClassNotFoundException:
org.apache.hadoop.io.Writable", which I eventually fixed by including
hadoop-common as a dep for my pipeline (below).  Should hadoop-common be
listed as a dep of ParquetIO in the Beam repo itself?

implementation "org.apache.hadoop:hadoop-common:3.2.4"

On Fri, Apr 21, 2023 at 10:38 AM Evan Galpin <eg...@apache.org> wrote:

> Oops, I was looking at the "bootleg" mvnrepository search engine, which
> shows `compileOnly` in the copy-pastable dependency installation
> prompts[1].  When I received the "ClassNotFound" error, my thought was that
> the dep should be installed in "implementation" mode.  When I tried that, I
> got other, stranger errors when running my pipeline:
> "java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.beam.sdk.coders.CoderRegistry".
>
> My deps are like so:
>     implementation "org.apache.beam:beam-sdks-java-core:${beamVersion}"
>     implementation "org.apache.beam:beam-sdks-java-io-parquet:${beamVersion}"
>     ...
>
> Not sure why the CoderRegistry error comes up at runtime when both of the
> above deps are included.
>
> [1]
> https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-io-parquet/2.46.0
>
> On Fri, Apr 21, 2023 at 2:34 AM Alexey Romanenko <ar...@gmail.com>
> wrote:
>
>> Just curious, where was it documented like this?
>>
>> I briefly checked it on Maven Central [1] and the provided code snippet
>> for Gradle uses “implementation” scope.
>>
>> —
>> Alexey
>>
>> [1]
>> https://search.maven.org/artifact/org.apache.beam/beam-sdks-java-io-parquet/2.46.0/jar
>>
>> > On 21 Apr 2023, at 01:52, Evan Galpin <eg...@apache.org> wrote:
>> >
>> > Hi all,
>> >
>> > I'm trying to make use of ParquetIO.  Based on what's documented in
>> > maven central, I'm including the artifact in "compileOnly" mode (or in
>> > maven parlance, 'provided' scope).  I can successfully compile my pipeline,
>> > but when I run it I (intuitively?) am met with a ClassNotFound exception
>> > for ParquetIO.
>> >
>> > Is 'compileOnly' still the desired way to include ParquetIO as a
>> > pipeline dependency?
>> >
>> > Thanks,
>> > Evan
>>
>>

Re: [java] Trouble with gradle and using ParquetIO

Posted by Evan Galpin <eg...@apache.org>.
Oops, I was looking at the "bootleg" mvnrepository search engine, which
shows `compileOnly` in the copy-pastable dependency installation
prompts[1].  When I received the "ClassNotFound" error, my thought was that
the dep should be installed in "implementation" mode.  When I tried that, I
got other, stranger errors when running my pipeline:
"java.lang.NoClassDefFoundError: Could not initialize class
org.apache.beam.sdk.coders.CoderRegistry".

My deps are like so:
    implementation "org.apache.beam:beam-sdks-java-core:${beamVersion}"
    implementation "org.apache.beam:beam-sdks-java-io-parquet:${beamVersion}"
    ...

Not sure why the CoderRegistry error comes up at runtime when both of the
above deps are included.

[1]
https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-io-parquet/2.46.0

On Fri, Apr 21, 2023 at 2:34 AM Alexey Romanenko <ar...@gmail.com>
wrote:

> Just curious, where was it documented like this?
>
> I briefly checked it on Maven Central [1] and the provided code snippet
> for Gradle uses “implementation” scope.
>
> —
> Alexey
>
> [1]
> https://search.maven.org/artifact/org.apache.beam/beam-sdks-java-io-parquet/2.46.0/jar
>
> > On 21 Apr 2023, at 01:52, Evan Galpin <eg...@apache.org> wrote:
> >
> > Hi all,
> >
> > I'm trying to make use of ParquetIO.  Based on what's documented in
> > maven central, I'm including the artifact in "compileOnly" mode (or in
> > maven parlance, 'provided' scope).  I can successfully compile my pipeline,
> > but when I run it I (intuitively?) am met with a ClassNotFound exception
> > for ParquetIO.
> >
> > Is 'compileOnly' still the desired way to include ParquetIO as a
> > pipeline dependency?
> >
> > Thanks,
> > Evan
>
>

Re: [java] Trouble with gradle and using ParquetIO

Posted by Alexey Romanenko <ar...@gmail.com>.
Just curious, where was it documented like this?

I briefly checked it on Maven Central [1] and the provided code snippet for Gradle uses “implementation” scope.

—
Alexey

[1] https://search.maven.org/artifact/org.apache.beam/beam-sdks-java-io-parquet/2.46.0/jar

> On 21 Apr 2023, at 01:52, Evan Galpin <eg...@apache.org> wrote:
> 
> Hi all,
> 
> I'm trying to make use of ParquetIO.  Based on what's documented in maven central, I'm including the artifact in "compileOnly" mode (or in maven parlance, 'provided' scope).  I can successfully compile my pipeline, but when I run it I (intuitively?) am met with a ClassNotFound exception for ParquetIO.
> 
> Is 'compileOnly' still the desired way to include ParquetIO as a pipeline dependency? 
> 
> Thanks,
> Evan


Re: [java] Trouble with gradle and using ParquetIO

Posted by Wiśniowski Piotr <co...@gmail.com>.
Hi Evan,

Just to have full knowledge:

- "provided" should be used when you expect the target cluster or
environment to have the package of interest installed, so you do not have
to include it in the pipeline jar (this keeps the jar more lightweight
and makes it easier to maintain a coherent target JRE env across the
organization).

- it seems that you should either install the library on your target env
or include it in your build jar. It is up to your specific use case.
Typically, corporate envs provide commonly used libs such as Spark
and IO libs, and this might be the reason that Maven suggests this.
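
The two options above map onto Gradle configurations roughly as in the sketch below. This is an illustration under assumptions rather than a recommended setup: hadoop-common stands in for "the package of interest", and the version 3.2.4 is the one mentioned elsewhere in this thread. Only one of the two declarations would normally be active at a time.

```groovy
// build.gradle (Groovy DSL), illustrative sketch only
plugins {
    id 'java'
}

repositories {
    mavenCentral()
}

// Assumed version; match it to what your target environment runs.
def hadoopVersion = '3.2.4'

dependencies {
    // Option A: the target environment already ships the library.
    // compileOnly (Maven's "provided") puts it on the compile classpath
    // only, so it is left out of the runtime classpath and the built jar.
    compileOnly "org.apache.hadoop:hadoop-common:${hadoopVersion}"

    // Option B: the environment does not ship it, so the pipeline brings
    // it along. implementation puts it on both the compile and runtime
    // classpaths. Uncomment this and remove Option A to use it.
    // implementation "org.apache.hadoop:hadoop-common:${hadoopVersion}"
}
```

With Option A, running the pipeline somewhere that lacks the library would be expected to fail with a ClassNotFoundException like the ones described in this thread.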

Best

Wiśniowski Piotr

On 21.04.2023 08:30, Moritz Mack wrote:
>
> Hi Evan,
>
> Not sure why maven suggests using “compileOnly”.
>
> That’s certainly wrong, make sure to use “implementation” in your case.
>
> Cheers, Moritz
>
> On 21.04.23, 01:52, "Evan Galpin" <eg...@apache.org> wrote:
>
> Hi all,
>
> I'm trying to make use of ParquetIO.  Based on what's documented in 
> maven central, I'm including the artifact in "compileOnly" mode (or in 
> maven parlance, 'provided' scope).  I can successfully compile my 
> pipeline, but when I run it I (intuitively?) am met with a 
> ClassNotFound exception for ParquetIO.
>
> Is 'compileOnly' still the desired way to include ParquetIO as a 
> pipeline dependency?
>
> Thanks,
>
> Evan
>
>

Re: [java] Trouble with gradle and using ParquetIO

Posted by Moritz Mack <mm...@talend.com>.
Hi Evan,

Not sure why maven suggests using “compileOnly”.
That’s certainly wrong, make sure to use “implementation” in your case.

Cheers, Moritz

On 21.04.23, 01:52, "Evan Galpin" <eg...@apache.org> wrote:

Hi all,

I'm trying to make use of ParquetIO.  Based on what's documented in maven central, I'm including the artifact in "compileOnly" mode (or in maven parlance, 'provided' scope).  I can successfully compile my pipeline, but when I run it I (intuitively?) am met with a ClassNotFound exception for ParquetIO.
Is 'compileOnly' still the desired way to include ParquetIO as a pipeline dependency?

Thanks,
Evan
