Posted to user@beam.apache.org by Andrew Kettmann <ak...@evolve24.com> on 2021/06/11 21:33:44 UTC

[Java] [ParquetIO] How to determine required dependencies

I am quite ignorant of the details of package management in Java (I usually write in Python, but the Beam Python SDK is not at the same level as the Java one). I am troubleshooting an issue specific to the DataflowRunner, and I decided to try upgrading Beam from 2.28.0 to 2.30.0.

However, code that ran under 2.28.0 now throws a class-not-found exception when it attempts to write data to Parquet locally. My question is: when upgrading the Beam SDK, what is the expected way to find out that I will need additional dependencies, and which ones? I would assume there is a path that does not involve googling the classes the pipeline tries to call and adding dependencies until it stops complaining.

Could someone more experienced tell me what the expected path is for this?

The specific error I am getting involves a Hadoop class used by either ParquetIO or Snappy compression, but my question is more general. How do I know which packages and versions are intended to be used with the different Beam extensions?

Andrew Kettmann
DevOps Engineer

Re: [Java] [ParquetIO] How to determine required dependencies

Posted by Kyle Weaver <kc...@google.com>.
As far as I know, the only dependency you need to manage directly
is beam-sdks-java-io-parquet [1]. Can you make sure the version of that
dependency is correct (i.e. matches the version of your other Beam
dependencies)?

[1] https://beam.apache.org/documentation/io/built-in/parquet/
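If the versions already match, one way to narrow things down without guessing is to ask the JVM which of the suspect classes it can actually load, and from which jar. Below is a minimal sketch of that check, not an official Beam tool. The helper class ClasspathCheck is hypothetical, and the three default class names are only my assumption about what "some Hadoop class for either ParquetIO or Snappy compression" might refer to; replace them with the class named in your stack trace. Run it with the same classpath as your pipeline.

```java
import java.security.CodeSource;

// Hypothetical helper: reports whether a class is loadable and where it came from.
final class ClasspathCheck {

    /** Returns true if the named class can be found on the current classpath. */
    static boolean isPresent(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns the jar or directory a class was loaded from, or "MISSING". */
    static String locationOf(String className) {
        try {
            Class<?> c = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK classes have no code source.
            return src == null ? "(JDK runtime)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "MISSING";
        }
    }

    public static void main(String[] args) {
        // Assumed candidates; pass the class from your own stack trace as arguments instead.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.beam.sdk.io.parquet.ParquetIO",
            "org.apache.hadoop.conf.Configuration",
            "org.xerial.snappy.Snappy"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locationOf(name));
        }
    }
}
```

Anything reported as MISSING is a class no dependency currently on the classpath provides. If you build with Maven, comparing the output of `mvn dependency:tree` before and after the upgrade should show which transitive dependency stopped being pulled in.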
