Posted to user@flink.apache.org by Lorenzo Nicora <lo...@gmail.com> on 2020/07/16 10:58:35 UTC

Hadoop FS when running standalone

Hi

I need to run my streaming job as a *standalone* Java application, for
testing.
The job uses the Hadoop S3 FS, and I need to test it (not as a unit test).

The job works fine when deployed (I am using AWS Kinesis Data Analytics, so
Flink 1.8.2).

I have *org.apache.flink:flink-s3-fs-hadoop* as a "compile" dependency.

For running standalone, I have a Maven profile that adds the dependencies
that are normally provided:

- *org.apache.flink:flink-java*
- *org.apache.flink:flink-streaming-java_2.11*
- *org.apache.flink:flink-statebackend-rocksdb_2.11*
- *org.apache.flink:flink-connector-filesystem_2.11*

With these I keep getting the error "Hadoop is not in the
classpath/dependencies" and the job does not start.
I also tried adding *org.apache.flink:flink-hadoop-fs*, with no luck; a
rough sketch of the profile is below.
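A sketch of the profile, in case it helps (the profile id is made up, and
the versions simply match the Flink 1.8.2 I run against):

    <profile>
      <id>standalone</id> <!-- hypothetical id; activate with -Pstandalone -->
      <dependencies>
        <!-- normally "provided" by the runtime; default compile scope here
             so they end up on the classpath for local runs -->
        <dependency>
          <groupId>org.apache.flink</groupId>
          <artifactId>flink-java</artifactId>
          <version>1.8.2</version>
        </dependency>
        <dependency>
          <groupId>org.apache.flink</groupId>
          <artifactId>flink-streaming-java_2.11</artifactId>
          <version>1.8.2</version>
        </dependency>
        <dependency>
          <groupId>org.apache.flink</groupId>
          <artifactId>flink-statebackend-rocksdb_2.11</artifactId>
          <version>1.8.2</version>
        </dependency>
        <dependency>
          <groupId>org.apache.flink</groupId>
          <artifactId>flink-connector-filesystem_2.11</artifactId>
          <version>1.8.2</version>
        </dependency>
      </dependencies>
    </profile>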

What dependencies am I missing?

Cheers
Lorenzo

Re: Hadoop FS when running standalone

Posted by Lorenzo Nicora <lo...@gmail.com>.
Thanks Alessandro,

I think I solved it.
I cannot set HADOOP_HOME, as there is no Hadoop installation on the machine
running my tests.
But adding *org.apache.flink:flink-shaded-hadoop-2:2.8.3-10.0* as a compile
dependency to the Maven profile that builds the standalone version fixed the
issue.
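In case it helps anyone else, this is the dependency I added to the
standalone profile:

    <!-- shaded Hadoop classes that Flink's Hadoop S3 filesystem expects
         to find on the classpath -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-shaded-hadoop-2</artifactId>
      <version>2.8.3-10.0</version>
    </dependency>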

Lorenzo

Re: Hadoop FS when running standalone

Posted by Alessandro Solimando <al...@gmail.com>.
Hi Lorenzo,
IIRC, I had the same error message when trying to write Snappy-compressed
Parquet to HDFS with a standalone fat jar.

Flink could not find the Hadoop native/binary libraries (in my case I think
the issue was specifically with Snappy), because my HADOOP_HOME was not
(properly) set.

I have never used S3, so I don't know whether the same applies here, but it
is worth checking.

Best regards,
Alessandro
