Posted to user@flink.apache.org by Matthias Seiler <ma...@campus.tu-berlin.de> on 2021/03/25 09:58:06 UTC

Hadoop is not in the classpath/dependencies

Hello everybody,

I set up a Flink (1.12.1) and Hadoop (3.2.1) cluster on two machines.
The job should store the checkpoints on HDFS like so:
```java
StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(15000, CheckpointingMode.EXACTLY_ONCE);
env.setStateBackend(new FsStateBackend("hdfs://node-1:9000/flink"));
```

Unfortunately, the JobManager throws
```
org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not
find a file system implementation for scheme 'hdfs'. The scheme is not
directly supported by Flink and no Hadoop file system to support this
scheme could be loaded. For a full list of supported file systems,
please see
https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/.
// ...
Caused by:
org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is
not in the classpath/dependencies.
```
and I don't understand why.

`echo $HADOOP_CLASSPATH` returns the paths of the Hadoop libraries (with
wildcards). Flink's JobManager prints a classpath that includes specific
packages from these Hadoop libraries. Moreover, Flink creates the state
directories on HDFS, but writes no content into them.

Thank you for any advice,
Matthias


Re: Hadoop is not in the classpath/dependencies

Posted by Maminspapin <un...@mail.ru>.
I have the same problem ...



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Hadoop is not in the classpath/dependencies

Posted by Maminspapin <un...@mail.ru>.
I downloaded the latest version of the library from here:
https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-7.0/

and put it in the flink_home/lib directory.

It helped.
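
In script form, roughly what I did (a sketch; adjust `FLINK_HOME` to your
installation and pick the jar version matching the URL above):

```shell
# Download the shaded Hadoop uber jar and place it in Flink's lib/ directory.
JAR=flink-shaded-hadoop-2-uber-2.8.3-7.0.jar
wget "https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-7.0/$JAR"
mv "$JAR" "$FLINK_HOME/lib/"
# Restart the cluster so the JobManager/TaskManagers pick the jar up:
"$FLINK_HOME/bin/stop-cluster.sh" && "$FLINK_HOME/bin/start-cluster.sh"
```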




Re: Hadoop is not in the classpath/dependencies

Posted by Chesnay Schepler <ch...@apache.org>.
This looks related to HDFS-12920, where Hadoop 2.x expects to read plain
numbers for durations from hdfs-default.xml, but in 3.x the values also
contain time units.
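
If that is indeed the cause, one possible workaround (an untested sketch;
the property name is the one discussed in HDFS-12920, verify it against
your stack trace) is to override the value with a unit-less number on the
client side, e.g. in hdfs-site.xml:

```xml
<!-- hdfs-site.xml (sketch): give Hadoop 2.x code a plain number
     instead of the "30s" that ships with Hadoop 3.x defaults. -->
<configuration>
  <property>
    <name>dfs.client.datanode-restart.timeout</name>
    <value>30</value>
  </property>
</configuration>
```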

On 3/30/2021 9:37 AM, Matthias Seiler wrote:
>
> Thank you all for the replies!
>
>
> I did as @Maminspapin suggested and indeed the previous error 
> disappeared, but now the exception is
> ```
> java.io.IOException: Cannot instantiate file system for URI: 
> hdfs://node-1:9000/flink
> //...
> Caused by: java.lang.NumberFormatException: For input string: "30s"
> // this is thrown by the flink-shaded-hadoop library
> ```
> I thought that it relates to the windowing I do, which has a slide 
> interval of 30 seconds, but removing it displays the same error.
>
> I also added the dependency to the maven pom, but without effect.
>
> Since I use Hadoop 3.2.1, I also tried 
> https://mvnrepository.com/artifact/org.apache.flink/flink-shaded-hadoop-3-uber 
> but with this I can't even start a cluster (`TaskManager 
> initialization failed`).
>
>
>
> @Robert, Flink includes roughly 100 hdfs jars. 
> `hadoop-hdfs-client-3.2.1.jar` is one of them and is supposed to 
> contain `DistributedFileSystem.class`, which I checked running `jar 
> tvf hadoop-3.2.1/share/hadoop/hdfs/hadoop-hdfs-client-3.2.1.jar | grep 
> DistributedFileSystem`. How can I verify that the class is really 
> accessible?
>
> Cheers,
> Matthias
>


Re: Hadoop is not in the classpath/dependencies

Posted by Matthias Seiler <ma...@campus.tu-berlin.de>.
Thank you all for the replies!


I did as @Maminspapin suggested and indeed the previous error
disappeared, but now the exception is
```
java.io.IOException: Cannot instantiate file system for URI:
hdfs://node-1:9000/flink
//...
Caused by: java.lang.NumberFormatException: For input string: "30s"
// this is thrown by the flink-shaded-hadoop library
```
I thought that it relates to the windowing I do, which has a slide
interval of 30 seconds, but removing it yields the same error.

I also added the dependency to the Maven POM, but without effect.

Since I use Hadoop 3.2.1, I also tried
https://mvnrepository.com/artifact/org.apache.flink/flink-shaded-hadoop-3-uber
but with this I can't even start a cluster (`TaskManager initialization
failed`).



@Robert, the classpath Flink prints includes roughly 100 HDFS jars.
`hadoop-hdfs-client-3.2.1.jar` is one of them and is supposed to contain
`DistributedFileSystem.class`, which I checked running `jar tvf
hadoop-3.2.1/share/hadoop/hdfs/hadoop-hdfs-client-3.2.1.jar | grep
DistributedFileSystem`. How can I verify that the class is really
accessible?

Cheers,
Matthias

On 3/26/21 10:20 AM, Robert Metzger wrote:
> Hey Matthias,
>
> Maybe the classpath contains hadoop libraries, but not the HDFS
> libraries? The "DistributedFileSystem" class needs to be accessible to
> the classloader. Can you check if that class is available?
>
> Best,
> Robert
>

Re: Hadoop is not in the classpath/dependencies

Posted by Robert Metzger <rm...@apache.org>.
Hey Matthias,

Maybe the classpath contains hadoop libraries, but not the HDFS libraries?
The "DistributedFileSystem" class needs to be accessible to the
classloader. Can you check if that class is available?
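
A quick way to test that (a sketch; compile it anywhere and run it with
the exact classpath the JobManager logs at startup):

```java
// ClassCheck.java: reports whether a class can be loaded from the classpath.
// Run e.g.:  java -cp "<classpath from the JobManager log>" ClassCheck
public class ClassCheck {

    // Returns true if the named class is loadable by this JVM.
    static boolean isLoadable(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.hdfs.DistributedFileSystem";
        System.out.println(name + (isLoadable(name) ? " is" : " is NOT")
                + " on the classpath");
    }
}
```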

Best,
Robert

On Thu, Mar 25, 2021 at 11:10 AM Matthias Seiler <
matthias.seiler@campus.tu-berlin.de> wrote:
