Posted to user@flink.apache.org by Anubhav Nanda <aa...@gmail.com> on 2022/04/13 04:50:08 UTC

Issue with doing filesink to HDFS

Hi,

I have set up Flink 1.13.5 and we are using Hadoop 3.0.0. While running the
simple WordCount example we get the following error:


./flink-1.13.5/bin/flink run flink-1.13.5/examples/batch/WordCount.jar
--input hdfs:///tmp/log4j.properties


Caused by: org.apache.flink.runtime.JobException: Creating the input splits
caused an error: Could not find a file system implementation for scheme
'hdfs'. The scheme is not directly supported by Flink and no Hadoop file
system to support this scheme could be loaded. For a full list of supported
file systems, please see
https://nightlies.apache.org/flink/flink-docs-stable/ops/filesystems/.
        at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:247)
        at org.apache.flink.runtime.executiongraph.DefaultExecutionGraph.attachJobGraph(DefaultExecutionGraph.java:792)
        at org.apache.flink.runtime.executiongraph.DefaultExecutionGraphBuilder.buildGraph(DefaultExecutionGraphBuilder.java:196)
        at org.apache.flink.runtime.scheduler.DefaultExecutionGraphFactory.createAndRestoreExecutionGraph(DefaultExecutionGraphFactory.java:107)
        at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:342)
        at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:190)
        at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:122)
        at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:132)
        at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:110)
        at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:340)
        at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:317)
        at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:107)
        at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:95)
        at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112)
        ... 8 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException:
Could not find a file system implementation for scheme 'hdfs'. The scheme
is not directly supported by Flink and no Hadoop file system to support
this scheme could be loaded. For a full list of supported file systems,
please see
https://nightlies.apache.org/flink/flink-docs-stable/ops/filesystems/.
        at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:530)
        at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:407)
        at org.apache.flink.core.fs.Path.getFileSystem(Path.java:274)
        at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:599)
        at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:63)
        at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:234)
        ... 21 more
Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException:
Cannot support file system for 'hdfs' via Hadoop, because Hadoop is not in
the classpath, or some classes are missing from the classpath.
        at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:189)
        at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:526)
        ... 26 more
Caused by: java.lang.NoClassDefFoundError: Could not initialize class
org.apache.hadoop.hdfs.HdfsConfiguration
        at org.apache.flink.runtime.util.HadoopUtils.getHadoopConfiguration(HadoopUtils.java:59)
        at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:84)
        ... 27 more
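The innermost cause above (NoClassDefFoundError for
org.apache.hadoop.hdfs.HdfsConfiguration) means the JVM running the job
cannot load Hadoop's HDFS client classes. A quick, hedged check (the jar
name pattern is an assumption based on where that class normally ships;
nothing here is Flink-specific):

```shell
# Is HADOOP_CLASSPATH set at all in the shell that starts/submits Flink?
echo "HADOOP_CLASSPATH=${HADOOP_CLASSPATH:-<unset>}"

# HdfsConfiguration normally ships in a hadoop-hdfs* jar; if this grep
# prints nothing, Flink has no way to load the 'hdfs' scheme.
echo "$HADOOP_CLASSPATH" | tr ':' '\n' | grep -i 'hadoop-hdfs'
```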







Re: Issue with doing filesink to HDFS

Posted by Guowei Ma <gu...@gmail.com>.
Hi Anubhav,

Would you like to share the result of `echo $HADOOP_CLASSPATH`, and the
detailed error output you see after setting up the Hadoop classpath?

Best,
Guowei
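The information requested above can be gathered along these lines (the
flink path matches the original post; `hadoop` is assumed to be the CLI of
the Hadoop 3.0.0 installation):

```shell
echo "$HADOOP_CLASSPATH"   # value in the submitting shell, if any
hadoop classpath           # what the Hadoop CLI itself would report

# Rerun the failing job and capture the tail of the fresh error output:
./flink-1.13.5/bin/flink run flink-1.13.5/examples/batch/WordCount.jar \
  --input hdfs:///tmp/log4j.properties 2>&1 | tail -n 40
```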


On Wed, Apr 13, 2022 at 4:16 PM Anubhav Nanda <aa...@gmail.com>
wrote:

> Hi Guowei,
>
> I have already done that, but I am still getting the issue.
>
> Regards,
> Anubhav
>
>
>
>
> On Wed, Apr 13, 2022 at 1:23 PM Guowei Ma <gu...@gmail.com> wrote:
>
>> Hi
>> I think you need to export HADOOP_CLASSPATH correctly. [1]
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/resource-providers/yarn/#preparation
>> Best,
>> Guowei

Re: Issue with doing filesink to HDFS

Posted by Anubhav Nanda <aa...@gmail.com>.
Hi Guowei,

I have already done that, but I am still getting the issue.

Regards,
Anubhav




On Wed, Apr 13, 2022 at 1:23 PM Guowei Ma <gu...@gmail.com> wrote:

> Hi
> I think you need to export HADOOP_CLASSPATH correctly. [1]
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/resource-providers/yarn/#preparation
> Best,
> Guowei

Re: Issue with doing filesink to HDFS

Posted by Guowei Ma <gu...@gmail.com>.
Hi,
I think you need to export HADOOP_CLASSPATH correctly. [1]

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/resource-providers/yarn/#preparation
Best,
Guowei
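A minimal sketch of the preparation step linked above, assuming a
standalone cluster; the Hadoop install path is a placeholder, and only
`hadoop classpath` and the standard Flink scripts are taken as given:

```shell
# Placeholder path -- point this at the real Hadoop 3.0.0 installation.
export HADOOP_HOME=/opt/hadoop-3.0.0
export PATH="$HADOOP_HOME/bin:$PATH"

# 'hadoop classpath' prints every jar Hadoop itself uses, including the
# hadoop-hdfs client jars that provide HdfsConfiguration.
export HADOOP_CLASSPATH=$(hadoop classpath)

# The variable must be visible both to the cluster processes and to the
# client that submits the job, so restart the cluster from this shell:
./flink-1.13.5/bin/stop-cluster.sh
./flink-1.13.5/bin/start-cluster.sh
./flink-1.13.5/bin/flink run flink-1.13.5/examples/batch/WordCount.jar \
  --input hdfs:///tmp/log4j.properties
```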

