Posted to users@nifi.apache.org by "Ravi Papisetti (rpapiset)" <rp...@cisco.com> on 2018/03/22 18:05:23 UTC

PutHDFS with mapr

Hi,

I have re-compiled NiFi with MapR dependencies as per the instructions at http://hariology.com/integrating-mapr-fs-and-apache-nifi/

Created a process flow with ListFile > FetchFile > PutHDFS. As soon as I start this process group, nifi-bootstrap.log fills with:

2018-03-21 22:56:26,806 ERROR [NiFi logging handler] org.apache.nifi.StdErr 2018-03-21 22:56:26,8003 select failed(-1) error Invalid argument

2018-03-21 22:56:26,806 ERROR [NiFi logging handler] org.apache.nifi.StdErr 2018-03-21 22:56:26,8003 select failed(-1) error Invalid argument

This log grows to GBs in minutes. I had to stop NiFi to stop the flooding.

I found a similar issue in the Pentaho forum: https://jira.pentaho.com/browse/PDI-16270

Does anyone have any thoughts on why this error might be occurring?

Appreciate any help.

Thanks,
Ravi Papisetti

Re: PutHDFS with mapr

Posted by "Ravi Papisetti (rpapiset)" <rp...@cisco.com>.
Hmmm... I overlooked that option. I haven't tried just pointing PutHDFS to the MapR client libraries. That is a good idea; I will try it in the morning and keep you posted.

Thanks,
Ravi Papisetti

From: Andre <an...@fucs.org>
Reply-To: "andre-lists@fucs.org" <an...@fucs.org>
Date: Sunday, 25 March 2018 at 9:34 PM
To: Cisco Employee <rp...@cisco.com>
Cc: "users@nifi.apache.org" <us...@nifi.apache.org>, "andre-lists@fucs.org" <an...@fucs.org>
Subject: Re: PutHDFS with mapr

Ravi,

Have you tried the second approach I listed before, without recompiling NiFi against the MapR libraries?

Kind regards

On Mon, Mar 26, 2018 at 1:17 PM, Ravi Papisetti (rpapiset) <rp...@cisco.com> wrote:
Thanks Andre. In the absence of a FUSE license, I have recompiled NiFi with MapR dependencies.

I started getting the errors below when I ran the PutHDFS processor (in a flow) in NiFi on a Mac with the MapR client libraries set up. However, the same package (NiFi recompiled with the MapR libraries) worked fine on Linux.

2018-03-21 22:56:26,806 ERROR [NiFi logging handler] org.apache.nifi.StdErr 2018-03-21 22:56:26,8003 select failed(-1) error Invalid argument

2018-03-21 22:56:26,806 ERROR [NiFi logging handler] org.apache.nifi.StdErr 2018-03-21 22:56:26,8003 select failed(-1) error Invalid argument

For now I am moving forward with testing from a Linux machine. I am not sure what the specific issue is with the Mac.

I am still trying to work through connecting to MapR-DB from NiFi.
Thanks,
Ravi Papisetti

From: Andre <an...@fucs.org>
Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>, "andre-lists@fucs.org" <an...@fucs.org>
Date: Sunday, 25 March 2018 at 3:56 PM
To: "users@nifi.apache.org" <us...@nifi.apache.org>
Subject: Re: PutHDFS with mapr

Joey,

Yes. The client must be installed and set up (this is a requirement for the compiled NiFi as well).

Without the client installed and configured, the MapR libraries (Java and native) would not know which ZooKeeper to connect to in order to get information about the CLDB (their alternative to the NameNode).

Cheers

On Mon, Mar 26, 2018 at 1:20 AM, Joey Frazee <jo...@icloud.com> wrote:
I'm kinda going on memory here because I lost some notes I had about doing this, but I think compiling against the MapR libs presumes you also have the C-based MapR client libs on your machine at compile time and run time. I skimmed that blog post, albeit very quickly, and didn't see that explicitly mentioned in there.

Using the additional jars in PutHDFS would presumably require them too.

Andre, that's correct isn't it?

On Mar 24, 2018, 8:26 AM -0500, Mark Payne <ma...@hotmail.com> wrote:
Andre,

I knew this was possible but had no idea how. Thanks for the great explanation and associated caveats!

-Mark

On Mar 24, 2018, at 1:04 AM, Andre <an...@fucs.org> wrote:
Ravi,

There are two ways of solving this.

One of them (suggested to me by MapR representatives) is to deploy MapR's FUSE client to your NiFi nodes, use the PutFile processor instead of PutHDFS, and let the MapR client coordinate the API engagement with MapR-FS. This is a very clean and robust approach; however, it may have licensing implications, as the FUSE client is licensed (per node, if I recall correctly).

The other is to use the out-of-the-box PutHDFS processor with a bit of configuration (it works on both secure and insecure clusters).

Try this out

Instead of recompiling PutHDFS, simply point it to the mapr-client JARs and use a core-site.xml with the following content:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>maprfs:///</value>
  </property>
</configuration>
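If the processor later fails with "No FileSystem for scheme: maprfs", it may also help to declare the MapR filesystem implementation explicitly in the same core-site.xml. The property and class name below reflect a typical MapR client install rather than anything confirmed in this thread, so verify them against your own installation:

  <property>
    <name>fs.maprfs.impl</name>
    <value>com.mapr.fs.MapRFileSystem</value>
  </property>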

Please note that the MapR clients don't play nicely with Kerberos, and you will be required to use a MapR ticket to access the system. This can easily be done by:

sudo -u <whatever_user_nifi_uses> sh -c "kinit -kt /path/to/your/keytab && maprlogin kerberos"

Cheers

[1] https://lists.apache.org/thread.html/af9244266e89990618152bb59b5bf95c9a49dc2428ea3fa0e6aaa682@%3Cusers.nifi.apache.org%3E
[2] https://cwiki.apache.org/confluence/x/zI5zAw







Re: PutHDFS with mapr

Posted by Andre <an...@fucs.org>.
Ravi,

Have you tried the second approach I listed before, without recompiling NiFi against the MapR libraries?

Kind regards


Re: PutHDFS with mapr

Posted by "Ravi Papisetti (rpapiset)" <rp...@cisco.com>.
Thanks Andre. In the absence of a FUSE license, I have recompiled NiFi with MapR dependencies.

I started getting the errors below when I ran the PutHDFS processor (in a flow) in NiFi on a Mac with the MapR client libraries set up. However, the same package (NiFi recompiled with the MapR libraries) worked fine on Linux.

2018-03-21 22:56:26,806 ERROR [NiFi logging handler] org.apache.nifi.StdErr 2018-03-21 22:56:26,8003 select failed(-1) error Invalid argument

2018-03-21 22:56:26,806 ERROR [NiFi logging handler] org.apache.nifi.StdErr 2018-03-21 22:56:26,8003 select failed(-1) error Invalid argument

For now I am moving forward with testing from a Linux machine. I am not sure what the specific issue is with the Mac.

I am still trying to work through connecting to MapR-DB from NiFi.
Thanks,
Ravi Papisetti



Re: PutHDFS with mapr

Posted by Bryan Bende <bb...@gmail.com>.
If you want to see the classpath of a processor that has an
"Additional Resources" property, you should be able to turn on TRACE
logging for ExtensionManager in logback.xml:

<logger name="org.apache.nifi.nar.ExtensionManager" level="TRACE" />

Wait 30 seconds, then modify the value of the "Additional Resources"
property to force it to print. It will only print when the value
changes, so you can just remove the value, save, then paste it back in
and save again, then see if all the JARs you expected are there.

Usually the error "No FileSystem for scheme: maprfs" means the JAR
containing the MapR implementation of Hadoop's FileSystem class is not
on the classpath. You would probably want to figure out exactly which
JAR their FileSystem impl comes from to determine whether it is on the
classpath.
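One hedged way to hunt for that JAR is to scan the client's lib directories for an entry matching the class name. This sketch is not from the thread; the /opt/mapr/lib path and the MapRFileSystem class name are assumptions based on a typical MapR client install, so adjust both to your environment:

```python
import pathlib
import zipfile

def jars_containing(class_fragment, search_dirs):
    """Return the JARs under search_dirs whose entry names mention
    class_fragment (e.g. 'MapRFileSystem')."""
    hits = []
    for d in search_dirs:
        for jar in sorted(pathlib.Path(d).rglob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as zf:
                    if any(class_fragment in name for name in zf.namelist()):
                        hits.append(str(jar))
            except (zipfile.BadZipFile, OSError):
                pass  # skip unreadable or non-zip files
    return hits

# Path and class name below are guesses at a typical MapR layout; adjust as needed.
print(jars_containing("MapRFileSystem", ["/opt/mapr/lib"]))
```

Whichever JAR this reports would then need to appear in the processor's "Additional Classpath Resources" value.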


Re: PutHDFS with mapr

Posted by "Ravi Papisetti (rpapiset)" <rp...@cisco.com>.
Yes, I can do hadoop fs -ls against the MapR distribution from the NiFi node. The NiFi node has all the necessary MapR client setup.

All the JAR files that are part of the re-compiled Hadoop bundle NAR are on the classpath provided. The NiFi processor doesn't seem to be resolving classes from the overridden classpath.

Thanks,
Ravi Papisetti

From: Andre <an...@fucs.org>
Reply-To: "andre-lists@fucs.org" <an...@fucs.org>
Date: Tuesday, 27 March 2018 at 8:27 PM
To: Cisco Employee <rp...@cisco.com>
Cc: "users@nifi.apache.org" <us...@nifi.apache.org>, "andre-lists@fucs.org" <an...@fucs.org>
Subject: Re: PutHDFS with mapr

Ravi,

I assume the MapR client package is working and operational, and that you can log in as the uid running NiFi and issue the following successfully:

$ maprlogin authtest

$ maprlogin print

$ hdfs dfs -ls /


So if those fail, fix them before you proceed.

If those work, I would say the issue is likely caused by the additional classpath not being complete.

From the documentation:

A comma-separated list of paths to files and/or directories that will be added to the classpath. When specifying a directory, all files within the directory will be added to the classpath, but further sub-directories will not be included.

I don't have a mapr-client instance handy, but my next steps would be ensuring the list of directories and subdirectories is complete and, if not, adding individual JAR files.


It should work.
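Since the property does not pick up JARs in sub-directories, one way to be thorough is to enumerate every JAR explicitly and paste the resulting list into "Additional Classpath Resources". A minimal sketch of that enumeration (the MapR directory layout below is an assumption, not something confirmed in this thread):

```python
import pathlib

def additional_classpath(dirs):
    """Build a comma-separated value for the "Additional Classpath Resources"
    property that names every JAR explicitly, including JARs in
    sub-directories (which the property would otherwise skip)."""
    jars = []
    for d in dirs:
        jars.extend(str(p) for p in sorted(pathlib.Path(d).rglob("*.jar")))
    return ",".join(jars)

# Typical MapR client locations; adjust to your install.
print(additional_classpath([
    "/opt/mapr/lib",
    "/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common",
]))
```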

On Tue, Mar 27, 2018 at 1:56 AM, Ravi Papisetti (rpapiset) <rp...@cisco.com> wrote:
Hi Andre,


I have tried pointing PutHDFS to the lib classpath with: /opt/mapr/lib,/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common,/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib


I have given this value for the "Additional Classpath Resources" parameter of the PutHDFS processor.

I am getting the exception below. Please note that I have tried this with NiFi 1.5.





2018-03-26 14:47:51,305 ERROR [StandardProcessScheduler Thread-6] o.a.n.controller.StandardProcessorNode Failed to invoke @OnScheduled method due to java.lang.RuntimeException: Failed while executing one of processor's OnScheduled task.

java.lang.RuntimeException: Failed while executing one of processor's OnScheduled task.

        at org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1504)

        at org.apache.nifi.controller.StandardProcessorNode.initiateStart(StandardProcessorNode.java:1330)

        at org.apache.nifi.controller.StandardProcessorNode.lambda$initiateStart$1(StandardProcessorNode.java:1358)

        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

        at java.util.concurrent.FutureTask.run(FutureTask.java:266)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)

Caused by: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException

        at java.util.concurrent.FutureTask.report(FutureTask.java:122)

        at java.util.concurrent.FutureTask.get(FutureTask.java:206)

        at org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1487)

        ... 9 common frames omitted

Caused by: java.lang.reflect.InvocationTargetException: null

        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:137)
        at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:125)
        at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:70)
        at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotation(ReflectionUtils.java:47)
        at org.apache.nifi.controller.StandardProcessorNode$1.call(StandardProcessorNode.java:1334)
        at org.apache.nifi.controller.StandardProcessorNode$1.call(StandardProcessorNode.java:1330)
        ... 6 common frames omitted
Caused by: java.io.IOException: No FileSystem for scheme: maprfs
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:172)
        at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor$1.run(AbstractHadoopProcessor.java:322)
        at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor$1.run(AbstractHadoopProcessor.java:319)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.getFileSystemAsUser(AbstractHadoopProcessor.java:319)
        at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.resetHDFSResources(AbstractHadoopProcessor.java:281)
        at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.abstractOnScheduled(AbstractHadoopProcessor.java:205)



Thanks,

Ravi Papisetti


From: Andre <an...@fucs.org>
Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>, "andre-lists@fucs.org" <an...@fucs.org>
Date: Sunday, 25 March 2018 at 3:56 PM
To: "users@nifi.apache.org" <us...@nifi.apache.org>
Subject: Re: PutHDFS with mapr
Re: PutHDFS with mapr

Posted by Andre <an...@fucs.org>.
Ravi,

I assume the MapR client package is working and operational and you can
log in as the uid running NiFi and issue the following successfully:

$ maprlogin authtest

$ maprlogin print

$ hdfs dfs -ls /


So if those fail, fix them before you proceed.

If those work, I would point out that the issue is likely caused by the
additional classpath not being complete.

From the documentation:

A comma-separated list of paths to files and/or directories that will be
added to the classpath. When specifying a directory, all files within the
directory will be added to the classpath, but further sub-directories will
not be included.

I don't have a mapr-client instance handy but my next steps would be
ensuring the list of directories and subdirectories is complete and, if
not, adding individual JAR files.


It should work.
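For what it's worth, a rough sketch of that last step (the find paths are assumptions based on a default /opt/mapr client install -- adjust them to your layout):

```shell
# join_csv turns newline-separated jar paths into the comma-separated
# value the "Additional Classpath Resources" property expects.
join_csv() { paste -sd, -; }

# Enumerate individual JARs from the usual mapr-client directories
# (these locations are assumptions) and join them for pasting into
# the processor property:
find /opt/mapr/lib \
     /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common \
     /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib \
     -maxdepth 1 -name '*.jar' 2>/dev/null | join_csv

# With hypothetical jar names the helper behaves like:
printf '%s\n' /opt/mapr/lib/maprfs.jar /opt/mapr/lib/zookeeper.jar | join_csv
# -> /opt/mapr/lib/maprfs.jar,/opt/mapr/lib/zookeeper.jar
```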


On Tue, Mar 27, 2018 at 1:56 AM, Ravi Papisetti (rpapiset) <rpapiset@cisco.com> wrote:

Re: PutHDFS with mapr

Posted by "Ravi Papisetti (rpapiset)" <rp...@cisco.com>.
Hi Andre,


I have tried pointing PutHDFS to the lib classpath with: /opt/mapr/lib,/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common,/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib


I have given this value for the “Additional Classpath Resources” parameter of the PutHDFS processor.

Getting the below exception. Please note that I have tried this on NiFi 1.5.

2018-03-26 14:47:51,305 ERROR [StandardProcessScheduler Thread-6] o.a.n.controller.StandardProcessorNode Failed to invoke @OnScheduled method due to java.lang.RuntimeException: Failed while executing one of processor's OnScheduled task.
java.lang.RuntimeException: Failed while executing one of processor's OnScheduled task.
        at org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1504)
        at org.apache.nifi.controller.StandardProcessorNode.initiateStart(StandardProcessorNode.java:1330)
        at org.apache.nifi.controller.StandardProcessorNode.lambda$initiateStart$1(StandardProcessorNode.java:1358)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:206)
        at org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1487)
        ... 9 common frames omitted
Caused by: java.lang.reflect.InvocationTargetException: null
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:137)
        at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:125)
        at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:70)
        at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotation(ReflectionUtils.java:47)
        at org.apache.nifi.controller.StandardProcessorNode$1.call(StandardProcessorNode.java:1334)
        at org.apache.nifi.controller.StandardProcessorNode$1.call(StandardProcessorNode.java:1330)
        ... 6 common frames omitted
Caused by: java.io.IOException: No FileSystem for scheme: maprfs
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:172)
        at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor$1.run(AbstractHadoopProcessor.java:322)
        at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor$1.run(AbstractHadoopProcessor.java:319)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.getFileSystemAsUser(AbstractHadoopProcessor.java:319)
        at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.resetHDFSResources(AbstractHadoopProcessor.java:281)
        at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.abstractOnScheduled(AbstractHadoopProcessor.java:205)



Thanks,

Ravi Papisetti



Re: PutHDFS with mapr

Posted by Andre <an...@fucs.org>.
Joey,

Yes. The client must be installed and set up (this is a requirement for the
recompiled NiFi as well).

Without the client installed and configured, the MapR libraries (Java and
native) would be lost as to which ZK to connect to in order to get
information about the CLDB (their alternative to the namenode).
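A couple of quick checks along those lines (just a sketch -- the file locations are my assumptions from a typical /opt/mapr client layout):

```shell
# check_mapr_client: confirm the pieces a mapr-client install needs are
# present under a given prefix (default /opt/mapr). mapr-clusters.conf
# is where the client learns the CLDB nodes; libMapRClient is the
# native library the java client loads.
check_mapr_client() {
  prefix="${1:-/opt/mapr}"
  if [ ! -e "$prefix/conf/mapr-clusters.conf" ]; then
    echo "missing $prefix/conf/mapr-clusters.conf"
    return 1
  fi
  if ! ls "$prefix"/lib/libMapRClient*.so* >/dev/null 2>&1; then
    echo "missing native client lib under $prefix/lib"
    return 1
  fi
  echo "mapr-client layout looks ok under $prefix"
}

check_mapr_client   # reports the first missing piece, if any
```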

Cheers

On Mon, Mar 26, 2018 at 1:20 AM, Joey Frazee <jo...@icloud.com> wrote:


Re: PutHDFS with mapr

Posted by Joey Frazee <jo...@icloud.com>.
I'm kinda going on memory here because I lost some notes I had about doing this, but I think the compile against the mapr libs presumes you also have the C-based mapr client libs on your machine at compile time and run time. I skimmed that blog post, albeit very quickly, and didn't see that explicitly mentioned in there.

Using the additional jars in PutHDFS would presumably require them too.

Andre, that's correct isn't it?

On Mar 24, 2018, 8:26 AM -0500, Mark Payne <ma...@hotmail.com>, wrote:

Re: PutHDFS with mapr

Posted by Mark Payne <ma...@hotmail.com>.
Andre,

I knew this was possible but had no idea how. Thanks for the great explanation and associated caveats!

-Mark


On Mar 24, 2018, at 1:04 AM, Andre <an...@fucs.org> wrote:


Re: PutHDFS with mapr

Posted by Andre <an...@fucs.org>.
Ravi,

There are two ways of solving this.

One of them (suggested to me by MapR representatives) is to deploy MapR's
FUSE client to your NiFi nodes, use the PutFile processor instead of PutHDFS,
and let the MapR client coordinate the API engagement with MapR-FS. This is
a very clean and robust approach; however, it may have licensing implications,
as the FUSE client is licensed (per node, if I recall correctly).

The other one is to use the out-of-the-box PutHDFS processor with a bit of
configuration (it works on both secure and insecure clusters).

Try this out:

Instead of recompiling, simply point PutHDFS to the mapr-client jars and
use a core-site.xml with the following content:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>maprfs:///</value>
  </property>
</configuration>

Please note MapR clients don't play nicely with Kerberos, and you will be
required to use a MapR ticket to access the system. This can easily be done
by:

sudo -u <whatever_user_nifi_uses> sh -c "kinit -kt /path/to/your/keytab && maprlogin kerberos"
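And to confirm the ticket actually took -- a sketch; as far as I recall the ticket file defaults to /tmp/maprticket_<uid> unless MAPR_TICKETFILE_LOCATION says otherwise:

```shell
# Check whether the user's MapR ticket file exists and is non-empty.
# The default location (/tmp/maprticket_<uid>) is an assumption from
# typical installs; MAPR_TICKETFILE_LOCATION overrides it.
ticket="${MAPR_TICKETFILE_LOCATION:-/tmp/maprticket_$(id -u)}"
if [ -s "$ticket" ]; then
  echo "ticket present: $ticket"
else
  echo "no ticket at $ticket -- rerun maprlogin"
fi
```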

Cheers

[1] https://lists.apache.org/thread.html/af9244266e89990618152bb59b5bf95c9a49dc2428ea3fa0e6aaa682@%3Cusers.nifi.apache.org%3E
[2] https://cwiki.apache.org/confluence/x/zI5zAw



On Fri, Mar 23, 2018 at 5:05 AM, Ravi Papisetti (rpapiset) <rpapiset@cisco.com> wrote: