Posted to dev@nifi.apache.org by Mothi86 <mo...@gmail.com> on 2017/06/23 19:37:28 UTC

How to ingest files into HDFS via Apache NiFi from non-hadoop environment

Apache NiFi is installed in a non-Hadoop environment and needs to ingest
processed files into HDFS (a Kerberized cluster: 4 management nodes and 1 edge
node on a public network, and 4 worker nodes on a private network).

Is this use case workable? I run into multiple errors even after performing
the steps below. As a temporary alternative, I have installed NiFi on the
edge node and everything works fine there, but please advise if there is
anything additional I have to do to make the original setup work.

* The firewall between NiFi and the management server is open on ports
22, 88, 749, and 389.
* The firewall between NiFi and the edge node server is open on ports
22, 2181, and 9083.
* The krb5.conf file from the Hadoop cluster, along with the keytab for the
application user, is copied to the NiFi server. Running kinit with the
application user and keytab succeeds, and the resulting ticket is listed by klist.
* SSH to the Hadoop server works, and SFTP into it works fine as well.
* hdfs-site.xml and core-site.xml are configured in NiFi.

<http://apache-nifi-developer-list.39713.n7.nabble.com/file/n16247/NiFi_Configuration.jpg> 
<http://apache-nifi-developer-list.39713.n7.nabble.com/file/n16247/putHDFS_loginError.jpg> 





--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/How-to-ingest-files-into-HDFS-via-Apache-NiFi-from-non-hadoop-environment-tp16247.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

RE: How to ingest files into HDFS via Apache NiFi from non-hadoop environment

Posted by Takanobu Asanuma <ta...@yahoo-corp.jp>.
Hello Mothi86,

I think you can achieve this by using HttpFS on the HDFS side. It is part of the Hadoop distribution and acts as a proxy server for HDFS.
https://hadoop.apache.org/docs/stable/hadoop-hdfs-httpfs/index.html

In your case, running the HttpFS server on the management nodes or the edge node would work well. Then set 'webhdfs://{HttpFS hostname}:{port}' as 'fs.defaultFS' in the core-site.xml used by NiFi's HDFS processors. With that, your NiFi cluster only needs to reach the HttpFS server, and it can access HDFS from the non-Hadoop environment.
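As a minimal sketch, the core-site.xml used by NiFi's HDFS processors could then point at the HttpFS endpoint (the hostname below is a placeholder; 14000 is the HttpFS default port, so confirm the actual value on your cluster):

```xml
<configuration>
  <!-- Route all HDFS access through the HttpFS proxy instead of the
       namenode/datanodes directly (hostname is hypothetical). -->
  <property>
    <name>fs.defaultFS</name>
    <value>webhdfs://httpfs.example.com:14000</value>
  </property>
</configuration>
```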

Regards,
Takanobu

Re: How to ingest files into HDFS via Apache NiFi from non-hadoop environment

Posted by Adam Taft <ad...@adamtaft.com>.
This is a bit outside of the box, but I have actually implemented this
solution previously.

My scenario was very similar.  NIFI was installed outside of the firewalled
HDFS cluster.  The only external access to the HDFS cluster was through SSH.

Therefore, my solution was to use SSH to call a remote command on the HDFS
node.  This was enabled using the ExecuteStreamCommand processor.  I used
the hadoop fs command line tools, piping in the contents of the flowfile.

The basic command (assuming put) would look something like this:

$>  cat file.ext | hadoop fs -put - /hdfs/path/file.ext

This would read from standard input and store the stream into file.ext.
Next you add the SSH execution to call the above.

$>  cat file.ext | ssh user@remote 'hadoop fs -put - /hdfs/path/file.ext'

Now we can put the above into the ExecuteStreamCommand processor.
We will extract the filename from the flowfile's 'filename' attribute.
I like using bash to execute my script:

ExecuteStreamCommand
Command Path:  /bin/bash
Command Arguments: -c; "ssh user@remote 'hadoop fs -put -
/hdfs/path/${filename}'"    * unsure of the quotes here

Not sure if the above helps, since it sounds like you're going for
something more than 'get' and 'put'.  But the above is an easy mechanism to
interact with an HDFS cluster if the NIFI node is not running on the
cluster.
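The quoting can be sanity-checked locally by swapping the ssh/hadoop pieces for stand-ins — a rough sketch, where `cat > file` plays the role of `ssh user@remote 'hadoop fs -put - ...'` and all names are hypothetical:

```shell
# Stand-in pipeline: /bin/bash -c receives the same shape of command
# string that ExecuteStreamCommand would pass, but writes to a local
# file instead of ssh-ing to the cluster (paths are hypothetical).
filename="file.ext"
printf 'flowfile body\n' \
  | /bin/bash -c "cat > /tmp/put_${filename}"
# The remote form would be:
#   | /bin/bash -c "ssh user@remote 'hadoop fs -put - /hdfs/path/${filename}'"
cat "/tmp/put_${filename}"
```

Note that `${filename}` is expanded by the outer shell before bash -c runs, which is exactly why the quoting in the processor's Command Arguments is delicate.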




Re: How to ingest files into HDFS via Apache NiFi from non-hadoop environment

Posted by Mothi86 <mo...@gmail.com>.
Okay, thanks. So that clarifies that NiFi will not integrate directly from a
local machine / non-Hadoop environment into the Hadoop environment. It either
has to be on an edge node, or on a node built with restrictions similar to an
edge or management node.

Is this the HDF-recommended solution?

Will spinning up a VM work? Can you suggest VM requirements for Apache NiFi?






Re: How to ingest files into HDFS via Apache NiFi from non-hadoop environment

Posted by Bryan Bende <bb...@gmail.com>.
Yes, I think running NiFi on edge nodes would make sense; that way
they can access the public network to receive data and also reach
HDFS on the private network.



Re: How to ingest files into HDFS via Apache NiFi from non-hadoop environment

Posted by Mothi86 <mo...@gmail.com>.
Hi Bryan,

Greetings and appreciate your instant reply. Data nodes are in private
network inside the hadoop cluster and NiFi is away from hadoop cluster on a
seperate non-hadoop server. If we need NiFi to have access to data node,
does that mean we need to have NiFi within the cluster ? something like edge
node or management node which has access to public network for twitter
access or so and also private network of data nodes.




Re: How to ingest files into HDFS via Apache NiFi from non-hadoop environment

Posted by Bryan Bende <bb...@gmail.com>.
Hello,

Every node where NiFi is running must be able to connect to the data
node process on every node where HDFS is running. I believe the
default port for the HDFS data node process is usually 50010.

I'm assuming your 4 worker nodes are running HDFS, so NiFi would have
to access those.

-Bryan
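A quick way to verify that requirement from the NiFi host is a TCP reachability check against each worker node — a sketch only: the hostnames below are placeholders, and 50010 is just the common default for the datanode port, so confirm the actual value in your hdfs-site.xml:

```shell
# Probe the datanode port on each worker from the NiFi host.
# worker1..worker4 are placeholder hostnames; replace with real ones.
for host in worker1 worker2 worker3 worker4; do
  if timeout 2 bash -c "echo > /dev/tcp/${host}/50010" 2>/dev/null; then
    echo "${host}:50010 reachable"
  else
    echo "${host}:50010 blocked or unreachable"
  fi
done
```

If any worker shows as blocked, PutHDFS will fail on writes even when the namenode itself is reachable, because the client streams block data directly to the datanodes.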

