Posted to dev@nifi.apache.org by shweta Aggarwal <sh...@gmail.com> on 2016/11/25 04:48:31 UTC

Nifi Capability for Fast transfer of Data

Hi folks,

We have a requirement in one of our time-critical applications where we
need to transfer up to 40-50 GB of images within a few seconds between a
remote machine and HDFS.

Assuming the network link between the two is 10GbE and the NIC and
socket buffers are tuned for best performance, can NiFi deliver the
desired throughput using a combination of "GetFile" and "PutHDFS" on a
high-end cluster of more than 8 nodes?

We are also exploring a combination of HDFS + GridFTP for fast transfer
of images from the remote machine to the HDFS cluster.

Any thoughts or pointers would be helpful.

Thanks!!

Re: Nifi Capability for Fast transfer of Data

Posted by Lee Laim <le...@gmail.com>.
Shweta,

While this may deviate from your initial requirements, NiFi offers the ability to compress, resize, and extract metadata from your images. You can use NiFi to build an image-processing pipeline for incoming images that prioritizes and routes the ~10% of the image data that needs to arrive within 4 seconds. The rest of the images will show up shortly after. Resizing and compression, where applicable, can also help move you towards your goal.

Have fun, 
Lee

On Nov 25, 2016, at 6:37 PM, Andy LoPresto <al...@gmail.com> wrote:

> Unless my back of the envelope math is way off, to transfer 50GB (400Gb) per second, you would need 40 parallel 10GbE connections, assuming absolutely no overhead. Your precision for "a few seconds" would need to be 40+ seconds using a single 10 GbE link and optimal transmission speed. 
> 
> From the Apache NiFi Overview document: 
> 
> "for something concrete and broadly applicable, consider the out-of-the-box default implementations. These are all persistent with guaranteed delivery and do so using local disk. So being conservative, assume roughly 50MB per second read/write rate on modest disks or RAID volumes within a typical server. NiFi for a large class of dataflows then should be able to efficiently reach 100MB per second or more of throughput. "
> 
> Those numbers are at least 18 months old, so with a robust cluster of 8 high-performance machines and an optimized flow to balance computation across all the boxes, I would ballpark a perfect world estimate at 1Gbps. My last knowledge of HDFS write speeds was around 10-20Gbps. Again, if your tolerance for the full process is 40-50 seconds, NiFi should be able to keep up, but your uplink will probably be the long pole in the tent here. 
> 
> Feel free to correct any poor assumptions or bad math above. 
> 
> Andy LoPresto
> alopresto@apache.org
> alopresto.apache@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 

Re: Nifi Capability for Fast transfer of Data

Posted by Andy LoPresto <al...@gmail.com>.
Unless my back of the envelope math is way off, to transfer 50GB (400Gb) per second, you would need 40 parallel 10GbE connections, assuming absolutely no overhead. Your precision for "a few seconds" would need to be 40+ seconds using a single 10 GbE link and optimal transmission speed. 
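
The arithmetic above can be checked with a short script. This is only a sketch under the same idealized assumptions (1 GB = 8 Gb, no protocol overhead, links used at full line rate); the function names are made up for illustration and are not part of NiFi.

```python
import math


def transfer_seconds(size_gb, link_gbps, links=1):
    """Ideal time to move size_gb gigabytes over `links` parallel links.

    Assumes zero overhead and that every link runs at its full line rate.
    """
    size_gbit = size_gb * 8  # 1 byte = 8 bits, so 50 GB = 400 Gb
    return size_gbit / (link_gbps * links)


def links_needed(size_gb, link_gbps, deadline_s):
    """Smallest number of parallel links that meets the deadline (ideal case)."""
    size_gbit = size_gb * 8
    return math.ceil(size_gbit / (link_gbps * deadline_s))


if __name__ == "__main__":
    # 50 GB over a single 10GbE link
    print(transfer_seconds(50, 10))   # 40.0 seconds
    # 50 GB in one second over 10GbE links
    print(links_needed(50, 10, 1))    # 40 links
```

Changing the deadline argument shows how quickly the link count falls as "a few seconds" is relaxed.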

From the Apache NiFi Overview document: 

"for something concrete and broadly applicable, consider the out-of-the-box default implementations. These are all persistent with guaranteed delivery and do so using local disk. So being conservative, assume roughly 50MB per second read/write rate on modest disks or RAID volumes within a typical server. NiFi for a large class of dataflows then should be able to efficiently reach 100MB per second or more of throughput. "

Those numbers are at least 18 months old, so with a robust cluster of 8 high-performance machines and an optimized flow to balance computation across all the boxes, I would ballpark a perfect world estimate at 1Gbps. My last knowledge of HDFS write speeds was around 10-20Gbps. Again, if your tolerance for the full process is 40-50 seconds, NiFi should be able to keep up, but your uplink will probably be the long pole in the tent here. 
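
As a rough cross-check of these figures, the sketch below scales the per-node rate quoted from the Overview across a cluster and converts it to a transfer time. It assumes throughput scales linearly with node count and uses decimal units (1 GB = 1000 MB); neither assumption is guaranteed in practice, and the function names are illustrative only.

```python
def cluster_rate_mb_s(nodes, per_node_mb_s):
    """Aggregate throughput in MB/s, ASSUMING perfect linear scaling across nodes."""
    return nodes * per_node_mb_s


def mb_s_to_gbps(rate_mb_s):
    """Convert megabytes per second to gigabits per second (decimal units)."""
    return rate_mb_s * 8 / 1000


def seconds_for(size_gb, rate_mb_s):
    """Ideal time to move size_gb gigabytes at rate_mb_s megabytes per second."""
    return size_gb * 1000 / rate_mb_s


def per_node_rate_to_fill_link(link_gbps, nodes):
    """Per-node MB/s needed for the whole cluster to saturate one link."""
    return link_gbps * 1000 / 8 / nodes


if __name__ == "__main__":
    rate = cluster_rate_mb_s(8, 100)          # 8 nodes at the Overview's 100 MB/s
    print(rate, "MB/s =", mb_s_to_gbps(rate), "Gbps")
    print(seconds_for(50, rate), "seconds for 50 GB")
    print(per_node_rate_to_fill_link(10, 8), "MB/s per node to fill 10GbE")
```

Plugging in a measured per-node rate from your own hardware in place of the 100 MB/s placeholder gives a more realistic estimate.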

Feel free to correct any poor assumptions or bad math above. 

Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

