Posted to user@knox.apache.org by Mohammad Islam <mi...@yahoo.com> on 2016/11/05 21:52:51 UTC

WebHDFS Performance : with and without Knox

Hi,
I did a very basic comparison of download speed. I used similar "curl ..." commands to download a large file (13.6 GB) and gathered the numbers.
Looks like WebHDFS through Knox is very slow (at least 20x slower). I ran it twice with similar numbers. For Knox, I turned off SSL, and in both cases I used an unsecured (non-Kerberos) cluster.
The download with Knox took nearly 49 minutes, whereas the direct download took about 2 minutes 19 seconds. The average download speed was 4811k for Knox and 99.6M for the direct download.
I'm sure I have done something wrong. Have you seen performance like this? Any help will be really appreciated.
Regards,
Mohammad

Interactions:

curl -o t2.direct -L "http://<WEBHDFS_HOST>:50070/webhdfs/v1/<FILE_PATH>?op=OPEN"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 13.5G  100 13.5G    0     0  99.6M      0  0:02:19  0:02:19 --:--:--  117M

curl -H "X-Auth-Params-Email: mislam@uber.com" -o t2 -L "http://<KNOX_HOST>:8445/gateway/sandbox/webhdfs/v1/<FILE_PATH>?op=OPEN"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0 13.5G    0     0  4811k      0 --:--:--  0:49:12 --:--:-- 6121k
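For repeated runs, a script-friendly variant may be easier to compare than curl's progress meter. This is a sketch under assumptions, not the command used in this thread: the helper name `bench_download` is hypothetical, and it relies only on standard curl options (`-w` with the `size_download`, `time_total` and `speed_download` variables) to print bytes, seconds and average bytes per second for a single download while discarding the body.

```shell
#!/bin/sh
# bench_download URL [extra curl args...]
# Hypothetical helper: fetch URL once, discard the body, and print
#   <bytes downloaded> <total seconds> <average bytes per second>
# -L follows redirects (a direct WebHDFS OPEN redirects to a DataNode);
# -f makes curl exit non-zero on HTTP errors instead of saving an error page.
bench_download() {
  url=$1
  shift
  curl -s -S -f -L -o /dev/null \
       -w '%{size_download} %{time_total} %{speed_download}\n' \
       "$@" "$url"
}

# Example, repeated seven times (placeholders as in the commands above):
# for i in 1 2 3 4 5 6 7; do
#   bench_download "http://<KNOX_HOST>:8445/gateway/sandbox/webhdfs/v1/<FILE_PATH>?op=OPEN" \
#     -H "X-Auth-Params-Email: mislam@uber.com"
# done
```

Writing to `/dev/null` also keeps local disk writes out of the measurement, so the direct and proxied runs differ only in the network path.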

Re: WebHDFS Performance : with and without Knox

Posted by larry mccay <lm...@apache.org>.
That's great to hear, Mohammad!
I'd love to hear about some of the popular use cases.

Also, we are going to more formally define the programming model for the
KnoxShell capabilities.
Perhaps there would be something interesting there from your perspective.

https://cwiki.apache.org/confluence/display/KNOX/KIP-4+KnoxShell+Improvements



On Thu, Nov 10, 2016 at 4:21 PM, Mohammad Islam <mi...@yahoo.com> wrote:

> Thanks Larry. As soon as I cross my current hurdles, I plan to write a
> blog post.
> At Uber, we are relying on Knox a lot.
> So far it has been a good experience.

Re: WebHDFS Performance : with and without Knox

Posted by Mohammad Islam <mi...@yahoo.com>.
Thanks Larry. As soon as I cross my current hurdles, I plan to write a blog post. At Uber, we are relying on Knox a lot. So far it has been a good experience.
 

    On Wednesday, November 9, 2016 5:43 PM, larry mccay <lm...@apache.org> wrote:
 

 Hi Mohammad - 
That is wonderful to see!
Not sure I understand why they are so close but I'm not arguing with success.
"Performance Tip #1: turn off DEBUG logging" :)
These numbers may be useful within a blog post or something too.
Thank you for sharing with the community!
thanks,
--larry


Re: WebHDFS Performance : with and without Knox

Posted by larry mccay <lm...@apache.org>.
Hi Mohammad -

That is wonderful to see!
Not sure I understand why they are so close but I'm not arguing with
success.
"Performance Tip #1: turn off DEBUG logging" :)

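A minimal sketch of that tip for a log4j 1.x style properties file. The file location (for example `conf/gateway-log4j.properties` under the gateway install) is an assumption, so check where your Knox version keeps its logging configuration; the helper names are hypothetical. The first function only reports which levels are set to DEBUG; the second rewrites them to INFO and keeps a `.bak` copy.

```shell
#!/bin/sh
# Hypothetical helpers for a log4j 1.x style properties file, e.g.
#   "$GATEWAY_HOME/conf/gateway-log4j.properties"  (path is an assumption)

# list_debug_levels FILE
# Print (with line numbers) uncommented "key=DEBUG" or "key=DEBUG, appender" lines.
list_debug_levels() {
  grep -n -E '^[^#]*=[[:space:]]*DEBUG([[:space:]]*,|[[:space:]]*$)' "$1"
}

# downgrade_debug FILE
# Rewrite those DEBUG levels to INFO in place; the original is kept as FILE.bak.
# Lines starting with '#' (comments) are left alone.
downgrade_debug() {
  sed -i.bak -E '/^[[:space:]]*#/!s/=([[:space:]]*)DEBUG([[:space:]]*,|[[:space:]]*$)/=\1INFO\2/' "$1"
}
```

Run `list_debug_levels` first to see what would change. Restart the gateway afterwards unless you know it reloads its logging configuration on its own.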
These numbers may be useful within a blog post or something too.

Thank you for sharing with the community!

thanks,

--larry


On Wed, Nov 9, 2016 at 5:26 PM, Mohammad Islam <mi...@yahoo.com> wrote:

> Updates:
>
> Root cause: The log level was DEBUG. As soon as I moved it to INFO for
> all loggers, the performance became very comparable.
>
> Data: I ran those downloads 7 times and then averaged. The numbers are
> very close to each other.
> I ran the test from a third machine, NOT the machine where Knox was running.
> Overall, the download speed for direct WebHDFS was about 384M; with the
> Knox proxy it was about 383M.
>
>
> Data size: ~14 GB
> Iteration  Direct time (s)  Direct speed (MB/s)  Knox time (s)  Knox speed (MB/s)
> 1          42               325                  44             310
> 2          31               444                  29             467
> 3          44               314                  51             270
> 4          38               359                  36             382
> 5          73               188                  39             350
> 6          25               536                  28             489
> 7          26               523                  33             410
> Average    39.86            384.14               37.14          382.57

Re: WebHDFS Performance : with and without Knox

Posted by Mohammad Islam <mi...@yahoo.com>.
Updates:
Root cause: The log level was DEBUG. As soon as I moved it to INFO for all loggers, the performance became very comparable.
Data: I ran those downloads 7 times and then averaged. The numbers are very close to each other. I ran the test from a third machine, NOT the machine where Knox was running. Overall, the download speed for direct WebHDFS was about 384M; with the Knox proxy it was about 383M.


Data size: ~14 GB

| Iteration | Direct time (s) | Direct speed (MB/s) | Knox time (s) | Knox speed (MB/s) |
| 1 | 42 | 325 | 44 | 310 |
| 2 | 31 | 444 | 29 | 467 |
| 3 | 44 | 314 | 51 | 270 |
| 4 | 38 | 359 | 36 | 382 |
| 5 | 73 | 188 | 39 | 350 |
| 6 | 25 | 536 | 28 | 489 |
| 7 | 26 | 523 | 33 | 410 |
| Average | 39.86 | 384.14 | 37.14 | 382.57 |
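The averages in the last row of the table can be recomputed from the per-run rows. A minimal sketch, assuming the rows are fed as five whitespace-separated columns (iteration, direct time, direct speed, Knox time, Knox speed); `average_runs` is a hypothetical helper that uses only POSIX awk.

```shell
#!/bin/sh
# average_runs: read rows of
#   <iteration> <direct_time_s> <direct_MBps> <knox_time_s> <knox_MBps>
# from stdin and print the mean of each measurement column.
# Rows without exactly five fields (e.g. headers) are skipped.
average_runs() {
  awk '
    NF == 5 { dt += $2; ds += $3; kt += $4; ks += $5; n++ }
    END {
      if (n == 0) exit 1
      printf "direct: %.2f s, %.2f MB/s\n", dt / n, ds / n
      printf "knox:   %.2f s, %.2f MB/s\n", kt / n, ks / n
    }'
}

# Example with the seven runs from the table:
# printf '%s\n' "1 42 325 44 310" "2 31 444 29 467" "3 44 314 51 270" \
#   "4 38 359 36 382" "5 73 188 39 350" "6 25 536 28 489" "7 26 523 33 410" | average_runs
# direct: 39.86 s, 384.14 MB/s
# knox:   37.14 s, 382.57 MB/s
```

Note that a mean of per-run speeds is not the same as total bytes divided by total time; the latter gives slow runs (such as direct run 5) more weight.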

 

    On Saturday, November 5, 2016 7:14 PM, Mohammad Islam <mi...@yahoo.com> wrote:
 

 Thanks Larry for sharing your findings. Your numbers look much better than mine. I tried with 0.9.1. Should I upgrade to 0.10?
If possible, can you please share your exact command with the various options? Did you try with SSL on? My two hosts were different, and I ran it from the KNOX_HOST box.
Any other ideas on how I can get better numbers?
Regards,
Mohammad

Re: WebHDFS Performance : with and without Knox

Posted by Mohammad Islam <mi...@yahoo.com>.
Thanks Larry for sharing your findings. Your numbers look much better than mine. I tried with 0.9.1. Should I upgrade to 0.10?
If possible, can you please share your exact command with the various options? Did you try with SSL on? My two hosts were different, and I ran it from the KNOX_HOST box.
Any other ideas on how I can get better numbers?
Regards,
Mohammad


 

    On Saturday, November 5, 2016 6:55 PM, larry mccay <lm...@apache.org> wrote:
 

 Hi Mohammad -
I have played around with this a bit and haven't been able to reproduce your results.
My environment is a sandbox VM download and the Apache Knox 0.10.0 test instance running on the host machine.
I put an ~8.5 GB file in HDFS and OPENed it with and without Knox.
With Knox:
100 8470M    0 8470M    0     0   9.9M      0 --:--:--  0:14:09 --:--:--  9.9M
Direct to WebHDFS:
100 8470M    0 8470M    0     0  13.6M      0 --:--:--  0:10:20 --:--:-- 14.9M
While we are certainly not speeding things up, it isn't too bad.
I believe there is still room for some optimization in our rewrite process, as has been discussed a bit on [1].
This would probably bring the numbers even closer together.
However, even that won't make up the difference that you are seeing.
I wonder what your test environment looks like, where you are getting 99.6M avg speed direct and 4.8M from Knox.
If the KNOX_HOST and WEBHDFS_HOST are different machines, maybe you should try the direct curl command from the KNOX_HOST and see if there is a difference being introduced by the network or something like that.
thanks,
--larry
[1] https://issues.apache.org/jira/browse/KNOX-767

Re: WebHDFS Performance : with and without Knox

Posted by larry mccay <lm...@apache.org>.
Hi Mohammad -

I have played around with this a bit and haven't been able to reproduce
your results.

My environment is a sandbox VM download and the Apache Knox 0.10.0 test
instance running on the host machine.
I put an ~8.5 GB file in HDFS and OPENed it with and without Knox.

With Knox:
100 8470M    0 8470M    0     0   9.9M      0 --:--:--  0:14:09 --:--:--  9.9M

Direct to WebHDFS:
100 8470M    0 8470M    0     0  13.6M      0 --:--:--  0:10:20 --:--:-- 14.9M

While we are certainly not speeding things up, it isn't too bad.
I believe there is still room for some optimization in our rewrite
process, as has been discussed a bit on [1].

This would probably bring the numbers even closer together.
However, even that won't make up the difference that you are seeing.

I wonder what your test environment looks like, where you are getting 99.6M
avg speed direct and 4.8M from Knox.
If the KNOX_HOST and WEBHDFS_HOST are different machines, maybe you should
try the direct curl command from the KNOX_HOST and see if there is a
difference being introduced by the network or something like that.
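curl's progress meter mixes suffixes (4811k versus 99.6M) and uses 1024-based units, so the figures are easier to compare on one scale. A small sketch for that; `to_bytes` and `speed_ratio` are hypothetical helpers, not part of curl or Knox. With the numbers from this thread, 99.6M versus 4811k works out to roughly 21x, while 13.6M versus 9.9M is roughly 1.4x.

```shell
#!/bin/sh
# to_bytes SPEED: convert a curl progress-meter figure such as 4811k or 99.6M
# to bytes; curl's meter uses 1024-based k/M/G suffixes.
to_bytes() {
  awk -v s="$1" 'BEGIN {
    n = s + 0                      # leading numeric part, e.g. 99.6
    u = substr(s, length(s), 1)    # trailing suffix character, if any
    m = 1
    if (u == "k") m = 1024
    else if (u == "M") m = 1024 ^ 2
    else if (u == "G") m = 1024 ^ 3
    printf "%.0f\n", n * m
  }'
}

# speed_ratio FAST SLOW: how many times FAST is faster than SLOW, one decimal.
speed_ratio() {
  awk -v a="$(to_bytes "$1")" -v b="$(to_bytes "$2")" 'BEGIN { printf "%.1f\n", a / b }'
}

# speed_ratio 99.6M 4811k   # prints 21.2
# speed_ratio 13.6M 9.9M    # prints 1.4
```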

thanks,

--larry

[1] https://issues.apache.org/jira/browse/KNOX-767



On Sat, Nov 5, 2016 at 6:41 PM, larry mccay <lm...@apache.org> wrote:

> Hi Mohammad -
>
> Thanks for reporting this.
>
> That is a big difference.
> Let me play around with it and see what I can reproduce.
>
> thanks,
>
> --larry

Re: WebHDFS Performance : with and without Knox

Posted by larry mccay <lm...@apache.org>.
Hi Mohammad -

Thanks for reporting this.

That is a big difference.
Let me play around with it and see what I can reproduce.

thanks,

--larry

On Sat, Nov 5, 2016 at 5:52 PM, Mohammad Islam <mi...@yahoo.com> wrote:

> Hi,
> I did a very basic comparison of download speed. I used similar "curl ..."
> commands to download a large file (13.6 GB) and gathered the numbers.
>
> Looks like WebHDFS through Knox is very slow (at least 20x slower). I ran it
> twice with similar numbers. For Knox, I turned off SSL, and in both cases I
> used an unsecured (non-Kerberos) cluster.
>
> The download with Knox took nearly 49 minutes, whereas the direct download
> took about 2 minutes 19 seconds. The average download speed was *4811k* for
> Knox and *99.6M* for the direct download.
>
> I'm sure I have done something wrong. Have you seen performance like this?
> Any help will be really appreciated.
>
> Regards,
> Mohammad
>
> Interactions:
> curl -o t2.direct -L "http://<WEBHDFS_HOST>:50070/webhdfs/v1/<FILE_PATH>?op=OPEN"
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
> 100 13.5G  100 13.5G    0     0  99.6M      0  0:02:19  0:02:19 --:--:--  117M
>
> curl -H "X-Auth-Params-Email: mislam@uber.com" -o t2 -L "http://<KNOX_HOST>:8445/gateway/sandbox/webhdfs/v1/<FILE_PATH>?op=OPEN"
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
>   0     0    0 13.5G    0     0  4811k      0 --:--:--  0:49:12 --:--:-- 6121k