Posted to dev@oodt.apache.org by Etienne Koen <et...@scs-space.com> on 2014/09/12 09:15:37 UTC

RE: PushPull

Hi Chris,

Thank you for your response and info! I would be happy to document my results and would appreciate it if the community could respond to some of the questions I still have.

At the moment it does not look like I have the permissions or the functionality to create a page... Or I am looking in the wrong place to do so :-)

My immediate question is whether PushPull has a parallel capability like GridFTP's, and how to specify it for the next test phase...

Cheers

Etienne Koen
Data Processing Systems Engineer

Space Advisory Company

O: +27 (21) 300 0060 | C: +27 (76) 661 0170 | E: etiennek@scs-space.com

________________________________________
From: Mattmann, Chris A (3980) [chris.a.mattmann@jpl.nasa.gov]
Sent: Thursday, September 11, 2014 4:47 PM
To: dev@oodt.apache.org
Cc: Etienne Koen; Khudikyan, Shakeh E (398J)
Subject: FW: PushPull

Etienne,


Thank you for sending this along! The crazy part about these types of data
transfer studies, especially with TCP/IP-based protocols that aren't
parallelized (e.g., FTP), is that you are limited by what's going on in the
surrounding network. For example, see the attached studies my team has
published on data movement over the past 5-7 years and notice a similar type
of behavior. It is pretty interesting, independent of the family of data
transfer you're using.

Take a look at my Dissertation too:

http://sunset.usc.edu/~mattmann/Dissertation.pdf

This concluded that parallel TCP/IP technologies like GridFTP (now
GlobusOnline) and bbFTP performed the best across the public WAN for
performance- and efficiency-related parameters. If those aren't the overall
properties you are trying to maximize (and you instead care about good-enough
performance with ease of install and use), then things like WebDAV and so
forth are probably good enough.

I'd be happy to discuss your results more in general. It would be great if
you created a wiki page here to document your testing and results:

https://cwiki.apache.org/confluence/display/OODT/Home

Thank you and let me know!

Cheers,
Chris

-----Original Message-----
From: Etienne Koen <et...@scs-space.com>
Date: Thursday, September 11, 2014 12:55 AM
To: Chris Mattmann <ch...@gmail.com>
Cc: Shakeh Khudikyan <Sh...@jpl.nasa.gov>
Subject: PushPull

>Hi Chris and Shakeh,
>
>Attached are some of the results of tests performed according to the
>baseline testing requirements. Each test simply transferred a 1GB directory
>with varying file sizes. For completeness I have gone so far as to
>transfer files of 1MB each (this scenario might not be very probable for
>SKA though...). I have noticed a substantial drop in the transfer rate
>achieved compared to the 100MB files, as well as the transfer rate being
>quite variable. What would be the main contributor to this? I see that
>there is a metadata file created for each transfer, which might perhaps
>contribute to the overhead and become quite prominent in the 1000 x 1MB
>file case. All these tests used the FTP protocol and were performed on
>the same machine and network link:
>
>
>
>
>
>When testing single-file transfers, I found that the maximum transfer rate
>was only achieved for files > 256 MB:
>
>
>
>
>I also monitored the transfer rate of an 8192 MB file, which consistently
>showed an interesting behaviour: the transfer rate reaches a maximum and
>then drops. I am also unsure what the cause of this might be, as it
>happened consistently and in both transfer directions:
>
>
>
>I would greatly appreciate your comments on this so that I can include them
>in my report before I submit it next week.
>
>All the best!
>
>Cheers
>Etienne
>
>
>
>
>Etienne Koen
>Data Processing Systems Engineer
>
>
>
>
>Space Advisory Company
>
>O: +27 (21) 300 0060 | C: +27 (76) 661 0170 | E: etiennek@scs-space.com
>
>
>
>
>


________________________________

Disclaimer: This E-mail message, including any attachments, is intended only for the person or entity to which it is addressed, and may contain confidential information. Each page attached hereto must also be read in conjunction with this disclaimer.
If you are not the intended recipient you are hereby notified that any disclosure, copying, distribution or reliance upon the contents of this e-mail is strictly prohibited. E.&O.E.


Re: PushPull

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Thanks Etienne, congrats on finishing up your PhD!

PushPull focuses on downloading multiple files using multiple threads,
but I don't believe it downloads a single file using multiple threads.
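To make the distinction concrete, here is a minimal Python sketch of file-level parallelism: several files are in flight at once, but each file is fetched whole by a single worker thread. This is not PushPull code. The `REMOTE_FILES` dictionary, the `fetch` and `fetch_all` functions, and the file names are all hypothetical stand-ins, and the "download" is simulated in memory so the example runs without a network.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a remote site: file name -> file contents.
REMOTE_FILES = {f"granule_{i}.dat": bytes([i]) * 1024 for i in range(8)}


def fetch(name):
    """Fetch one whole file on a single thread (simulated in memory)."""
    return name, REMOTE_FILES[name]


def fetch_all(names, thread_count=4):
    """File-level parallelism: many files at once, one thread per file."""
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        # pool.map hands each file name to a worker; no file is split up.
        return dict(pool.map(fetch, names))


downloaded = fetch_all(sorted(REMOTE_FILES), thread_count=4)
print(len(downloaded), "files fetched")
```

With this pattern, raising the thread count can only help when there are several files to fetch; a single large file still moves over one stream, which is the case that tools such as GridFTP address differently.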

Hope that helps!

Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: Etienne Koen <et...@scs-space.com>
Date: Wednesday, October 15, 2014 at 1:53 AM
To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
<de...@oodt.apache.org>
Cc: Shakeh Khudikyan <Sh...@jpl.nasa.gov>, Brian Foster
<ho...@me.com>
Subject: RE: PushPull

>Hi Chris,
>
>Sorry for the lack of communication over the last couple of weeks. I had my
>last PhD responsibilities, which are now finally completed :-)
>
>I am getting back into things again with OODT... I just want to get some
>clarification on the parallel transfer in PushPull. Sorry if I am
>repeating myself, but I would like a clear answer about the parallel file
>transfer. Is PushPull's parallelism capable of downloading a single file
>using parallel threads, or is it only applicable to downloading multiple
>files?
>
>I am referring to the lines:
>
>org.apache.oodt.cas.pushpull.crawler.use.tracker=false
>
>org.apache.oodt.cas.pushpull.file.retrieval.system.recommended.thread.count=30
>
>Thanks
>
>Etienne Koen
>Data Processing Systems Engineer
>
>Space Advisory Company
>
>O: +27 (21) 300 0060 | C: +27 (76) 661 0170 | E: etiennek@scs-space.com
>
>________________________________________
>From: Mattmann, Chris A (3980) [chris.a.mattmann@jpl.nasa.gov]
>Sent: Monday, September 15, 2014 8:45 AM
>To: Etienne Koen; dev@oodt.apache.org
>Cc: Khudikyan, Shakeh E (398J); Brian Foster
>Subject: Re: PushPull
>
>Dear Etienne,
>
>Thanks for your questions! Yes, there are ways to manipulate the
>manner in which PushPull achieves parallelism, check out:
>
>http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties
>
>
>Look at the File Retrieval System related parameters.
>
>Also check out this documentation produced by Brian Foster which
>provides a lot of detail on how to use PushPull.
>
>http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/documentation/
>
>
>Cheers,
>Chris
>
>-----Original Message-----
>From: Etienne Koen <et...@scs-space.com>
>Date: Sunday, September 14, 2014 11:42 PM
>To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
><de...@oodt.apache.org>
>Cc: Shakeh Khudikyan <Sh...@jpl.nasa.gov>
>Subject: RE: PushPull
>
>>Thanks for the information!
>>
>>Please correct me if I am wrong, but does PushPull download files in
>>parallel in its default operation? Is there a way to specify any of the
>>parallel parameters when downloading files, for example the number of
>>threads? Is there any way to have more control over the parallelism?
>>
>>Thanks
>>Etienne
>>
>>Etienne Koen
>>Data Processing Systems Engineer
>>
>>Space Advisory Company
>>
>>O: +27 (21) 300 0060 | C: +27 (76) 661 0170 | E: etiennek@scs-space.com
>>
>>________________________________________
>>From: Mattmann, Chris A (3980) [chris.a.mattmann@jpl.nasa.gov]
>>Sent: Friday, September 12, 2014 4:18 PM
>>To: Etienne Koen; dev@oodt.apache.org
>>Cc: Khudikyan, Shakeh E (398J)
>>Subject: Re: PushPull
>>
>>Hi Etienne,
>>
>>Thanks for your question! Yes, PushPull has parallel downloading
>>capability, so in terms of "pulling" data it definitely has similar
>>capability to GridFTP. PushPull can't initiate or "push" a transfer
>>like GridFTP can in that sense, so it's not exactly an apples to
>>apples comparison.
>>
>>For the wiki, you can sign up to create an account here:
>>
>>https://cwiki.apache.org/confluence/signup.action
>>
>>Cheers!
>>
>>Chris


RE: PushPull

Posted by Etienne Koen <et...@scs-space.com>.
Hi Chris,

Sorry for the lack of communication over the last couple of weeks. I had my last PhD responsibilities, which are now finally completed :-)

I am getting back into things again with OODT... I just want to get some clarification on the parallel transfer in PushPull. Sorry if I am repeating myself, but I would like a clear answer about the parallel file transfer. Is PushPull's parallelism capable of downloading a single file using parallel threads, or is it only applicable to downloading multiple files?

I am referring to the lines:

org.apache.oodt.cas.pushpull.crawler.use.tracker=false

org.apache.oodt.cas.pushpull.file.retrieval.system.recommended.thread.count=30
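
For readers who want to inspect these two settings programmatically, here is a minimal Python sketch that parses them from properties-style text. The `parse_properties` helper is a hypothetical, simplified parser written for this example (it handles only plain `key=value` lines and `#` comments, not the full Java properties syntax), and the sketch makes no claim about how PushPull itself interprets the values.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# The two lines quoted in the message above.
SNIPPET = """
org.apache.oodt.cas.pushpull.crawler.use.tracker=false
org.apache.oodt.cas.pushpull.file.retrieval.system.recommended.thread.count=30
"""

PREFIX = "org.apache.oodt.cas.pushpull."
props = parse_properties(SNIPPET)
thread_count = int(props[PREFIX + "file.retrieval.system.recommended.thread.count"])
use_tracker = props[PREFIX + "crawler.use.tracker"] == "true"
print(thread_count, use_tracker)
```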

Thanks

Etienne Koen
Data Processing Systems Engineer

Space Advisory Company

O: +27 (21) 300 0060 | C: +27 (76) 661 0170 | E: etiennek@scs-space.com


Re: PushPull

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Dear Etienne,

Thanks for your questions! Yes, there are ways to manipulate the
manner in which PushPull achieves parallelism, check out:

http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/push_pull_framework.properties


Look at the File Retrieval System related parameters.

Also check out this documentation produced by Brian Foster which
provides a lot of detail on how to use PushPull.

http://svn.apache.org/repos/asf/oodt/trunk/pushpull/src/main/resources/documentation/


Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: Etienne Koen <et...@scs-space.com>
Date: Sunday, September 14, 2014 11:42 PM
To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
<de...@oodt.apache.org>
Cc: Shakeh Khudikyan <Sh...@jpl.nasa.gov>
Subject: RE: PushPull

>Thanks for the information!
>
>Please correct me if I am wrong, PushPull in it's default operation
>downloads files in parallel? Is there a way to specify any of the
>parallel parameters when downloading files? For example, thread number?
>Is there any way to have more control over the parallelism?
>
>Thanks
>Etienne
>
>Etienne Koen
>Data Processing Systems Engineer
>
>Space Advisory Company
>
>O: +27 (21) 300 0060 I C: +27 (76) 661 0170 I E: etiennek@scs-space.com
>
>________________________________________
>From: Mattmann, Chris A (3980) [chris.a.mattmann@jpl.nasa.gov]
>Sent: Friday, September 12, 2014 4:18 PM
>To: Etienne Koen; dev@oodt.apache.org
>Cc: Khudikyan, Shakeh E (398J)
>Subject: Re: PushPull
>
>Hi Etienne,
>
>Thanks for your question! Yes, PushPull has parallel downloading
>capability, so in terms of "pulling" data it definitely has similar
>capability to GridFTP. PushPull can't initiate or "push" a transfer
>like GridFTP can in that sense, so it's not exactly an apples to
>apples comparison.
>
>For the wiki, you can sign up to create an account here:
>
>https://cwiki.apache.org/confluence/signup.action
>
>Cheers!
>
>Chris
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Chris Mattmann, Ph.D.
>Chief Architect
>Instrument Software and Science Data Systems Section (398)
>NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>Office: 168-519, Mailstop: 168-527
>Email: chris.a.mattmann@nasa.gov
>WWW:  http://sunset.usc.edu/~mattmann/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Adjunct Associate Professor, Computer Science Department
>University of Southern California, Los Angeles, CA 90089 USA
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
>
>-----Original Message-----
>From: Etienne Koen <et...@scs-space.com>
>Date: Friday, September 12, 2014 12:15 AM
>To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
><de...@oodt.apache.org>
>Cc: Shakeh Khudikyan <Sh...@jpl.nasa.gov>
>Subject: RE: PushPull
>
>>Hi Chris,
>>
>>Thank you for your response and info! I would be happy to document my
>>results and would appreciate it if the community could respond to some of
>>the questions I still have.
>>
>>At the moment it does not look like I have the permissions or the
>>functionality to create a page... Or I am looking in the wrong place to
>>do so :-)
>>
>>My immediate question is whether PushPull has a parallel capability
>>like GridFTP's and how to specify it for the next test phase...
>>
>>Cheers
>>
>>Etienne Koen
>>Data Processing Systems Engineer
>>
>>Space Advisory Company
>>
>>O: +27 (21) 300 0060 I C: +27 (76) 661 0170 I E: etiennek@scs-space.com
>>
>>________________________________________
>>From: Mattmann, Chris A (3980) [chris.a.mattmann@jpl.nasa.gov]
>>Sent: Thursday, September 11, 2014 4:47 PM
>>To: dev@oodt.apache.org
>>Cc: Etienne Koen; Khudikyan, Shakeh E (398J)
>>Subject: FW: PushPull
>>
>>Etienne,
>>
>>
>>Thank you for sending this along! The crazy part about these types of
>>data transfer studies especially with TCP/IP based protocols that aren't
>>parallelized (e.g., FTP) is that you are limited by what's going on in
>>the surrounding network. For example see the attached studies my team
>>has published on data movement over the past 5-7 years and notice a
>>similar type of behavior. Pretty interesting independent of the family
>>of data transfer you're using.
>>
>>Take a look at my Dissertation too:
>>
>>http://sunset.usc.edu/~mattmann/Dissertation.pdf
>>
>>This concluded that parallel TCP/IP technologies like GridFTP (now
>>GlobusOnline) and bbFTP performed the best across the public WAN for
>>performance and efficiency related parameters, whereas if those aren't
>>the overall properties you are trying to maximize (and instead care
>>about good enough performance, but with ease of install and use - then
>>things like WebDAV and so forth are probably good enough).
>>
>>I'd be happy to discuss your results more in general. It would be great
>>if you created a wiki page here:
>>
>>https://cwiki.apache.org/confluence/display/OODT/Home
>>
>>
>>To document your testing and results. Thank you and let me know!
>>
>>Cheers,
>>Chris
>>
>>-----Original Message-----
>>From: Etienne Koen <et...@scs-space.com>
>>Date: Thursday, September 11, 2014 12:55 AM
>>To: Chris Mattmann <ch...@gmail.com>
>>Cc: Shakeh Khudikyan <Sh...@jpl.nasa.gov>
>>Subject: PushPull
>>
>>>Hi Chris and Shakeh,
>>>
>>>Attached are some of the results from tests performed according to the
>>>baseline testing requirements. The test was simply to transfer a 1GB
>>>directory with varying file sizes. For completeness I have gone so far
>>>as to transfer files of 1MB each (this scenario might not be very
>>>probable for SKA though...). I have noticed a substantial drop in the
>>>transfer rate achieved compared to the 100MB files, as well as the
>>>transfer rate being quite variable. What would be the main contributor
>>>to this? I see that there is a metadata file created for each transfer,
>>>which might contribute to the overhead and become quite prominent in
>>>the 1000 x 1MB file case. All these tests used the FTP protocol and
>>>were performed on the same machine and network link:
>>>
>>>
>>>
>>>
>>>
>>>For single file transfers I found that the maximum transfer rate was
>>>only achieved for files > 256 MB:
>>>
>>>
>>>
>>>
>>>I also monitored the transfer rate of an 8192 MB file, which
>>>consistently revealed an interesting behaviour: the transfer rate
>>>reaches a maximum and then drops. I am also unsure what the cause of
>>>this might be, as it happened consistently and in both transfer
>>>directions:
>>>
>>>
>>>
>>>I would greatly appreciate your comments on this so that I can include
>>>them in my report before I submit it next week.
>>>
>>>All the best!
>>>
>>>Cheers
>>>Etienne
>>>
>>>
>>>
>>>
>>>Etienne Koen
>>>Data Processing Systems Engineer
>>>
>>>
>>>
>>>
>>>Space Advisory Company
>>>
>>>O: +27 (21) 300 0060 I C: +27 (76) 661 0170 I E: etiennek@scs-space.com
>>>
>>>
>>>
>>>
>>>
>>
>>
>>________________________________
>>
>>Disclaimer: This E-mail message, including any attachments, is intended
>>only for the person or entity to which it is addressed, and may contain
>>confidential information. Each page attached hereto must also be read in
>>conjunction with this disclaimer.
>>If you are not the intended recipient you are hereby notified that any
>>disclosure, copying, distribution or reliance upon the contents of this
>>e-mail is strictly prohibited. E.&O.E.
>>
>
>
>________________________________
>
>Disclaimer: This E-mail message, including any attachments, is intended
>only for the person or entity to which it is addressed, and may contain
>confidential information. Each page attached hereto must also be read in
>conjunction with this disclaimer.
>If you are not the intended recipient you are hereby notified that any
>disclosure, copying, distribution or reliance upon the contents of this
>e-mail is strictly prohibited. E.&O.E.
>


RE: PushPull

Posted by Etienne Koen <et...@scs-space.com>.
Thanks for the information!

Please correct me if I am wrong: does PushPull, in its default operation, download files in parallel? Is there a way to specify any of the parallel parameters when downloading files, for example the number of threads? Is there any way to have more control over the parallelism?

Thanks
Etienne

Etienne Koen
Data Processing Systems Engineer

Space Advisory Company

O: +27 (21) 300 0060 I C: +27 (76) 661 0170 I E: etiennek@scs-space.com

________________________________________
From: Mattmann, Chris A (3980) [chris.a.mattmann@jpl.nasa.gov]
Sent: Friday, September 12, 2014 4:18 PM
To: Etienne Koen; dev@oodt.apache.org
Cc: Khudikyan, Shakeh E (398J)
Subject: Re: PushPull

Hi Etienne,

Thanks for your question! Yes, PushPull has parallel downloading
capability, so in terms of "pulling" data it definitely has similar
capability to GridFTP. PushPull can't initiate or "push" a transfer
like GridFTP can in that sense, so it's not exactly an apples to
apples comparison.

For the wiki, you can sign up to create an account here:

https://cwiki.apache.org/confluence/signup.action

Cheers!

Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: Etienne Koen <et...@scs-space.com>
Date: Friday, September 12, 2014 12:15 AM
To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
<de...@oodt.apache.org>
Cc: Shakeh Khudikyan <Sh...@jpl.nasa.gov>
Subject: RE: PushPull

>Hi Chris,
>
>Thank you for your response and info! I would be happy to document my
>results and would appreciate it if the community could respond to some of
>the questions I still have.
>
>At the moment it does not look like I have the permissions or the
>functionality to create a page... Or I am looking in the wrong place to
>do so :-)
>
>My immediate question is whether PushPull has a parallel capability
>like GridFTP's and how to specify it for the next test phase...
>
>Cheers
>
>Etienne Koen
>Data Processing Systems Engineer
>
>Space Advisory Company
>
>O: +27 (21) 300 0060 I C: +27 (76) 661 0170 I E: etiennek@scs-space.com
>
>________________________________________
>From: Mattmann, Chris A (3980) [chris.a.mattmann@jpl.nasa.gov]
>Sent: Thursday, September 11, 2014 4:47 PM
>To: dev@oodt.apache.org
>Cc: Etienne Koen; Khudikyan, Shakeh E (398J)
>Subject: FW: PushPull
>
>Etienne,
>
>
>Thank you for sending this along! The crazy part about these types of
>data transfer studies especially with TCP/IP based protocols that aren't
>parallelized (e.g., FTP) is that you are limited by what's going on in
>the surrounding network. For example see the attached studies my team
>has published on data movement over the past 5-7 years and notice a
>similar type of behavior. Pretty interesting independent of the family
>of data transfer you're using.
>
>Take a look at my Dissertation too:
>
>http://sunset.usc.edu/~mattmann/Dissertation.pdf
>
>This concluded that parallel TCP/IP technologies like GridFTP (now
>GlobusOnline) and bbFTP performed the best across the public WAN for
>performance and efficiency related parameters, whereas if those aren't
>the overall properties you are trying to maximize (and instead care
>about good enough performance, but with ease of install and use - then
>things like WebDAV and so forth are probably good enough).
>
>I'd be happy to discuss your results more in general. It would be great
>if you created a wiki page here:
>
>https://cwiki.apache.org/confluence/display/OODT/Home
>
>
>To document your testing and results. Thank you and let me know!
>
>Cheers,
>Chris
>
>-----Original Message-----
>From: Etienne Koen <et...@scs-space.com>
>Date: Thursday, September 11, 2014 12:55 AM
>To: Chris Mattmann <ch...@gmail.com>
>Cc: Shakeh Khudikyan <Sh...@jpl.nasa.gov>
>Subject: PushPull
>
>>Hi Chris and Shakeh,
>>
>>Attached are some of the results from tests performed according to the
>>baseline testing requirements. The test was simply to transfer a 1GB
>>directory with varying file sizes. For completeness I have gone so far
>>as to transfer files of 1MB each (this scenario might not be very
>>probable for SKA though...). I have noticed a substantial drop in the
>>transfer rate achieved compared to the 100MB files, as well as the
>>transfer rate being quite variable. What would be the main contributor
>>to this? I see that there is a metadata file created for each transfer,
>>which might contribute to the overhead and become quite prominent in
>>the 1000 x 1MB file case. All these tests used the FTP protocol and
>>were performed on the same machine and network link:
>>
>>
>>
>>
>>
>>For single file transfers I found that the maximum transfer rate was
>>only achieved for files > 256 MB:
>>
>>
>>
>>
>>I also monitored the transfer rate of an 8192 MB file, which
>>consistently revealed an interesting behaviour: the transfer rate
>>reaches a maximum and then drops. I am also unsure what the cause of
>>this might be, as it happened consistently and in both transfer
>>directions:
>>
>>
>>
>>I would greatly appreciate your comments on this so that I can include
>>them in my report before I submit it next week.
>>
>>All the best!
>>
>>Cheers
>>Etienne
>>
>>
>>
>>
>>Etienne Koen
>>Data Processing Systems Engineer
>>
>>
>>
>>
>>Space Advisory Company
>>
>>O: +27 (21) 300 0060 I C: +27 (76) 661 0170 I E: etiennek@scs-space.com
>>
>>
>>
>>
>>
>
>
>________________________________
>
>Disclaimer: This E-mail message, including any attachments, is intended
>only for the person or entity to which it is addressed, and may contain
>confidential information. Each page attached hereto must also be read in
>conjunction with this disclaimer.
>If you are not the intended recipient you are hereby notified that any
>disclosure, copying, distribution or reliance upon the contents of this
>e-mail is strictly prohibited. E.&O.E.
>


________________________________

Disclaimer: This E-mail message, including any attachments, is intended only for the person or entity to which it is addressed, and may contain confidential information. Each page attached hereto must also be read in conjunction with this disclaimer.
If you are not the intended recipient you are hereby notified that any disclosure, copying, distribution or reliance upon the contents of this e-mail is strictly prohibited. E.&O.E.


Re: PushPull

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Hi Etienne,

Thanks for your question! Yes, PushPull has parallel downloading
capability, so in terms of "pulling" data it definitely has similar
capability to GridFTP. PushPull can't initiate or "push" a transfer
like GridFTP can in that sense, so it's not exactly an apples to
apples comparison.
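
To make the "thread number" question concrete, here is a minimal Python sketch of what a bounded number of parallel pulls means in general. It is not PushPull's implementation or configuration: fetch, pull_all, max_workers and the file names are hypothetical, and the download is simulated with a short sleep instead of a real FTP transfer.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time


def fetch(name, seconds=0.05):
    # Hypothetical stand-in for one file download; sleeps instead of
    # touching the network.
    time.sleep(seconds)
    return name


def pull_all(names, max_workers):
    # Pull every file using at most max_workers threads, and record the
    # highest number of simulated downloads that overlapped.
    lock = threading.Lock()
    active = 0
    peak = 0

    def tracked(name):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            return fetch(name)
        finally:
            with lock:
                active -= 1

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input names.
        done = list(pool.map(tracked, names))
    return done, peak


files = [f"file_{i:03d}.dat" for i in range(12)]
done, peak = pull_all(files, max_workers=4)
print(len(done), "files pulled, peak concurrency", peak)
```

The pool size is the only knob in this sketch: raising max_workers lets more pulls overlap, which tends to help most when many small files are dominated by per-file overhead.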

For the wiki, you can sign up to create an account here:

https://cwiki.apache.org/confluence/signup.action

Cheers!

Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: Etienne Koen <et...@scs-space.com>
Date: Friday, September 12, 2014 12:15 AM
To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
<de...@oodt.apache.org>
Cc: Shakeh Khudikyan <Sh...@jpl.nasa.gov>
Subject: RE: PushPull

>Hi Chris,
>
>Thank you for your response and info! I would be happy to document my
>results and would appreciate it if the community could respond to some of
>the questions I still have.
>
>At the moment it does not look like I have the permissions or the
>functionality to create a page... Or I am looking in the wrong place to
>do so :-)
>
>My immediate question is whether PushPull has a parallel capability
>like GridFTP's and how to specify it for the next test phase...
>
>Cheers
>
>Etienne Koen
>Data Processing Systems Engineer
>
>Space Advisory Company
>
>O: +27 (21) 300 0060 I C: +27 (76) 661 0170 I E: etiennek@scs-space.com
>
>________________________________________
>From: Mattmann, Chris A (3980) [chris.a.mattmann@jpl.nasa.gov]
>Sent: Thursday, September 11, 2014 4:47 PM
>To: dev@oodt.apache.org
>Cc: Etienne Koen; Khudikyan, Shakeh E (398J)
>Subject: FW: PushPull
>
>Etienne,
>
>
>Thank you for sending this along! The crazy part about these types of
>data transfer studies especially with TCP/IP based protocols that aren't
>parallelized (e.g., FTP) is that you are limited by what's going on in
>the surrounding network. For example see the attached studies my team
>has published on data movement over the past 5-7 years and notice a
>similar type of behavior. Pretty interesting independent of the family
>of data transfer you're using.
>
>Take a look at my Dissertation too:
>
>http://sunset.usc.edu/~mattmann/Dissertation.pdf
>
>This concluded that parallel TCP/IP technologies like GridFTP (now
>GlobusOnline) and bbFTP performed the best across the public WAN for
>performance and efficiency related parameters, whereas if those aren't
>the overall properties you are trying to maximize (and instead care
>about good enough performance, but with ease of install and use - then
>things like WebDAV and so forth are probably good enough).
>
>I'd be happy to discuss your results more in general. It would be great
>if you created a wiki page here:
>
>https://cwiki.apache.org/confluence/display/OODT/Home
>
>
>To document your testing and results. Thank you and let me know!
>
>Cheers,
>Chris
>
>-----Original Message-----
>From: Etienne Koen <et...@scs-space.com>
>Date: Thursday, September 11, 2014 12:55 AM
>To: Chris Mattmann <ch...@gmail.com>
>Cc: Shakeh Khudikyan <Sh...@jpl.nasa.gov>
>Subject: PushPull
>
>>Hi Chris and Shakeh,
>>
>>Attached are some of the results from tests performed according to the
>>baseline testing requirements. The test was simply to transfer a 1GB
>>directory with varying file sizes. For completeness I have gone so far
>>as to transfer files of 1MB each (this scenario might not be very
>>probable for SKA though...). I have noticed a substantial drop in the
>>transfer rate achieved compared to the 100MB files, as well as the
>>transfer rate being quite variable. What would be the main contributor
>>to this? I see that there is a metadata file created for each transfer,
>>which might contribute to the overhead and become quite prominent in
>>the 1000 x 1MB file case. All these tests used the FTP protocol and
>>were performed on the same machine and network link:
>>
>>
>>
>>
>>
>>For single file transfers I found that the maximum transfer rate was
>>only achieved for files > 256 MB:
>>
>>
>>
>>
>>I also monitored the transfer rate of an 8192 MB file, which
>>consistently revealed an interesting behaviour: the transfer rate
>>reaches a maximum and then drops. I am also unsure what the cause of
>>this might be, as it happened consistently and in both transfer
>>directions:
>>
>>
>>
>>I would greatly appreciate your comments on this so that I can include
>>them in my report before I submit it next week.
>>
>>All the best!
>>
>>Cheers
>>Etienne
>>
>>
>>
>>
>>Etienne Koen
>>Data Processing Systems Engineer
>>
>>
>>
>>
>>Space Advisory Company
>>
>>O: +27 (21) 300 0060 I C: +27 (76) 661 0170 I E: etiennek@scs-space.com
>>
>>
>>
>>
>>
>
>
>________________________________
>
>Disclaimer: This E-mail message, including any attachments, is intended
>only for the person or entity to which it is addressed, and may contain
>confidential information. Each page attached hereto must also be read in
>conjunction with this disclaimer.
>If you are not the intended recipient you are hereby notified that any
>disclosure, copying, distribution or reliance upon the contents of this
>e-mail is strictly prohibited. E.&O.E.
>