Posted to issues@airavata.apache.org by "Marcus Christie (JIRA)" <ji...@apache.org> on 2017/07/26 14:10:00 UTC

[jira] [Created] (AIRAVATA-2492) File transfer slow from alamo.uthscsa.edu to uslims3.uthscsa.edu

Marcus Christie created AIRAVATA-2492:
-----------------------------------------

             Summary: File transfer slow from alamo.uthscsa.edu to uslims3.uthscsa.edu
                 Key: AIRAVATA-2492
                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2492
             Project: Airavata
          Issue Type: Bug
            Reporter: Marcus Christie
            Assignee: Marcus Christie


Experiment ids in ultrascan.scigap.org:

US3-AIRA_f200b78e-5d86-499c-b2e6-c3b6fbb3bfa3
US3-AIRA_b69482ba-b34f-4809-9274-bcda71b8dccc

Hipchat discussion:
{quote}
[11:27 PM] Gary Gorbet: On Alamo, 20 minutes or more elapse from the end of a job until Airavata reports a job status of COMPLETE. Can't *something* be done to speed that up?
---- Wednesday July 26, 2017 ----
[9:31 AM] Eroma Abeysinghe: @gary sorry for this. Currently all experiments are completed, and two are running on alamo. But we found two that took an unusually long time to transfer their files. We are investigating and will keep you posted.
[9:51 AM] Marcus Christie: @gary here's what we've been able to confirm: gfac is getting the completed email from the scheduler and processing it immediately. Airavata doesn't mark an experiment as being completed until the output data staging completes. The output data staging for the jobs you gave us to look at took a long time to complete.
[9:52 AM] Marlon Pierce: You can test network speed in various ways (like https://askubuntu.com/questions/7976/how-do-you-test-the-network-speed-betwen-two-boxes, which I googled).
[9:53 AM] Marlon Pierce: It won't fix the problem, but it may help network admins diagnose the problem
[9:53 AM] Marlon Pierce: I'm assuming (maybe incorrectly) that this is an issue between Alamo and our servers
[9:54 AM] Marcus Christie: This is partly because transfers from alamo to lims are running very slowly, at about 200kb/s by my rough calculations. But the analysis-results.tar file is also much larger in these experiments. In one example we looked at it was about 200 MB, and in another about 1.13 GB. Looking through the logs back to April, this is historically very large for an analysis-results.tar file.
[9:54 AM] Marlon Pierce: But we can confirm this by looking at other servers
[9:54 AM] Marcus Christie: @marlon this is between alamo and lims. Do we have a login on lims?
[9:56 AM] Marlon Pierce: Oh...where is the LIMS server? At UTHSCSA?
[9:57 AM] Marlon Pierce: I think it is.
[9:58 AM] Marcus Christie: @marlon uslims3.uthscsa.edu
[9:59 AM] Gary Gorbet: That other job completed. But now there is a new, critical job that FINISHED 40 minutes ago. Airavata job status is EXECUTING.
[9:59 AM] Marcus Christie: It would be nice if we could run a command line scp transfer test from alamo to uslims3, just to see whether the link itself is fundamentally limited or the slowness is in GFac.
[10:00 AM] Gary Gorbet: That is easy to do. Will do a test and report.
[10:00 AM] Marcus Christie: Thanks @gary
[10:01 AM] Gary Gorbet: Transferred a 1.6M file from uslims3 to alamo in under a second.
[10:02 AM] Marcus Christie: Actually I realized that GFac will do the transfer from alamo to uslims3 via GFac. It's not a true third party transfer; rather, the data is streamed through the GFac server. So @marlon was right: we need to test alamo -> GFac and GFac -> uslims3.
[10:03 AM] Gary Gorbet: This would be alamo/uslims3 to gw153; right?
[10:03 AM] Marcus Christie: Thanks @gary. So that's our theoretical upper bound.
[10:04 AM] Marcus Christie: @gary that's right
[10:08 AM] Sudhakar Pamidighantam: I think if it is reasonably secure, GFac should do a true third party transfer to avoid this delay.
[10:08 AM] Gary Gorbet: The LIMS work directory for the job has all the output files. So, it seems to me that gw153 to uslims3 transfers completed. So, why the EXECUTING gfac status?
{quote}
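
For reference, the rough arithmetic behind the numbers in the discussion can be sketched as below. This is only a back-of-the-envelope estimate, not a measurement. It assumes that "200kb/s" means 200 kilobytes per second, that the "1.6M" test file was 1.6 megabytes, and that sizes use decimal units; the helper names are made up for this note. Also keep in mind that the scp test went directly from uslims3 to alamo, which is not the path the output staging takes through the GFac server.

```python
# Back-of-the-envelope transfer estimates for the sizes quoted in the chat.
# Assumptions (not confirmed in the discussion):
#   - "200kb/s" is read as 200 kilobytes per second
#   - "1.6M" is read as 1.6 megabytes
#   - decimal units (1 MB = 1000**2 bytes)

KB = 1000
MB = 1000 ** 2
GB = 1000 ** 3

OBSERVED_RATE = 200 * KB  # bytes per second, the rough staging rate


def transfer_seconds(size_bytes, rate_bytes_per_s):
    """Time needed to move size_bytes at a steady rate_bytes_per_s."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s


def throughput(size_bytes, elapsed_s):
    """Average rate in bytes per second for a transfer taking elapsed_s."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return size_bytes / elapsed_s


if __name__ == "__main__":
    for label, size in [("200 MB", 200 * MB), ("1.13 GB", 1.13 * GB)]:
        minutes = transfer_seconds(size, OBSERVED_RATE) / 60
        print(f"{label} at 200 kB/s: about {minutes:.0f} min")

    # The scp test took "under a second", so 1 s gives a lower bound.
    direct = throughput(1.6 * MB, 1.0)
    print(f"direct scp: at least {direct / MB:.1f} MB/s, "
          f"{direct / OBSERVED_RATE:.0f}x the observed staging rate")
```

Under these assumptions the 200 MB archive alone would take roughly 17 minutes and the 1.13 GB archive roughly 94 minutes to stage, which is in the same range as the delays reported above.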




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)