Posted to issues@airavata.apache.org by "Marcus Christie (JIRA)" <ji...@apache.org> on 2018/05/21 18:33:00 UTC

[jira] [Assigned] (AIRAVATA-2741) Ideas for better way to deal with arbitrary output files than ARCHIVE

     [ https://issues.apache.org/jira/browse/AIRAVATA-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Christie reassigned AIRAVATA-2741:
-----------------------------------------

    Assignee: Dimuthu Upeksha  (was: Marcus Christie)

> Ideas for better way to deal with arbitrary output files than ARCHIVE
> ---------------------------------------------------------------------
>
>                 Key: AIRAVATA-2741
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2741
>             Project: Airavata
>          Issue Type: Improvement
>            Reporter: Marcus Christie
>            Assignee: Dimuthu Upeksha
>            Priority: Major
>
> Just want to capture some details of recent conversations with [~eroma_a] and [~spamidig] on how to improve Airavata's capabilities so we can move beyond using ARCHIVE. The ARCHIVE capability is a bit of a hack and causes some issues for us. Briefly, here are some of the problems:
> * pulls back absolutely every file, but some aren't needed and some intermediate files are very large. For some applications it isn't even practical to use ARCHIVE.
> * pulls back duplicates of Application Output files, further filling gateway data storage
> * these files are basically opaque to Airavata, so there is a limit to what can be done with them programmatically
> Here are some potential improvements:
> * improve wildcard support: allow specifying a wildcard that can match a single file or multiple files. Multiple matches can all be registered as one URI_COLLECTION-type data output. (Side note: I'm not sure exactly what the current wildcard support covers; this needs investigating.)
> * Show all of the job files in the portal, including ones that aren't defined as Application Outputs and haven't actually been staged back to the portal, and allow the user to request pulling back any of these other files. This would be nice because there will certainly be cases where a file is generated that wasn't anticipated (either due to a lack of configuration or because it truly couldn't have been anticipated). This would mean registering every file in the job directory, not just the Application Outputs (not sure where; the replica catalog?). It would also mean we need backend task execution support for fetching these files on demand.
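A minimal sketch of the two ideas above, assuming the job directory listing is already available as a plain list of file names. It shows a wildcard resolving to a single URI output when one file matches and to a URI_COLLECTION output when several match, plus the bookkeeping for finding job files that were not staged back (candidates for listing in the portal and fetching on request). `OutputValue`, `resolve_wildcard_output`, `unstaged_files` and the storage URI are hypothetical illustrations, not Airavata APIs; only the URI and URI_COLLECTION type names come from the issue.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


# Hypothetical stand-in for an Airavata output value. Only the type names
# "URI" and "URI_COLLECTION" come from the issue; the rest is assumed.
@dataclass
class OutputValue:
    data_type: str     # "URI" for one match, "URI_COLLECTION" for several
    value: List[str]   # storage URIs of the matched files


def resolve_wildcard_output(job_files: List[str], pattern: str,
                            base_uri: str) -> Optional[OutputValue]:
    """Match a shell-style wildcard against file names in the job directory.

    Returns None when nothing matches, a URI output for exactly one match,
    and a URI_COLLECTION output when several files match.
    """
    # fnmatchcase is case-sensitive on every platform, like most HPC file systems.
    matches = sorted(f for f in job_files if fnmatchcase(f, pattern))
    if not matches:
        return None
    uris = [f"{base_uri.rstrip('/')}/{name}" for name in matches]
    data_type = "URI" if len(uris) == 1 else "URI_COLLECTION"
    return OutputValue(data_type=data_type, value=uris)


def unstaged_files(job_files: List[str], staged: List[str]) -> List[str]:
    """Job-directory files that were not staged back as Application Outputs.

    These are the files the portal could list and offer to fetch on request.
    """
    staged_set = set(staged)
    return sorted(f for f in job_files if f not in staged_set)


if __name__ == "__main__":
    files = ["run.log", "out_001.dat", "out_002.dat", "scratch.tmp", "result.xyz"]
    base = "sftp://storage.example.org/exp1"  # hypothetical storage location
    print(resolve_wildcard_output(files, "out_*.dat", base))
    print(resolve_wildcard_output(files, "*.xyz", base))
    print(unstaged_files(files, ["result.xyz"]))
```

Returning a plain URI for a single match keeps the common one-file case looking like today's outputs, while the collection type only appears when a pattern actually fans out. Registering the unstaged names (wherever that ends up living) is what would let a later fetch task pull back just the one file a user asks for.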



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)