Posted to dev@reef.apache.org by "Julia (JIRA)" <ji...@apache.org> on 2016/01/20 23:15:40 UTC

[jira] [Created] (REEF-1143) Adding API to allow deserialize data from remote files directly

Julia created REEF-1143:
---------------------------

             Summary: Adding API to allow deserialize data from remote files directly
                 Key: REEF-1143
                 URL: https://issues.apache.org/jira/browse/REEF-1143
             Project: REEF
          Issue Type: New Feature
          Components: REEF-IO
            Reporter: Julia


Currently, Deserialize(string fileFolder) in IFileDeSerializer is used to deserialize local files in a given local file folder. For a set of remote files, FileSystemInputPartition first downloads the remote files to a local folder, then passes that folder to the Deserialize(string fileFolder) method.

For remote files, especially very large ones, we need to read the data chunk by chunk and consume it as it arrives, instead of downloading the entire file first. The remote file paths are provided as a set, and the folder that holds them is controlled by the caller and may contain other files, so we cannot simply pass the folder name; we need to pass the remote file names instead. Therefore the new API for deserializing remote files would be:
 T Deserialize(ISet<string> filePaths);
 
This would leave two methods in IFileDeSerializer<T>:
 T Deserialize(string fileFolder);  -- for local files
 T Deserialize(ISet<string> filePaths); -- for remote files
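
To make the first option concrete, here is a minimal sketch of how the two overloads might sit side by side. It is not taken from the REEF source: the interface is redeclared locally as a stand-in, and LineDeSerializer, the openRemote delegate, and the remote:// paths are hypothetical names used only for illustration. It assumes T is a lazy sequence of text lines, so remote data is consumed as it is read rather than downloaded first.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Local stand-in for the interface discussed in this issue (first option);
// it is not copied from the REEF source.
public interface IFileDeSerializer<T>
{
    T Deserialize(string fileFolder);       // local files in a folder
    T Deserialize(ISet<string> filePaths);  // remote files, named one by one
}

// Hypothetical implementation: T is a lazy sequence of text lines.
public sealed class LineDeSerializer : IFileDeSerializer<IEnumerable<string>>
{
    // Assumption: the caller supplies a way to open a remote path as a stream.
    private readonly Func<string, Stream> _openRemote;

    public LineDeSerializer(Func<string, Stream> openRemote)
    {
        _openRemote = openRemote;
    }

    public IEnumerable<string> Deserialize(string fileFolder)
    {
        // Each local file is opened only when enumeration reaches it.
        return Directory.EnumerateFiles(fileFolder)
            .SelectMany(path => ReadLines(() => File.OpenRead(path)));
    }

    public IEnumerable<string> Deserialize(ISet<string> filePaths)
    {
        // Only the named remote files are read; nothing is copied to disk.
        return filePaths.SelectMany(path => ReadLines(() => _openRemote(path)));
    }

    private static IEnumerable<string> ReadLines(Func<Stream> open)
    {
        using (var reader = new StreamReader(open()))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // In-memory stand-ins for remote files; the paths are made up.
        var fakeRemote = new Dictionary<string, string>
        {
            { "remote://data/part-0", "a\nb" },
            { "remote://data/part-1", "c" }
        };
        var deserializer = new LineDeSerializer(
            path => new MemoryStream(Encoding.UTF8.GetBytes(fakeRemote[path])));

        ISet<string> paths = new HashSet<string>(fakeRemote.Keys);
        foreach (var line in deserializer.Deserialize(paths))
        {
            Console.WriteLine(line);
        }
    }
}
```

One thing the sketch makes visible: with overloads, the compiler picks the method from the argument type alone, so a call site such as Deserialize(paths) does not say whether local or remote files are involved.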

This is clean. The only issue is that the method names do not explain their usage. Another option is to make the method names explicit:
 T DeserializeLocalFiles(string fileFolder);  -- for local files
 T DeserializeRemoteFiles(ISet<string> filePaths); -- for remote files

With the second option, the original Deserialize() API would be renamed. That is a breaking change, although I don't think anyone else is using it.

Please comment.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)