Posted to dev@giraph.apache.org by "Nitay Joffe (JIRA)" <ji...@apache.org> on 2013/01/19 07:10:12 UTC

[jira] [Created] (GIRAPH-483) InputSplit needs to be Writable

Nitay Joffe created GIRAPH-483:
----------------------------------

             Summary: InputSplit needs to be Writable
                 Key: GIRAPH-483
                 URL: https://issues.apache.org/jira/browse/GIRAPH-483
             Project: Giraph
          Issue Type: Improvement
            Reporter: Nitay Joffe
            Priority: Minor


Working on Hive I/O recently I found this out the hard way...
We use InputSplit in Giraph in order to make things work easily with Hadoop. However, our usage of the interface is not actually consistent. Specifically, in InputSplitsCallable#getInputSplit we have the following:

  ((Writable) inputSplit).readFields(inputStream);

This means our InputSplit has to be Writable. If it is not (mine wasn't initially, when I was implementing a new input format), the cast throws a ClassCastException at runtime and things break badly. As a simple first step, we should at least put an instanceof check around that cast and fail with an informative error message.
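
A minimal sketch of what such a guard could look like, in plain Java so that it compiles without Hadoop on the classpath. The Writable interface here is a local stand-in that mirrors org.apache.hadoop.io.Writable; the helper readSplitFields, the two split classes, and the wording of the error message are all hypothetical and are not taken from the Giraph code base.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.IOException;

// Local stand-in for org.apache.hadoop.io.Writable (assumption: used only so
// this sketch is self-contained; real code would import the Hadoop interface).
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical split that does NOT implement Writable.
class NonWritableSplit { }

// Hypothetical split that does implement Writable.
class WritableSplit implements Writable {
  int value;

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(value);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    value = in.readInt();
  }
}

class InputSplitCheckDemo {
  // Hypothetical helper: check the type before casting, and fail with a
  // message that names the offending class and the actual requirement.
  static void readSplitFields(Object inputSplit, DataInput inputStream)
      throws IOException {
    if (!(inputSplit instanceof Writable)) {
      throw new IllegalStateException(
          "InputSplit class " + inputSplit.getClass().getName()
          + " must implement Writable to be used with Giraph");
    }
    ((Writable) inputSplit).readFields(inputStream);
  }

  public static void main(String[] args) throws IOException {
    // Four bytes encoding the int 7, as written by DataOutput#writeInt.
    DataInput in = new DataInputStream(
        new ByteArrayInputStream(new byte[] {0, 0, 0, 7}));

    WritableSplit good = new WritableSplit();
    readSplitFields(good, in);
    System.out.println("read value: " + good.value);

    try {
      readSplitFields(new NonWritableSplit(), in);
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The check does not change behavior for splits that already implement Writable; it only replaces the bare cast failure with an error that tells the input format author what to fix.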

Furthermore, looking deeper into it, I noticed that we never actually use the getLength() method of InputSplit, only getLocations(). So IMO the "right" way to do this is to have our own GiraphInputSplit interface, which extends Writable and has the getLocations() method.
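
A rough sketch of the shape such an interface might take, again with a local stand-in for org.apache.hadoop.io.Writable so it compiles on its own. Only the name GiraphInputSplit and the getLocations() method come from the proposal above; the exception signature, the HostListSplit implementation, and the roundTrip helper are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Local stand-in for org.apache.hadoop.io.Writable (assumption: used only so
// this sketch is self-contained).
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Proposed interface: serializable by construction, exposes only the
// locations, and has no getLength(). The throws clause is an assumption.
interface GiraphInputSplit extends Writable {
  String[] getLocations() throws IOException;
}

// Hypothetical implementation that carries nothing but a list of host names.
class HostListSplit implements GiraphInputSplit {
  private String[] hosts = new String[0];

  // No-arg constructor so an empty instance can be created before readFields.
  HostListSplit() { }

  HostListSplit(String... hosts) {
    this.hosts = hosts.clone();
  }

  @Override
  public String[] getLocations() {
    return hosts.clone();
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(hosts.length);
    for (String host : hosts) {
      out.writeUTF(host);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    int count = in.readInt();
    hosts = new String[count];
    for (int i = 0; i < count; i++) {
      hosts[i] = in.readUTF();
    }
  }
}

class GiraphInputSplitDemo {
  // Serialize a split to bytes and read it back into a fresh instance.
  static HostListSplit roundTrip(HostListSplit split) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    split.write(new DataOutputStream(bytes));
    HostListSplit copy = new HostListSplit();
    copy.readFields(
        new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    return copy;
  }

  public static void main(String[] args) throws IOException {
    HostListSplit copy = roundTrip(new HostListSplit("host1", "host2"));
    System.out.println(Arrays.toString(copy.getLocations()));
  }
}
```

With an interface like this, a split that is not serializable would be rejected at compile time rather than by a cast at runtime.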

Doing this is tricky, though, as it will likely break existing I/O formats, so it will require some care...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira