
Posted to dev@giraph.apache.org by "Ed Kohlwey (Created) (JIRA)" <ji...@apache.org> on 2011/12/20 21:27:30 UTC

[jira] [Created] (GIRAPH-111) Refactor I/O to be independent of Map/Reduce

Refactor I/O to be independent of Map/Reduce
--------------------------------------------

                 Key: GIRAPH-111
                 URL: https://issues.apache.org/jira/browse/GIRAPH-111
             Project: Giraph
          Issue Type: Improvement
          Components: graph
            Reporter: Ed Kohlwey


The I/O mechanisms should probably be abstracted entirely from Map/Reduce in order to support making Giraph an independent framework.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-111) Refactor I/O to be independent of Map/Reduce

Posted by "Ed Kohlwey (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177401#comment-13177401 ] 

Ed Kohlwey commented on GIRAPH-111:
-----------------------------------

After looking at the code over the last few weeks, I think I made some faulty assumptions. Correcting them should allow a cleaner implementation of GIRAPH-108 and remove the need for this ticket. I'll update and close as I make progress.
                

[jira] [Commented] (GIRAPH-111) Refactor I/O to be independent of Map/Reduce

Posted by "Avery Ching (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173512#comment-13173512 ] 

Avery Ching commented on GIRAPH-111:
------------------------------------

I'm not clear on why this is necessary.  Couldn't we simply call the I/O methods as Hadoop would when we're not using Hadoop?  Am I missing something?
                

[jira] [Commented] (GIRAPH-111) Refactor I/O to be independent of Map/Reduce

Posted by "Ed Kohlwey (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174085#comment-13174085 ] 

Ed Kohlwey commented on GIRAPH-111:
-----------------------------------

Maybe I'm getting ahead of myself. I discovered the need for such a thing while working on my patch for GIRAPH-108, but others might have better ideas on how to address the problem.

The issue comes from the desire to create a Giraph-specific context that can use delegation to either hook into the existing Hadoop context/reporting system or report back to a context system that is specific to the resource allocator being used on the cluster (Mesos, YARN). Furthermore, this context should be backwards-compatible with Hadoop's (meaning it should inherit from Hadoop's context classes) so that it can be used to instantiate InputFormats, etc. This is where the problem occurs.

Since Hadoop's I/O libraries use class-based inheritance rather than interfaces, you have to use the hack I reference in GIRAPH-108: create a wrapper class in which every method is overridden, and call a rather nasty constructor with null arguments so that the superclass can be instantiated, even though you are ignoring the superclass implementation entirely. You can check the code out in the patch: GraphMapper.HadoopContext and GraphProcess.Context.
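
For readers who have not opened the patch, the workaround described above can be sketched roughly as follows. This is not the GIRAPH-108 code and does not use Hadoop's real classes; FrameworkContext, StatusSink, DelegatingContext, RecordingSink and ContextDemo are hypothetical stand-ins chosen only to illustrate the pattern of subclassing a concrete class, passing nulls to its constructor, and overriding every method so that calls are delegated elsewhere.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a class-based framework context (NOT Hadoop's API).
class FrameworkContext {
    private final String jobName;
    private final List<String> log;

    FrameworkContext(String jobName, List<String> log) {
        this.jobName = jobName;
        this.log = log;
    }

    // These implementations dereference 'log', so they would throw a
    // NullPointerException if the object was built with null arguments.
    public void setStatus(String status) {
        log.add(jobName + ": " + status);
    }

    public void progress() {
        log.add(jobName + ": progress");
    }
}

// Hypothetical framework-neutral reporting interface (an assumption of this sketch).
interface StatusSink {
    void setStatus(String status);
    void progress();
}

// The workaround: inherit only to satisfy the type system, pass nulls to the
// superclass constructor, and override every method to delegate to the sink.
class DelegatingContext extends FrameworkContext {
    private final StatusSink sink;

    DelegatingContext(StatusSink sink) {
        super(null, null); // superclass state is never used
        this.sink = sink;
    }

    @Override
    public void setStatus(String status) {
        sink.setStatus(status);
    }

    @Override
    public void progress() {
        sink.progress();
    }
}

// A sink that just records what it was told, standing in for Mesos/YARN reporting.
class RecordingSink implements StatusSink {
    final List<String> events = new ArrayList<>();

    @Override
    public void setStatus(String status) {
        events.add("status:" + status);
    }

    @Override
    public void progress() {
        events.add("progress");
    }
}

class ContextDemo {
    // Code written against the framework type; it cannot tell it received a wrapper.
    static void runTask(FrameworkContext ctx) {
        ctx.setStatus("loading vertices");
        ctx.progress();
    }

    public static void main(String[] args) {
        RecordingSink sink = new RecordingSink();
        runTask(new DelegatingContext(sink));
        System.out.println(sink.events);
    }
}
```

The fragility is visible in the sketch: any superclass method that is not overridden falls through to code that expects real constructor arguments. If the superclass were an interface, the wrapper could implement it directly and no null-argument constructor would be needed.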
                

[jira] [Commented] (GIRAPH-111) Refactor I/O to be independent of Map/Reduce

Posted by "Jakob Homan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173537#comment-13173537 ] 

Jakob Homan commented on GIRAPH-111:
------------------------------------

> I'm not clear on why this is necessary.
I agree.  Hadoop's file formats, etc. are designed to be exceedingly forgiving and flexible as to the underlying storage mechanism.  Can you point to where they're lacking for Mesos' case?

> We could also copy out the relevant Hadoop I/O classes (InputFormat, OutputFormat, etc) into Giraph, rename their packages, and begin reworking them in an appropriate way to better suit Giraph.
-1.  Therein lies madness.

                

[jira] [Commented] (GIRAPH-111) Refactor I/O to be independent of Map/Reduce

Posted by "Ed Kohlwey (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173493#comment-13173493 ] 

Ed Kohlwey commented on GIRAPH-111:
-----------------------------------

I think there are a few ways to do this.

Generalized parallel I/O libraries, like HCatalog, are starting to appear, so that's definitely one option. From what I can tell, though, HCatalog is primarily for tabular data, so it may not make the most sense given Giraph's current focus on using Java objects in the regular Java class system to represent data.

We could also copy out the relevant Hadoop I/O classes (InputFormat, OutputFormat, etc) into Giraph, rename their packages, and begin reworking them in an appropriate way to better suit Giraph.

Finally, we could also just start designing an I/O package from scratch. I think this is probably the least incremental or pragmatic approach, so it's probably not a fantastic option.
                

[jira] [Commented] (GIRAPH-111) Refactor I/O to be independent of Map/Reduce

Posted by "Jakob Homan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/GIRAPH-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174429#comment-13174429 ] 

Jakob Homan commented on GIRAPH-111:
------------------------------------

So this is about counters/reporting (i.e., all the stuff that's being proxied by HadoopContext in GIRAPH-108)? Yeah, that will need to be handled in a more generic way once Giraph is less MR-centric. GIRAPH-77 was actually intended to partially work on this by creating a central place for statistics to be handled (and exposed initially via a webpage), which would then be hooked into whatever the proper final destination is for the framework we're running on top of. I'm afraid I'm still not sure how this brings {Input|Output}Formats into the mix.
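
A minimal sketch of the "central place for statistics" idea, under stated assumptions: this is not the GIRAPH-77 design, and StatsSink, StatsRegistry and StatsDemo are hypothetical names. The registry owns the counters, and each framework-specific destination (a web page, Hadoop counters, and so on) would be plugged in as a sink.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical destination for statistics; one implementation per framework.
interface StatsSink {
    void publish(String name, long value);
}

// Hypothetical central registry: the only place counters are updated.
class StatsRegistry {
    private final Map<String, Long> counters = new LinkedHashMap<>();
    private final List<StatsSink> sinks = new ArrayList<>();

    synchronized void addSink(StatsSink sink) {
        sinks.add(sink);
    }

    // Adds delta to the named counter and forwards the new total to every sink.
    synchronized void increment(String name, long delta) {
        long total = counters.merge(name, delta, Long::sum);
        for (StatsSink sink : sinks) {
            sink.publish(name, total);
        }
    }

    // Returns the current total, or 0 if the counter was never incremented.
    synchronized long get(String name) {
        return counters.getOrDefault(name, 0L);
    }
}

class StatsDemo {
    public static void main(String[] args) {
        StatsRegistry registry = new StatsRegistry();
        // Standard output stands in here for the real destination.
        registry.addSink((name, value) -> System.out.println(name + " = " + value));
        registry.increment("vertices", 3);
        registry.increment("vertices", 2);
    }
}
```

With this shape, computation code would only ever talk to the registry, and swapping MapReduce for another framework would mean registering a different sink.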
                