You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@giraph.apache.org by "Jaeho Shin (JIRA)" <ji...@apache.org> on 2012/08/02 08:14:02 UTC

[jira] [Created] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Jaeho Shin created GIRAPH-277:
---------------------------------

             Summary: Text Vertex Input/Output Format base classes overhaul
                 Key: GIRAPH-277
                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
             Project: Giraph
          Issue Type: Improvement
          Components: examples, lib
            Reporter: Jaeho Shin


The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.

So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.

Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitay Joffe updated GIRAPH-277:
-------------------------------

    Attachment: GIRAPH-277-5.patch

Ah nice good catch. Fixed.
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>            Assignee: Nitay Joffe
>         Attachments: GIRAPH-227-4.patch, GIRAPH-277-2.patch, GIRAPH-277-3.patch, GIRAPH-277-5.patch, GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427702#comment-13427702 ] 

Eli Reisman commented on GIRAPH-277:
------------------------------------

Oh, and anyone who removes lines from the checkstyle file like this has my respect and esteem. :)
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>         Attachments: GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459371#comment-13459371 ] 

Avery Ching commented on GIRAPH-277:
------------------------------------

Nitay, I think we're almost there on this. 

1) Just need to address the 'XXX' comments from the review

2) 1 checkstyle error, see emacs target/munged/checkstyle-result.xml

<file name="/Users/aching/Avery/source/giraph_trunk/src/main/java/org/apache/giraph/io/TextVertexInputFormat.java">
<error line="91" column="13" severity="error" message="Unclosed HTML tag found: &lt;I&gt; Vertex index value" source="com.puppycrawl.tools.checkstyle.ch\
ecks.javadoc.JavadocStyleCheck"/>

This might be related to the checkstyle.xml changes since the line looks fine to me.
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>            Assignee: Nitay Joffe
>         Attachments: GIRAPH-277-2.patch, GIRAPH-277-3.patch, GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Jaeho Shin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429358#comment-13429358 ] 

Jaeho Shin commented on GIRAPH-277:
-----------------------------------

Thanks, Jakob.  I also found it simple to just keep another svn checkout and generate correct patches there.  It seems rb wants the diff to have {{Index: pathname}} followed by {{======...}} lines.  Adding those revision suffixes was no use.

Looking at my patch from reviewboard, I realized I made a lot of reordered imports and some whitespace changes.  Let me polish them and update the diff before anyone really dives deep into it.  Is there a place where our import order is documented?  Maybe I'll dig up from the mailing list.

Re: how-to-contribute, I'll let you know when I update the wiki.  I think I want to update the wiki a little later maybe after a few more iterations like this, or as I contribute the archetype work for new users.
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>         Attachments: GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Jaeho Shin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429323#comment-13429323 ] 

Jaeho Shin commented on GIRAPH-277:
-----------------------------------

I am trying with {{--no-prefix}}, {{--full-index}} (as mentioned in [reviewboard Issue 1918|http://code.google.com/p/reviewboard/issues/detail?id=1918]) and post-review with git svn, but all gives me the same error.  I think either my setup is wrong (maybe git being to recent? 1.7.11) or reviewboard simply wants a diff that looks exactly like one generated by svn.  I'll try adding {{(revision ...)}} and {{(working copy)}} manually to the pathname lines.  Git's diff only has pathnames on those lines.

BTW, when you mentioned the how-to-contribute section, I thought you were talking about [the wiki page|https://cwiki.apache.org/confluence/display/GIRAPH/How+to+Contribute] and was puzzled.  The {{--no-prefix}} was only described at the end of [Giraph's website|http://giraph.apache.org].  The wiki contains a lot of outdated instructions from the incubator age, e.g., base repo url.  I believe these duplicate/outdated docs should be cleaned up before they confuse too many new users or developers.  Do we prefer to put more on the wiki or the web site?
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>         Attachments: GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427697#comment-13427697 ] 

Eli Reisman commented on GIRAPH-277:
------------------------------------

That was a lot of work, good job. This does need to be made easier, I agree. As far as the preprocessing, it seems a bit over engineered as it just pushes the parsing work to another thing the user has to implement, and they still have to place those values, so the net difficulty does not really change for the IO format author. Adding more generics to this already loaded-up area of the code does not seem to welcome in the new users either.

The new thinking on IO formats as it was explained to me is that developers should be providing lots of IO formats, and that users should simply specify an appropriate one at the command line with -if -of. The benchmark code is the "old way", the examples/ code is more representative of what I was told we want now, which is you specify IO at the command line, write your vertex class, and you're done. So if the IO code seems a bit cryptic the way it is (I agree!) its also not meant to be user-facing anymore, at least not for new users. This also means a really useful upgrade (some are in the works now) would be to add support for more IO formats (Avro etc.) and more strongly typed subclasses for the ones we have, thereby presenting the user with lots of reasons not to write their own IO formats at all. This "new thinking" has also not been properly documented on the site/wiki.

That said, great work and as long as the record reader plumbing coming out of Hadoop is properly tested and still happy in these IO formats (have you run jobs with this patch and real chunk of sample data or just unit tests on mvn?) then this looks great. I'd be happier without the custom exception and preprocessing stuff, but if others find this useful then good enough for me.

                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>         Attachments: GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428455#comment-13428455 ] 

Jakob Homan commented on GIRAPH-277:
------------------------------------

@Jaeho, please open a reviewboard request for the patch.  Thanks.
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>         Attachments: GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitay Joffe updated GIRAPH-277:
-------------------------------

    Attachment: GIRAPH-227-4.patch

Clean up XXX
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>            Assignee: Nitay Joffe
>         Attachments: GIRAPH-227-4.patch, GIRAPH-277-2.patch, GIRAPH-277-3.patch, GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Avery Ching reassigned GIRAPH-277:
----------------------------------

    Assignee: Jaeho Shin
    
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>            Assignee: Jaeho Shin
>         Attachments: GIRAPH-277-2.patch, GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitay Joffe updated GIRAPH-277:
-------------------------------

    Attachment: GIRAPH-277-3.patch

Taking over from Jaeho. Here's the latest patch, rebased to trunk, with comments from previous review at https://reviews.apache.org/r/6402/

I've uploaded a new review here: https://reviews.apache.org/r/7181/
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>            Assignee: Nitay Joffe
>         Attachments: GIRAPH-277-2.patch, GIRAPH-277-3.patch, GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Jaeho Shin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427736#comment-13427736 ] 

Jaeho Shin commented on GIRAPH-277:
-----------------------------------

Thanks Eli.

It would be great to entirely eliminate the need of writing I/O code and let users pick from a large collection, but I think that is unlikely if complex data structure is used in the input graph.  We can easily exhaust the possible combinations of int, double, Text I/O formats, but Map and List were the types of values and messages mostly in my cases.  I admit some parts could look over engineered, and don't really reduce the net amount of code, but the big picture is clearly to make the blanks in the code obvious for users and only ask them to implement what they really need to.

I didn't have chance to test these text formats on large scale, but I'm using identically structured format codes for reading and writing Hive tables of >100M rows, ~15GB without problem.  The holes user needs to fill in this case are how to get vertex id, value and edge maps from a particular row in the table, instead of a single text line.

About the checkstyle xml, I was reluctant to make the change because it might encourage us to abuse suppress comments, but thought it was also silly to blow up the xml itself. I'm in no way a checkstyle expert, and just found [good hints|http://stackoverflow.com/a/4023351/390044] from the web :)
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>         Attachments: GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Jaeho Shin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jaeho Shin updated GIRAPH-277:
------------------------------

    Attachment: GIRAPH-277-2.patch

Ok, here's a new patch without the noises.

The correct reviewboard link: https://reviews.apache.org/r/6402/
(Jakob, sorry I couldn't update diff to the above, so created a new one)

In addition to minor fixes, there's a crucial one line change to the signature of {{TextVertexWriterToEachLine#writeVertex}}.
Using wildcard type here works fine when compiled together as source at the same time.  But if I try to extend it from my own project depending on Giraph classes as jar, then I get strange compilation error saying I'm not implementing {{writeVertex(Vertex<I, V, E, ?>)}} even though it's already there on the superclass even marked final.  Removing the generics type variables solves this problem so included the fix in this patch.  Maybe this is another thing we should watch out when separating message types from I/O formats.
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>         Attachments: GIRAPH-277-2.patch, GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459390#comment-13459390 ] 

Avery Ching commented on GIRAPH-277:
------------------------------------

I don't know why it's not happening in the normal build either.  mvn package also fails in the giraph-formats-contrib =(.  Well, since it's not getting worse, once you address 1), let's get this in.
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>            Assignee: Nitay Joffe
>         Attachments: GIRAPH-277-2.patch, GIRAPH-277-3.patch, GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459460#comment-13459460 ] 

Avery Ching commented on GIRAPH-277:
------------------------------------

Did you run 'mvn clean verify' with this patch?  I got 

There are 2 checkstyle errors.  We need to fix this before moving on.  Before this patch, everything is fine.
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>            Assignee: Nitay Joffe
>         Attachments: GIRAPH-227-4.patch, GIRAPH-277-2.patch, GIRAPH-277-3.patch, GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428451#comment-13428451 ] 

Eli Reisman commented on GIRAPH-277:
------------------------------------

If you have a use case where this sort of extra preprocessing is needed, and others think it will be valuable then we should leave it in. If you would be comfortable putting up something on the wiki documenting for new users how you think they should go about writing new IO formats, that would be good too.

Actually, given the way the other tools in the Hadoop ecosystem work I don't think its at all unreasonable to create at least base classes that do 90% of the work for many common IO formats in use with big data in the Apache universe, and we use Giraph almost exclusively this way around here. I do think the current collection has many obvious gaps and we're still a ways from supplying what people would need. I think a complete basic set would have to meet the criteria that: 1. a new user could very easily test-drive Giraph with some of their own sample data without too much trouble, 2. they could use the supplied IO formats as a guide to writing their own without too much code duplication or confusion along the way.

                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>         Attachments: GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429475#comment-13429475 ] 

Jakob Homan commented on GIRAPH-277:
------------------------------------

As an aside, for some reason I can't add Jaeho as a contributor in JIRA so that I can assign the issue to him.  I can add him as a developer though.  Can another commiter/admin try?
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>         Attachments: GIRAPH-277-2.patch, GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460032#comment-13460032 ] 

Avery Ching commented on GIRAPH-277:
------------------------------------

Looks great and passed the tests.  +1, committing.
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>            Assignee: Nitay Joffe
>         Attachments: GIRAPH-227-4.patch, GIRAPH-277-2.patch, GIRAPH-277-3.patch, GIRAPH-277-5.patch, GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Jaeho Shin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jaeho Shin updated GIRAPH-277:
------------------------------

    Attachment: GIRAPH-277.patch
    
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>         Attachments: GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Jaeho Shin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428513#comment-13428513 ] 

Jaeho Shin commented on GIRAPH-277:
-----------------------------------

@Eli, In fact, the preprocessing is already used within the patch several times for {{AdjacencyListTextVertexReader}}, {{JsonBase64VertexReader}}, and {{JsonLongDoubleFloatDoubleVertexReader}} to name a few. (We are discusssing the {{TextVertexReaderFromEachLineProcessed}} class, right?)
To make it clearer, let me explain how I came up with the design.  First I started rewriting existing readers based on the plain {{TextVertexReaderFromEachLine}}, but found there was a pattern in most of them.  They were processing each line to yield an intermediate object, such as String[], JSONObject, or JSONArray, then create vertex id, value and edges out of it.  Without the {{TextVertexReaderFromEachLineProcessed}} base class, it was going to force users to do the String#split() or JSON parsing from the Text object multiple times (in getId/Value/Edges), or make them hacking the base classes for performance.  I hope this convinces you.
As my patch gets committed, I will update the wiki and try to impress new users showing how simple it is to load their custom data to Giraph. :)

@Jakob, I tried with the reviewboard web form, but getting 500 server error every time I submit the patch.  I tried post-review from my git-svn tree, but also failed multiple times.  It left some empty review requests, but I still can't upload my diff there.  It complains about my patch having "No valid separator after the filename was found in the diff header"  What's the right way to generate the patch for RB?  Shouldn't {{git diff}} work?
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>         Attachments: GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitay Joffe reassigned GIRAPH-277:
----------------------------------

    Assignee: Nitay Joffe  (was: Jaeho Shin)
    
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>            Assignee: Nitay Joffe
>         Attachments: GIRAPH-277-2.patch, GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460052#comment-13460052 ] 

Hudson commented on GIRAPH-277:
-------------------------------

Integrated in Giraph-trunk-Commit #201 (See [https://builds.apache.org/job/Giraph-trunk-Commit/201/])
    Part of GIRAPH-277. (Revision 1388268)
GIRAPH-277: Text Vertex Input/Output Format base classes overhaul.
(nitayj via aching) (Revision 1388258)

     Result = SUCCESS
aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1388268
Files : 
* /giraph/trunk/src/main/java/org/apache/giraph/io/AdjacencyListVertexReader.java

aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1388258
Files : 
* /giraph/trunk/CHANGELOG
* /giraph/trunk/checkstyle.xml
* /giraph/trunk/src/main/java/org/apache/giraph/examples/LongDoubleFloatDoubleTextInputFormat.java
* /giraph/trunk/src/main/java/org/apache/giraph/examples/NormalizingLongDoubleFloatDoubleTextInputFormat.java
* /giraph/trunk/src/main/java/org/apache/giraph/examples/SimplePageRankVertex.java
* /giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java
* /giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleTextVertexOutputFormat.java
* /giraph/trunk/src/main/java/org/apache/giraph/examples/VertexWithDoubleValueFloatEdgeTextOutputFormat.java
* /giraph/trunk/src/main/java/org/apache/giraph/graph/VertexReader.java
* /giraph/trunk/src/main/java/org/apache/giraph/graph/VertexWriter.java
* /giraph/trunk/src/main/java/org/apache/giraph/io/AdjacencyListTextVertexInputFormat.java
* /giraph/trunk/src/main/java/org/apache/giraph/io/AdjacencyListTextVertexOutputFormat.java
* /giraph/trunk/src/main/java/org/apache/giraph/io/IdWithValueTextOutputFormat.java
* /giraph/trunk/src/main/java/org/apache/giraph/io/IntIntNullIntTextInputFormat.java
* /giraph/trunk/src/main/java/org/apache/giraph/io/JsonBase64VertexInputFormat.java
* /giraph/trunk/src/main/java/org/apache/giraph/io/JsonBase64VertexOutputFormat.java
* /giraph/trunk/src/main/java/org/apache/giraph/io/JsonLongDoubleFloatDoubleVertexInputFormat.java
* /giraph/trunk/src/main/java/org/apache/giraph/io/JsonLongDoubleFloatDoubleVertexOutputFormat.java
* /giraph/trunk/src/main/java/org/apache/giraph/io/LongDoubleDoubleAdjacencyListVertexInputFormat.java
* /giraph/trunk/src/main/java/org/apache/giraph/io/TextDoubleDoubleAdjacencyListVertexInputFormat.java
* /giraph/trunk/src/main/java/org/apache/giraph/io/TextVertexInputFormat.java
* /giraph/trunk/src/main/java/org/apache/giraph/io/TextVertexOutputFormat.java
* /giraph/trunk/src/test/java/org/apache/giraph/io/TestAdjacencyListTextVertexOutputFormat.java
* /giraph/trunk/src/test/java/org/apache/giraph/io/TestIdWithValueTextOutputFormat.java
* /giraph/trunk/src/test/java/org/apache/giraph/io/TestLongDoubleDoubleAdjacencyListVertexInputFormat.java
* /giraph/trunk/src/test/java/org/apache/giraph/io/TestTextDoubleDoubleAdjacencyListVertexInputFormat.java

                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>            Assignee: Nitay Joffe
>         Attachments: GIRAPH-227-4.patch, GIRAPH-277-2.patch, GIRAPH-277-3.patch, GIRAPH-277-5.patch, GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459465#comment-13459465 ] 

Nitay Joffe commented on GIRAPH-277:
------------------------------------

I did, the two errors are the <I> unless you're seeing something I'm not
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>            Assignee: Nitay Joffe
>         Attachments: GIRAPH-227-4.patch, GIRAPH-277-2.patch, GIRAPH-277-3.patch, GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429336#comment-13429336 ] 

Jakob Homan commented on GIRAPH-277:
------------------------------------

I'm not sure aobut the -full-index.  I downloaded your patch, applied it to a fresh checkout (with -p1), created a new patch (since I don't know how to get rb to use -p1) and then created a review from that: https://reviews.apache.org/r/6401/  Are you using /trunk as the basepath for rb? I was also getting 500 until I got clarification on that.

Re: how-to-contribute.  I was referring to the site.  It's not important where the info is but it shouldn't be duplicated.  If you'd like to clean up with the wiki with your recent experience, I'll remove the info from the website and point people to the wiki.
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>         Attachments: GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Brian Femiano (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427782#comment-13427782 ] 

Brian Femiano commented on GIRAPH-277:
--------------------------------------

I'm in favor of anything that makes subclassing I/O formats easier on the users. I've implemented my own Giraph jobs about 10-12 times already and have only been able to recycle I/O formats once or twice. The combination of id/value/message and the variation I require across jobs usually forces a new format from scratch. This is just my own observation based on my data. The work going on to separate messages from vertices will help quite a bit with regard to this.

In the spirit of this thread, I could definitely refactor HBaseVertexInputFormat to be more friendly towards subclassing like this patch shows. I particularly like the generic propogation down to the nested record reader. Having to retype all the generic parameters gets old. I should also make some subclasses for HBaseVertexInputFormat that let people define column-families for vertex values and other handy behavior. It might help with adoption and code reuse.  
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>         Attachments: GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Nitay Joffe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459385#comment-13459385 ] 

Nitay Joffe commented on GIRAPH-277:
------------------------------------

1) Will do.
2) Yeah I saw that forgot to comment on it - To be honest I'm fairly confused by this. It thinks the <I> is an html sequence (italics). It makes sense that it's complaining about this. In fact I'm confused why it's not complaining about this everywhere else in the code? I don't think the checkstyle change in the diff causes it because all it is changing is the rules for how to turn it off. We can add a // CHECKSTYLE: stop JavadocStyleCheck thing but that seems ugly. What do you think, should we just ignore it?
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>            Assignee: Nitay Joffe
>         Attachments: GIRAPH-277-2.patch, GIRAPH-277-3.patch, GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459468#comment-13459468 ] 

Avery Ching commented on GIRAPH-277:
------------------------------------

Actually, I think i know the issue here.
{code}
   * @param <I> Vertex index value                                                                                                                       
   * @param <V> Vertex value                                                                                                                             
   * @param <E> Edge value                                                                                                                               
   */
  protected abstract class TextVertexReader implements
    VertexReader<I, V, E, M> {
{code}

I, V, E are not valid javadoc parameters here.  They are only valid at the wrapper class.

{code}
 * @param <I> Vertex index value                                                                                                                         
 * @param <V> Vertex value                                                                                                                               
 * @param <E> Edge value                                                                                                                                 
 * @param <M> Message value                                                                                                                              
 */
@SuppressWarnings("rawtypes")
public abstract class TextVertexInputFormat<I extends WritableComparable,
    V extends Writable, E extends Writable, M extends Writable>
    extends VertexInputFormat<I, V, E, M> {
{code}

If you remove the invalid parameters, everything works great.  This can fix the other issue in GIRAPH-93 as well.
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>            Assignee: Nitay Joffe
>         Attachments: GIRAPH-227-4.patch, GIRAPH-277-2.patch, GIRAPH-277-3.patch, GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-277) Text Vertex Input/Output Format base classes overhaul

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429301#comment-13429301 ] 

Jakob Homan commented on GIRAPH-277:
------------------------------------

with git you also have to add --no-prefix as git adds some useless letters to the path.  This is covered in the how-to-contribute section on the webpage.
                
> Text Vertex Input/Output Format base classes overhaul
> -----------------------------------------------------
>
>                 Key: GIRAPH-277
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-277
>             Project: Giraph
>          Issue Type: Improvement
>          Components: examples, lib
>            Reporter: Jaeho Shin
>         Attachments: GIRAPH-277.patch
>
>
> The current way of implementing {{VertexInputFormat}} and {{VertexReader}} had bad smell.  It required users to understand how these two classes are glued together, and forced similar codes to be duplicated in every new input format.  (Similarly for the VertexOutputFormat and VertexWriter.)  Anyone who wants to create a new format should create an underlying record reader or writer at the right moment and delegate some calls to it, which seemed unnecessary detail being exposed.  Besides, type parameters had to appear all over every new format code, which was extremely annoying for both reading existing code and writing a new one.  I was very frustrated writing my first format code especially when I compared it to writing a new vertex code.  I thought writing a new input/output format should be as simple as vertex.
> So, I have refactored {{TextVertexInputFormat}} and {{OutputFormat}} into new forms that have no difference in their interfaces, but remove a lot of burden for subclassing.  Instead of providing static VertexReader base classes, I made it a non-static inner-class of its format class, which helps eliminate the repeated code for gluing these two, already tightly coupled classes.  This has additional advantage of eliminating all the Generics type variables on the VertexReader side, which makes overall code much more concise.  I added several useful TextVertexReader base classes that can save efforts for implementing line-oriented formats.
> Please comment if you see my proposed change have any impact on other aspects.  I'm unsure of how these additional layers of abstraction could affect performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira