Posted to dev@crunch.apache.org by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org> on 2012/09/20 07:37:07 UTC

[jira] [Created] (CRUNCH-70) Simplify Pipeline API

Vinod Kumar Vavilapalli created CRUNCH-70:
---------------------------------------------

             Summary: Simplify Pipeline API
                 Key: CRUNCH-70
                 URL: https://issues.apache.org/jira/browse/CRUNCH-70
             Project: Crunch
          Issue Type: Bug
            Reporter: Vinod Kumar Vavilapalli


Today the Pipeline interface has the following APIs, which really belong in a utility class:
 - readTextFile
 - writeTextFile
 - enableDebug

The implementation of these APIs is the same in both Pipeline types present today and will most likely stay the same if we ever add another implementation.

I propose we move these to a util/lib to make the core interface cleaner.
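
To make the proposal concrete, here is a minimal sketch of the pattern, assuming nothing about the real Crunch classes. MiniPipeline, MiniPipelines, and InMemoryPipeline are hypothetical stand-ins invented for illustration. The text-file helpers are written once as static methods against the core interface, so each implementation only has to provide the generic read and write. The sketch covers only the two text-file helpers, not enableDebug.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a slimmed-down core interface.
// This is NOT the real org.apache.crunch.Pipeline.
interface MiniPipeline {
    List<String> read(String source);
    void write(List<String> lines, String target);
}

// Hypothetical utility class: the convenience methods live here,
// written once against the interface instead of once per implementation.
final class MiniPipelines {
    private MiniPipelines() {}

    static List<String> readTextFile(MiniPipeline pipeline, String path) {
        return pipeline.read("text:" + path);
    }

    static void writeTextFile(MiniPipeline pipeline, List<String> lines, String path) {
        pipeline.write(lines, "text:" + path);
    }
}

// Toy implementation that keeps everything in a map, only for the demo.
final class InMemoryPipeline implements MiniPipeline {
    private final Map<String, List<String>> store = new HashMap<>();

    @Override
    public List<String> read(String source) {
        // Return a copy so callers cannot mutate the stored lines.
        return new ArrayList<>(store.getOrDefault(source, new ArrayList<>()));
    }

    @Override
    public void write(List<String> lines, String target) {
        store.put(target, new ArrayList<>(lines));
    }
}
```

With this shape, a second implementation of MiniPipeline would get both helpers for free, which is the point of the proposal.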

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CRUNCH-70) Simplify Pipeline API

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CRUNCH-70?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated CRUNCH-70:
------------------------------------------

    Attachment: CRUNCH-70-20120919.txt

Here's a patch to do this.

I added a util called Pipelines following the convention.

One question though (left as a TODO in the patch): when writeTextFile() runs as part of an MR pipeline, we do the following:
{code}
+      collection =
+          collection.parallelDo("asText", IdentityFn.<T> getInstance(),
+            WritableTypeFamily.getInstance().as(collection.getPType()));
{code}

Why do we do this? And does MRPipeline really need to force the PTypeFamily to Writables?
                


[jira] [Commented] (CRUNCH-70) Simplify Pipeline API

Posted by "Josh Wills (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CRUNCH-70?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460808#comment-13460808 ] 

Josh Wills commented on CRUNCH-70:
----------------------------------

When you write Avro records to a TextOutputFormat, they come out with values like "AvroWrapper@hexcode", so we had to force a conversion to a Writable type to be compatible with TextOutputFormat. I *think* that got fixed in CRUNCH-52, so that code can probably go.
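
As a rough illustration of that symptom, here is a minimal sketch in plain Java. RecordWrapper and TextLineDemo are hypothetical stand-ins, not the Avro or Hadoop classes. The assumption is that a text writer emits whatever toString() returns, so a wrapper that does not override toString() falls back to the default Object.toString() form of class name, "@", and hash code in hex.

```java
// Hypothetical stand-in for a wrapper type that does not override toString().
final class RecordWrapper<T> {
    private final T datum;

    RecordWrapper(T datum) {
        this.datum = datum;
    }

    T datum() {
        return datum;
    }
}

final class TextLineDemo {
    private TextLineDemo() {}

    // Mimics a text writer that renders each value via toString().
    static String toTextLine(Object value) {
        return String.valueOf(value);
    }

    // Unwraps to the underlying value before it reaches the text writer.
    static <T> String unwrapToTextLine(RecordWrapper<T> wrapper) {
        return toTextLine(wrapper.datum());
    }
}
```

Passing a RecordWrapper straight to toTextLine yields a line containing "RecordWrapper@" followed by a hex hash, while unwrapToTextLine yields the datum's own text.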

My default preference is for not removing API functions unless it's part of an overall redesign (e.g., in the context of untangling the various components of Crunch core) that buys us something. I think I could squint and make a case for removing readTextFile and writeTextFile now, since read(From.textFile(...)) and write(To.textFile(...)) exist, but I would argue for keeping enableDebug. It's way too useful.
                


[jira] [Assigned] (CRUNCH-70) Simplify Pipeline API

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CRUNCH-70?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli reassigned CRUNCH-70:
---------------------------------------------

    Assignee: Vinod Kumar Vavilapalli
    
