Posted to dev@datafu.apache.org by "Russell Jurney (JIRA)" <ji...@apache.org> on 2015/01/01 17:34:13 UTC

[jira] [Commented] (DATAFU-85) Add SPRINTF to provide this functionality to Pig < 0.14.0

    [ https://issues.apache.org/jira/browse/DATAFU-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262591#comment-14262591 ] 

Russell Jurney commented on DATAFU-85:
--------------------------------------

I've emailed the list. No replies yet. Pasted below:

-------

I think this raises an issue that merits discussion.

Background:

Pig and DataFu follow two very different release schedules:

1) Pig is released about twice a year (14 major releases in 6 years). Getting UDF code into Pig (builtin) or Piggybank is a major undertaking. What is more, the popular Hadoop distributions (Cloudera, Hortonworks, MapR, Pivotal) lag behind the current Apache version of Pig by a year or more. In other words, a simple UDF added to Pig can take a year and a half to actually reach users.

2) DataFu releases every month or two, as new features are added. Using DataFu is as simple as grabbing a jar file, so it isn't tied to a distribution (although several include it). One needn't upgrade Pig to use new features of DataFu.

This leads to an interesting situation. Take PIG-3939, which added SPRINTF as a Pig builtin in Pig 0.14, released in November 2014. In practice, Pig users wanting SPRINTF must wait for the distributions to include Pig 0.14, which could take a year or more. When you factor in the six months between the patch's submission (June 2014) and its release (November 2014), it could take two or more years for most users to get the SPRINTF feature.

Issue:

For me, this raises the question: why don't we add SPRINTF to DataFu, so that older versions of Pig (before 0.14) can have this feature? I happen to be in a situation where we're using CDH 5.2/Pig 0.12, and we need SPRINTF. I think this is a common situation.

So the question I'm raising is: is it appropriate to implement Pig's UDF/builtin features in DataFu, so that older versions of Pig can use them and users get them with far less delay?

In the case of SPRINTF, I believe we should add it to DataFu. I've created DATAFU-85 to track this issue. The Hadoop distributions won't ship 0.14 for some time. The majority of Hadoop users will be using Pig 0.12 for several years. Adding this kind of feature will benefit users in the meantime.

> Add SPRINTF to provide this functionality to Pig < 0.14.0
> ---------------------------------------------------------
>
>                 Key: DATAFU-85
>                 URL: https://issues.apache.org/jira/browse/DATAFU-85
>             Project: DataFu
>          Issue Type: Bug
>            Reporter: Russell Jurney
>            Assignee: Russell Jurney
>
> I need SPRINTF in DataFu for a book I'm working on. I'd like to add this to DataFu so that CDH, HDP, MapR, etc. users can use SPRINTF as soon as DataFu cuts a new release.
> See PIG-3939
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)