Posted to dev@pig.apache.org by Kai Londenberg <ka...@googlemail.com> on 2013/02/18 11:27:07 UTC

pig 0.11 candidate 2 feedback: Several problems

Hi,

I just subscribed to the dev mailing list in order to give you some
feedback on pig 0.11 candidate 2.

The following three issues are currently present in 0.11 candidate 2:

https://issues.apache.org/jira/browse/PIG-3144 - 'Erroneous map entry
alias resolution leading to "Duplicate schema alias" errors'
https://issues.apache.org/jira/browse/PIG-3194 - Changes to
ObjectSerializer.java break compatibility with Hadoop 0.20.2
https://issues.apache.org/jira/browse/PIG-3195 - Race Condition in
PhysicalOperator leads to ExecException "Error while trying to get
next result in POStream"

The last two of these are easily solvable (see the tickets for
details on that). The first one is a bit trickier, I think, but at
least there is a workaround for it (pass Map fields through a UDF).

In my personal opinion, each of these problems is pretty severe, but
opinions about the importance of the MAP Datatype and STREAM Operator,
as well as Hadoop 0.20.2 compatibility might differ.

so far ..

Kai Londenberg

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Julien Le Dem <ju...@twitter.com>.
I agree we should support 0.20.2 in this case. There is significant
community need, and it seems to be a small and easy change.
That said, if we're going to stop the release candidate for this, we need a
patch right away so that we can have a new release candidate immediately.
Does someone have a patch ready?
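
For reference, the change discussed in the quoted messages below (Jonathan
Coveney's suggestion to in-line some Base64 code so that ObjectSerializer no
longer depends on a particular commons-codec version) could look roughly like
the following. This is only a sketch under assumptions, not the PIG-3194
patch: the class name InlineBase64 and its encode/decode methods are made up
for illustration, and how ObjectSerializer would actually call them is not
shown in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical helper (name and API are illustrative assumptions, not Pig code).
// It implements standard Base64 (RFC 4648 alphabet, '=' padding) using only the
// JDK, so it does not depend on whichever commons-codec version is on the
// Hadoop classpath.
public final class InlineBase64 {
    private static final char[] ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".toCharArray();

    // Maps an ASCII character to its 6-bit value, or -1 if it is not in the alphabet.
    private static final int[] REVERSE = new int[128];
    static {
        Arrays.fill(REVERSE, -1);
        for (int i = 0; i < ALPHABET.length; i++) {
            REVERSE[ALPHABET[i]] = i;
        }
    }

    private InlineBase64() {}

    // Encodes bytes three at a time into four output characters, padding with '='.
    public static String encode(byte[] data) {
        StringBuilder out = new StringBuilder(((data.length + 2) / 3) * 4);
        for (int i = 0; i < data.length; i += 3) {
            int b0 = data[i] & 0xFF;
            int b1 = (i + 1 < data.length) ? data[i + 1] & 0xFF : 0;
            int b2 = (i + 2 < data.length) ? data[i + 2] & 0xFF : 0;
            int chunk = (b0 << 16) | (b1 << 8) | b2;
            out.append(ALPHABET[(chunk >> 18) & 0x3F]);
            out.append(ALPHABET[(chunk >> 12) & 0x3F]);
            out.append(i + 1 < data.length ? ALPHABET[(chunk >> 6) & 0x3F] : '=');
            out.append(i + 2 < data.length ? ALPHABET[chunk & 0x3F] : '=');
        }
        return out.toString();
    }

    // Decodes by accumulating 6 bits per character and emitting a byte whenever
    // at least 8 bits are available. Stops at the first '=' padding character.
    public static byte[] decode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(text.length() * 3 / 4);
        int buffer = 0;
        int bits = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '=') {
                break;
            }
            int value = (c < 128) ? REVERSE[c] : -1;
            if (value < 0) {
                throw new IllegalArgumentException("Not a Base64 character: " + c);
            }
            buffer = (buffer << 6) | value;
            bits += 6;
            if (bits >= 8) {
                bits -= 8;
                out.write((buffer >> bits) & 0xFF);
            }
        }
        return out.toByteArray();
    }
}
```

With this sketch, InlineBase64.encode applied to the ASCII bytes of "foo"
returns "Zm9v", and InlineBase64.decode("Zm9v") returns those bytes again.
Keeping the helper free of third-party imports is the point of the exercise:
the same class file would behave identically whatever commons-codec jar the
cluster ships.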


On Wednesday, February 20, 2013, Kai Londenberg wrote:

> Hi,
>
> I strongly agree with Jonathan here. If there are good reasons why you
> can't support an older version of Hadoop any more, that's one thing.
> But having to change 2 lines of code doesn't really qualify as such in
> my point of view ;)
>
> At least for me, pig support for 0.20.2 is essential - without it, I
> can't use it. If it doesn't support it, I'll have to branch pig and
> hack it myself, or stop using it.
>
> I guess, there are a lot of people still running 0.20.2 Clusters. If
> you really have lots of data stored on HDFS and a continuously busy
> cluster, an upgrade is nothing you do "just because".
>
>
> 2013/2/20 Jonathan Coveney <jcoveney@gmail.com>:
> > I agree that we shouldn't have to support old versions forever. That
> said,
> > I also don't think we should be too blase about supporting older versions
> > where it is not odious to do so. We have a lot of competition in the
> > language space and the broader the versions we can support, the better
> > (assuming it isn't too odious to do so). In this case, I don't think it
> > should be too hard to change ObjectSerializer so that the commons-codec
> > code used is compatible with both versions...we could just in-line some
> of
> > the Base64 code, and comment accordingly.
> >
> > That said, we also should be clear about what versions we support, but
> 6-12
> > months seems short. The upgrade cycles on Hadoop are really, really long.
> >
> >
> > 2013/2/20 Prashant Kommireddi <pr...@gmail.com>
> >
> >> Agreed, that makes sense. Probably supporting older hadoop version for
> a 1
> >> or 2 pig releases before moving to a newer/stable version?
> >>
> >> Having said that, should we use 0.11 period to communicate the same to
> the
> >> community and start moving on 0.12 onwards? I know we are way past 6-12
> >> months (1-2 release) time frame with 0.20.2, but we also need to make
> sure
> >> users are aware and plan accordingly.
> >>
> >> I'd also be interested to hear how other projects (Hive, Oozie) are
> >> handling this.
> >>
> >> -Prashant
> >>
> >> On Tue, Feb 19, 2013 at 3:22 PM, Olga Natkovich <onatkovich@yahoo.com
> >> >wrote:
> >>
> >> > It seems that for each Pig release we need to agree and clearly state
> >> > which Hadoop versions it will support. I guess the main question is
> how
> >> we
> >> > decide on this. Perhaps we should say that Pig no longer supports
> older
> >> > Hadoop versions once the newer one is out for at least 6-12 months to
> make
> >> > sure it is stable. I don't think we can support old versions
> >> indefinitely.
> >> > It is in everybody's interest to keep moving forward.
> >> >
> >> > Olga
> >> >
> >> >
> >> > ________________________________
> >> >  From: Prashant Kommireddi <pr...@gmail.com>
> >> > To: dev@pig.apache.org
> >> > Sent: Tuesday, February 19, 2013 10:57 AM
> >> > Subject: Re: pig 0.11 candidate 2 feedback: Several problems
> >> >
> >> > What do you guys feel about the JIRA to do with 0.20.2 compatibility
> >> > (PIG-3194)? I am interested in discussing the strategy around backward
> >> > compatibility as this is something that would haunt us each time we
> move
> >> to
> >> > the next hadoop version. For eg, we might be in a similar situation
> while
> >> > moving to Hadoop 2.0, when some of the stuff might break for 1.0.
> >> >
> >> > I feel it would be good to get this JIRA fix in for 0.11, as 0.20.2
> users
> >> > might be caught unaware. Of course, I must admit there is selfish
> >> interest
> >> > here and it's probably easier for us to have a workaround on Pig
> rather
> >> > than upgrade hadoop in all our production DCs.
> >> >
> >> > -Prashant
> >> >
> >> >
> >> > On Tue, Feb 19, 2013 at 9:54 AM, Russell Jurney <
> >> russell.jurney@gmail.com
> >> > >wrote:
> >> >
> >> > > I think someone should step up and fix the easy ones, if possible.
> >> > >
> >> > >
> >> > > On Tue, Feb 19, 2013 at 9:51 AM, Bill Graham <bi...@gmail.com>
> >> > wrote:
> >> > >
> >> > > > Thanks Kai for reporting these.
> >> > > >
> >> > > > What do people think about the severity of these issues w.r.t. Pig
> >> 11?
> >> > I
> >> > > > see a few possible options:
> >> > > >
> >> > > > 1. We include some or all of these patches in a new Pig 11 rc.
> We'd
> >> > want
> >> > > to
> >> > > > make sure that th

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Julien Le Dem <ju...@twitter.com>.
sounds good to me too.


On Fri, Mar 1, 2013 at 11:33 AM, Bill Graham <bi...@gmail.com> wrote:

> +1 to releasing Pig 0.11.1 when this is addressed. I should be able to help
> with the release again.
>
>
>
> On Fri, Mar 1, 2013 at 11:25 AM, Prashant Kommireddi <prash1784@gmail.com
> >wrote:
>
> > Hey Guys,
> >
> > I wanted to start a conversation on this again. If Kai is not looking at
> > PIG-3194 I can start working on it to get 0.11 compatible with 20.2. If
> > everyone agrees, we should roll out 0.11.1 sooner than usual and I
> > volunteer to help with it in any way possible.
> >
> > Any objections to getting 0.11.1 out soon after 3194 is fixed?
> >
> > -Prashant
> >
> > On Wed, Feb 20, 2013 at 3:34 PM, Russell Jurney <
> russell.jurney@gmail.com
> > >wrote:
> >
> > > I stand corrected. Cool, 0.11 is good!
> > >
> > >
> > > On Wed, Feb 20, 2013 at 1:15 PM, Jarek Jarcec Cecho <jarcec@apache.org
> > > >wrote:
> > >
> > > > > Just an unrelated note: CDH3 is closer to Hadoop 1.x than to
> > > 0.20.
> > > >
> > > > Jarcec
> > > >
> > > > On Wed, Feb 20, 2013 at 12:04:51PM -0800, Dmitriy Ryaboy wrote:
> > > > > I agree -- this is a good release. The bugs Kai pointed out should
> be
> > > > > fixed, but as they are not critical regressions, we can fix them in
> > > > 0.11.1
> > > > > (if someone wants to roll 0.11.1 the minute these fixes are
> > committed,
> > > I
> > > > > won't mind and will dutifully vote for the release).
> > > > >
> > > > > I think the Hadoop 20.2 incompatibility is unfortunate but iirc
> this
> > is
> > > > > fixable by setting HADOOP_USER_CLASSPATH_FIRST=true (was that in
> > 20.2?)
> > > > >
> > > > > FWIW Twitter's running CDH3 and this release works in our
> > environment.
> > > > >
> > > > > At this point things that block a release are critical regressions
> in
> > > > > performance or correctness.
> > > > >
> > > > > D
> > > > >
> > > > >
> > > > > On Wed, Feb 20, 2013 at 11:52 AM, Alan Gates <
> gates@hortonworks.com>
> > > > wrote:
> > > > >
> > > > > > No.  Bugs like these are supposed to be found and fixed after we
> > > branch
> > > > > > from trunk (which happened several months ago in the case of
> 0.11).
> > > >  The
> > > > > > > point of RCs is to check that it's a good build, licenses are
> > right,
> > > > etc.
> > > > > >  Any bugs found this late in the game have to be seen as failures
> > of
> > > > > > earlier testing.
> > > > > >
> > > > > > Alan.
> > > > > >
> > > > > > On Feb 20, 2013, at 11:33 AM, Russell Jurney wrote:
> > > > > >
> > > > > > > > Isn't the point of an RC to find and fix bugs like these?
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Feb 20, 2013 at 11:31 AM, Bill Graham <
> > > billgraham@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > >> Regarding Pig 11 rc2, I propose we continue with the current
> > vote
> > > > as is
> > > > > > >> (which closes today EOD). Patches for 0.20.2 issues can be
> > rolled
> > > > into a
> > > > > > >> Pig 0.11.1 release whenever they're available and tested.
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> On Wed, Feb 20, 2013 at 9:24 AM, Olga Natkovich <
> > > > onatkovich@yahoo.com
> > > > > > >>> wrote:
> > > > > > >>
> > > > > > >>> I agree that supporting as much as we can is a good goal. The
> > > > issue is
> > > > > > >> who
> > > > > > >>> is going to be testing against all these versions? We found
> the
> > > > issues
> > > > > > >>> under discussion because of a customer report, not because we
> > > > > > >> consistently
> > > > > > >>> test against all versions. Perhaps when we decide which
> > versions
> > > to
> > > > > > >> support
> > > > > > >>> for next release we need also to agree who is going to be
> > testing
> > > > and
> > > > > > >>> maintaining compatibility with a particular version.
> > > > > > >>>
> > > > > > >>> For instance since Hadoop 23 compatibility is important for
> us
> > at
> > > > Yahoo
> > > > > > >> we
> > > > > > >>> have been maintaining compatibility with this version for
> 0.9,
> > > > 0.10 and
> > > > > > >>> will do the same for 0.11 and going forward. I think we would
> > > need
> > > > > > others
> > > > > > >>> to step in and claim the versions of their interest.
> > > > > > >>>
> > > > > > >>> Olga
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> ________________________________
> > > > > > >>> From: Kai Londenberg <ka...@googlemail.com>
> > > > > > >>> To: dev@pig.apache.org
> > > > > > >>> Sent: Wednesday, February 20, 2013 1:51 AM
> > > > > > >>> Subject: Re: pig 0.11 candidate 2 feedback: Several problems
> > > > > > >>>
> > > > > > >>> Hi,
> > > > > > >>>
> > > > > > > >>> I strongly agree with Jonathan here. If there are good reasons
> > why
> > > > you
> > > > > > >>> can't support an older version of Hadoop any more, that's one
> > > > thing.
> > > > > > >>> But having to change 2 lines of code doesn't really qualify
> as
> > > > such in
> > > > > > >>> my point of view ;)
> > > > > > >>>
> > > > > > >>> At least for me, pig support for 0.20.2 is essential -
> without
> > > it,
> > > > I
> > > > > > >>> can't use it. If it doesn't support it, I'll have to branch
> pig
> > > and
> > > > > > >>> hack it myself, or stop using it.
> > > > > > >>>
> > > > > > >>> I guess, there are a lot of people still running 0.20.2
> > Clusters.
> > > > If
> > > > > > >>> you really have lots of data stored on HDFS and a
> continuously
> > > busy
> > > > > > >>> cluster, an upgrade is nothing you do "just because".
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> 2013/2/20 Jonathan Coveney <jc...@gmail.com>:
> > > > > > >>>> I agree that we shouldn't have to support old versions
> > forever.
> > > > That
> > > > > > >>> said,
> > > > > > >>>> I also don't think we should be too blase about supporting
> > older
> > > > > > >> versions
> > > > > > >>>> where it is not odious to do so. We have a lot of
> competition
> > in
> > > > the
> > > > > > >>>> language space and the broader the versions we can support,
> > the
> > > > better
> > > > > > >>>> (assuming it isn't too odious to do so). In this case, I
> don't
> > > > think
> > > > > > it
> > > > > > >>>> should be too hard to change ObjectSerializer so that the
> > > > > > commons-codec
> > > > > > >>>> code used is compatible with both versions...we could just
> > > in-line
> > > > > > some
> > > > > > >>> of
> > > > > > >>>> the Base64 code, and comment accordingly.
> > > > > > >>>>
> > > > > > >>>> That said, we also should be clear about what versions we
> > > > support, but
> > > > > > >>> 6-12
> > > > > > >>>> months seems short. The upgrade cycles on Hadoop are really,
> > > > really
> > > > > > >> long.
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>> 2013/2/20 Prashant Kommireddi <pr...@gmail.com>
> > > > > > >>>>
> > > > > > >>>>> Agreed, that makes sense. Probably supporting older hadoop
> > > > version
> > > > > > for
> > > > > > >>> a 1
> > > > > > >>>>> or 2 pig releases before moving to a newer/stable version?
> > > > > > >>>>>
> > > > > > >>>>> Having said that, should we use 0.11 period to communicate
> > the
> > > > same
> > > > > > to
> > > > > > >>> the
> > > > > > >>>>> community and start moving on 0.12 onwards? I know we are
> way
> > > > past
> > > > > > >> 6-12
> > > > > > >>>>> months (1-2 release) time frame with 0.20.2, but we also
> need
> > > to
> > > > make
> > > > > > >>> sure
> > > > > > >>>>> users are aware and plan accordingly.
> > > > > > >>>>>
> > > > > > >>>>> I'd also be interested to hear how other projects (Hive,
> > Oozie)
> > > > are
> > > > > > >>>>> handling this.
> > > > > > >>>>>
> > > > > > >>>>> -Prashant
> > > > > > >>>>>
> > > > > > >>>>> On Tue, Feb 19, 2013 at 3:22 PM, Olga Natkovich <
> > > > > > onatkovich@yahoo.com
> > > > > > >>>>>> wrote:
> > > > > > >>>>>
> > > > > > >>>>>> It seems that for each Pig release we need to agree and
> > > clearly
> > > > > > >> state
> > > > > > >>>>>> which Hadoop versions it will support. I guess the main
> > > > question is
> > > > > > >>> how
> > > > > > >>>>> we
> > > > > > >>>>>> decide on this. Perhaps we should say that Pig no longer
> > > > supports
> > > > > > >>> older
> > > > > > >>>>>> Hadoop versions once the newer one is out for at least
> 6-12
> > > > month to
> > > > > > >>> make
> > > > > > >>>>>> sure it is stable. I don't think we can support old
> versions
> > > > > > >>>>> indefinitely.
> > > > > > >>>>>> It is in everybody's interest to keep moving forward.
> > > > > > >>>>>>
> > > > > > >>>>>> Olga
> > > > > > >>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>> ________________________________
> > > > > > >>>>>> From: Prashant Kommireddi <pr...@gmail.com>
> > > > > > >>>>>> To: dev@pig.apache.org
> > > > > > >>>>>> Sent: Tuesday, February 19, 2013 10:57 AM
> > > > > > >>>>>> Subject: Re: pig 0.11 candidate 2 feedback: Several
> problems
> > > > > > >>>>>>
> > > > > > >>>>>> What do you guys feel about the JIRA to do with 0.20.2
> > > > compatibility
> > > > > > >>>>>> (PIG-3194)? I am interested in discussing the strategy
> > around
> > > > > > >> backward
> > > > > > >>>>>> compatibility as this is something that would haunt us
> each
> > > > time we
> > > > > > >>> move
> > > > > > >>>>> to
> > > > > > >>>>>> the next hadoop version. For eg, we might be in a similar
> > > > situation
> > > > > > >>> while
> > > > > > >>>>>> moving to Hadoop 2.0, when some of the stuff might break
> for
> > > > 1.0.
> > > > > > >>>>>>
> > > > > > >>>>>> I feel it would be good to get this JIRA fix in for 0.11,
> as
> > > > 0.20.2
> > > > > > >>> users
> > > > > > >>>>>> might be caught unaware. Of course, I must admit there is
> > > > selfish
> > > > > > >>>>> interest
> > > > > > >>>>>> here and it's probably easier for us to have a workaround
> on
> > > Pig
> > > > > > >>> rather
> > > > > > >>>>>> than upgrade hadoop in all our production DCs.
> > > > > > >>>>>>
> > > > > > >>>>>> -Prashant
> > > > > > >>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>> On Tue, Feb 19, 2013 at 9:54 AM, Russell Jurney <
> > > > > > >>>>> russell.jurney@gmail.com
> > > > > > >>>>>>> wrote:
> > > > > > >>>>>>
> > > > > > >>>>>>> I think someone should step up and fix the easy ones, if
> > > > possible.
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>> On Tue, Feb 19, 2013 at 9:51 AM, Bill Graham <
> > > > > > >> billgraham@gmail.com>
> > > > > > >>>>>> wrote:
> > > > > > >>>>>>>
> > > > > > >>>>>>>> Thanks Kai for reporting these.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> What do people think about the severity of these issues
> > > w.r.t.
> > > > > > >> Pig
> > > > > > >>>>> 11?
> > > > > > >>>>>> I
> > > > > > >>>>>>>> see a few possible options:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 1. We include some or all of these patches in a new Pig
> 11
> > > rc.
> > > > > > >>> We'd
> > > > > > >>>>>> want
> > > > > > >>>>>>> to
> > > > > > >>>>>>>> make sure that they don't destabilize the current
> branch.
> > > This
> > > > > > >>>>> approach
> > > > > > >>>>>>>> makes sense if we think Pig 11 wouldn't be a good
> release
> > > > > > >> without
> > > > > > >>> one
> > > > > > >>>>>> or
> > > > > > >>>>>>>> more of these included.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 2. We continue with the Pig 11 release without these,
> but
> > > then
> > > > > > >>>>> include
> > > > > > >>>>>>> one
> > > > > > >>>>>>>> or more in a 0.11.1 release.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> 3. We continue with the Pig 11 release without these,
> but
> > > then
> > > > > > >>>>> include
> > > > > > >>>>>>> them
> > > > > > >>>>>>>> in a 0.12 release.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> Jon has a patch for the MAP issue
> > > > > > >>>>>>>> (PIG-3144<
> https://issues.apache.org/jira/browse/PIG-3144
> > >)
> > > > > > >>>>>>>> ready, which seems like the most pressing of the three
> to
> > > me.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> thanks,
> > > > > > >>>>>>>> Bill
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> On Mon, Feb 18, 2013 at 2:27 AM, Kai Londenberg <
> > > > > > >>>>>>>> kai.londenberg@googlemail.com> wrote:
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>> Hi,
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> I just subscribed to the dev mailing list in order to
> > give
> > > > you
> > > > > > >>> some
> > > > > > >>>>>>>>> feedback on pig 0.11 candidate 2.
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> The following three issues are currently present in
> 0.11
> > > > > > >>> candidate
> > > > > > >>>>> 2:
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> https://issues.apache.org/jira/browse/PIG-3144 -
> > > 'Erroneous
> > > > > > >> map
> > > > > > >>>>>> entry
> > > > > > >>>>>>>>> alias resolution leading to "Duplicate schema alias"
> > > errors'
> > > > > > >>>>>>>>> https://issues.apache.org/jira/browse/PIG-3194 -
> Changes
> > > to
> > > > > > >>>>>>>>> ObjectSerializer.java break compatibility with Hadoop
> > > 0.20.2
> > > > > > >>>>>>>>> https://issues.apache.org/jira/browse/PIG-3195 - Race
> > > > > > >>> Condition in
> > > > > > >>>>>>>>> PhysicalOperator leads to ExecException "Error while
> > trying
> > > > to
> > > > > > >>> get
> > > > > > >>>>>>>>> next result in POStream"
> > > > > > >>>>>>>>>
> > > > > > > >>>>>>>>> The last two of these are easily solvable (see the
> > tickets
> > > > > > >> for
> > > > > > >>>>>>>>> details on that). The first one is a bit trickier I
> > think,
> > > > but
> > > > > > >>> at
> > > > > > >>>>>>>>> least there is a workaround for it (pass Map fields
> > through
> > > > an
> > > > > > >>> UDF)
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> In my personal opinion, each of these problems is
> pretty
> > > > > > >> severe,
> > > > > > >>>>> but
> > > > > > >>>>>>>>> opinions about the importance of the MAP Datatype and
> > > STREAM
> > > > > > >>>>>> Operator,
> > > > > > >>>>>>>>> as well as Hadoop 0.20.2 compatibility might differ.
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> so far ..
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> Kai Londenberg
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> --
> > > > > > >>>>>>>> *Note that I'm no longer using my Yahoo! email address.
> > > Please
> > > > > > >>> email
> > > > > > >>>>> me
> > > > > > >>>>>>> at
> > > > > > >>>>>>>> billgraham@gmail.com going forward.*
> > > > > > >>>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>>
> > > > > > >>>>>>> --
> > > > > > >>>>>>> Russell Jurney twitter.com/rjurney
> > russell.jurney@gmail.com
> > > > > > >>>>>>> datasyndrome.com
> > > > > > >>>>>>>
> > > > > > >>>>>>
> > > > > > >>>>>
> > > > > > >>>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> *Note that I'm no longer using my Yahoo! email address. Please
> > > > email me
> > > > > > at
> > > > > > >> billgraham@gmail.com going forward.*
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> > > > > > datasyndrome.com
> > > > > >
> > > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> > > datasyndrome.com
> > >
> >
>

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Prashant Kommireddi <pr...@gmail.com>.
Looks like all outstanding 0.11.1 critical bugs are fixed. Time for an
RC? Please let me know if I can help.

On Fri, Mar 8, 2013 at 3:51 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> Looks like Lohit found a critical bug we should fix for 11.1:
> https://issues.apache.org/jira/browse/PIG-3241 (only observed in hadoop
> 2.0)
>
> D
>
>
> On Wed, Mar 6, 2013 at 12:57 PM, Prashant Kommireddi <prash1784@gmail.com
> >wrote:
>
> > Dmitriy, are the gc fixes all in for 0.11.1? PIG-3148 and PIG-3212 are
> the
> > 2 JIRAs I know were fixed, any others?
> >
> > I have a patch up for 3194, I think we should be good for a release once
> > that makes it in.
> >
> > -Prashant
> >
> > On Sat, Mar 2, 2013 at 11:16 AM, Prashant Kommireddi <
> prash1784@gmail.com
> > >wrote:
> >
> > > Great.
> > >
> > > I have commented regarding a possible approach for PIG-3194
> > > http://goo.gl/UQ3zs. Please take a look when you folks have a chance.
> > >
> > >
> > > On Fri, Mar 1, 2013 at 7:00 PM, Dmitriy Ryaboy <dv...@gmail.com>
> > wrote:
> > >
> > >> I'd like to get the gc fix in as well, but looks like Rohini is about
> to
> > >> commit it so we are good there.
> > >>
> > >> On Mar 1, 2013, at 11:33 AM, Bill Graham <bi...@gmail.com>
> wrote:
> > >>
> > >> > +1 to releasing Pig 0.11.1 when this is addressed. I should be able
> to
> > >> help
> > >> > with the release again.
> > >> >
> > >> >
> > >> >
> > >> > On Fri, Mar 1, 2013 at 11:25 AM, Prashant Kommireddi <
> > >> prash1784@gmail.com>wrote:
> > >> >
> > >> >> Hey Guys,
> > >> >>
> > >> >> I wanted to start a conversation on this again. If Kai is not
> looking
> > >> at
> > >> >> PIG-3194 I can start working on it to get 0.11 compatible with
> 20.2.
> > If
> > >> >> everyone agrees, we should roll out 0.11.1 sooner than usual and I
> > >> >> volunteer to help with it in any way possible.
> > >> >>
> > >> >> Any objections to getting 0.11.1 out soon after 3194 is fixed?
> > >> >>
> > >> >> -Prashant
> > >> >>
> > >> >> On Wed, Feb 20, 2013 at 3:34 PM, Russell Jurney <
> > >> russell.jurney@gmail.com
> > >> >>> wrote:
> > >> >>
> > >> >>> I stand corrected. Cool, 0.11 is good!
> > >> >>>
> > >> >>>
> > >> >>> On Wed, Feb 20, 2013 at 1:15 PM, Jarek Jarcec Cecho <
> > >> jarcec@apache.org
> > >> >>>> wrote:
> > >> >>>
> > >> >>>> Just an unrelated note: CDH3 is closer to Hadoop 1.x than
> > to
> > >> >>> 0.20.
> > >> >>>>
> > >> >>>> Jarcec
> > >> >>>>
> > >> >>>> On Wed, Feb 20, 2013 at 12:04:51PM -0800, Dmitriy Ryaboy wrote:
> > >> >>>>> I agree -- this is a good release. The bugs Kai pointed out
> should
> > >> be
> > >> >>>>> fixed, but as they are not critical regressions, we can fix them
> > in
> > >> >>>> 0.11.1
> > >> >>>>> (if someone wants to roll 0.11.1 the minute these fixes are
> > >> >> committed,
> > >> >>> I
> > >> >>>>> won't mind and will dutifully vote for the release).
> > >> >>>>>
> > >> >>>>> I think the Hadoop 20.2 incompatibility is unfortunate but iirc
> > this
> > >> >> is
> > >> >>>>> fixable by setting HADOOP_USER_CLASSPATH_FIRST=true (was that in
> > >> >> 20.2?)
> > >> >>>>>
> > >> >>>>> FWIW Twitter's running CDH3 and this release works in our
> > >> >> environment.
> > >> >>>>>
> > >> >>>>> At this point things that block a release are critical
> regressions
> > >> in
> > >> >>>>> performance or correctness.
> > >> >>>>>
> > >> >>>>> D
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> On Wed, Feb 20, 2013 at 11:52 AM, Alan Gates <
> > gates@hortonworks.com
> > >> >
> > >> >>>> wrote:
> > >> >>>>>
> > >> >>>>>> No.  Bugs like these are supposed to be found and fixed after
> we
> > >> >>> branch
> > >> >>>>>> from trunk (which happened several months ago in the case of
> > 0.11).
> > >> >>>> The
> > >> >>>>>> point of RCs is to check that it's a good build, licenses are
> > >> >> right,
> > >> >>>> etc.
> > >> >>>>>> Any bugs found this late in the game have to be seen as
> failures
> > >> >> of
> > >> >>>>>> earlier testing.
> > >> >>>>>>
> > >> >>>>>> Alan.
> > >> >>>>>>
> > >> >>>>>> On Feb 20, 2013, at 11:33 AM, Russell Jurney wrote:
> > >> >>>>>>
> > >> >>>>>>> Isn't the point of an RC to find and fix bugs like these?
> > >> >>>>>>>
> > >> >>>>>>>
> > >> >>>>>>> On Wed, Feb 20, 2013 at 11:31 AM, Bill Graham <
> > >> >>> billgraham@gmail.com>
> > >> >>>>>> wrote:
> > >> >>>>>>>
> > >> >>>>>>>> Regarding Pig 11 rc2, I propose we continue with the current
> > >> >> vote
> > >> >>>> as is
> > >> >>>>>>>> (which closes today EOD). Patches for 0.20.2 issues can be
> > >> >> rolled
> > >> >>>> into a
> > >> >>>>>>>> Pig 0.11.1 release whenever they're available and tested.
> > >> >>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>>> On Wed, Feb 20, 2013 at 9:24 AM, Olga Natkovich <
> > >> >>>> onatkovich@yahoo.com
> > >> >>>>>>>>> wrote:
> > >> >>>>>>>>
> > >> >>>>>>>>> I agree that supporting as much as we can is a good goal.
> The
> > >> >>>> issue is
> > >> >>>>>>>> who
> > >> >>>>>>>>> is going to be testing against all these versions? We found
> > the
> > >> >>>> issues
> > >> >>>>>>>>> under discussion because of a customer report, not because
> we
> > >> >>>>>>>> consistently
> > >> >>>>>>>>> test against all versions. Perhaps when we decide which
> > >> >> versions
> > >> >>> to
> > >> >>>>>>>> support
> > >> >>>>>>>>> for next release we need also to agree who is going to be
> > >> >> testing
> > >> >>>> and
> > >> >>>>>>>>> maintaining compatibility with a particular version.
> > >> >>>>>>>>>
> > >> >>>>>>>>> For instance since Hadoop 23 compatibility is important for
> us
> > >> >> at
> > >> >>>> Yahoo
> > >> >>>>>>>> we
> > >> >>>>>>>>> have been maintaining compatibility with this version for
> 0.9,
> > >> >>>> 0.10 and
> > >> >>>>>>>>> will do the same for 0.11 and going forward. I think we
> would
> > >> >>> need
> > >> >>>>>> others
> > >> >>>>>>>>> to step in and claim the versions of their interest.
> > >> >>>>>>>>>
> > >> >>>>>>>>> Olga
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>> ________________________________
> > >> >>>>>>>>> From: Kai Londenberg <ka...@googlemail.com>
> > >> >>>>>>>>> To: dev@pig.apache.org
> > >> >>>>>>>>> Sent: Wednesday, February 20, 2013 1:51 AM
> > >> >>>>>>>>> Subject: Re: pig 0.11 candidate 2 feedback: Several problems
> > >> >>>>>>>>>
> > >> >>>>>>>>> Hi,
> > >> >>>>>>>>>
> > >> >>>>>>>>> I strongly agree with Jonathan here. If there are good
> reasons
> > >> >> why
> > >> >>>> you
> > >> >>>>>>>>> can't support an older version of Hadoop any more, that's
> one
> > >> >>>> thing.
> > >> >>>>>>>>> But having to change 2 lines of code doesn't really qualify
> as
> > >> >>>> such in
> > >> >>>>>>>>> my point of view ;)
> > >> >>>>>>>>>
> > >> >>>>>>>>> At least for me, pig support for 0.20.2 is essential -
> without
> > >> >>> it,
> > >> >>>> I
> > >> >>>>>>>>> can't use it. If it doesn't support it, I'll have to branch
> > pig
> > >> >>> and
> > >> >>>>>>>>> hack it myself, or stop using it.
> > >> >>>>>>>>>
> > >> >>>>>>>>> I guess, there are a lot of people still running 0.20.2
> > >> >> Clusters.
> > >> >>>> If
> > >> >>>>>>>>> you really have lots of data stored on HDFS and a
> continuously
> > >> >>> busy
> > >> >>>>>>>>> cluster, an upgrade is nothing you do "just because".
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>> 2013/2/20 Jonathan Coveney <jc...@gmail.com>:
> > >> >>>>>>>>>> I agree that we shouldn't have to support old versions
> > >> >> forever.
> > >> >>>> That
> > >> >>>>>>>>> said,
> > >> >>>>>>>>>> I also don't think we should be too blase about supporting
> > >> >> older
> > >> >>>>>>>> versions
> > >> >>>>>>>>>> where it is not odious to do so. We have a lot of
> competition
> > >> >> in
> > >> >>>> the
> > >> >>>>>>>>>> language space and the broader the versions we can support,
> > >> >> the
> > >> >>>> better
> > >> >>>>>>>>>> (assuming it isn't too odious to do so). In this case, I
> > don't
> > >> >>>> think
> > >> >>>>>> it
> > >> >>>>>>>>>> should be too hard to change ObjectSerializer so that the
> > >> >>>>>> commons-codec
> > >> >>>>>>>>>> code used is compatible with both versions...we could just
> > >> >>> in-line
> > >> >>>>>> some
> > >> >>>>>>>>> of
> > >> >>>>>>>>>> the Base64 code, and comment accordingly.
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> That said, we also should be clear about what versions we
> > >> >>>> support, but
> > >> >>>>>>>>> 6-12
> > >> >>>>>>>>>> months seems short. The upgrade cycles on Hadoop are
> really,
> > >> >>>> really
> > >> >>>>>>>> long.
> > >> >>>>>>>>>>
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> 2013/2/20 Prashant Kommireddi <pr...@gmail.com>
> > >> >>>>>>>>>>
> > >> >>>>>>>>>>> Agreed, that makes sense. Probably supporting older hadoop
> > >> >>>> version
> > >> >>>>>> for
> > >> >>>>>>>>> a 1
> > >> >>>>>>>>>>> or 2 pig releases before moving to a newer/stable version?
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> Having said that, should we use 0.11 period to communicate
> > >> >> the
> > >> >>>> same
> > >> >>>>>> to
> > >> >>>>>>>>> the
> > >> >>>>>>>>>>> community and start moving on 0.12 onwards? I know we are
> > way
> > >> >>>> past
> > >> >>>>>>>> 6-12
> > >> >>>>>>>>>>> months (1-2 release) time frame with 0.20.2, but we also
> > need
> > >> >>> to
> > >> >>>> make
> > >> >>>>>>>>> sure
> > >> >>>>>>>>>>> users are aware and plan accordingly.
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> I'd also be interested to hear how other projects (Hive,
> > >> >> Oozie)
> > >> >>>> are
> > >> >>>>>>>>>>> handling this.
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> -Prashant
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> On Tue, Feb 19, 2013 at 3:22 PM, Olga Natkovich <
> > >> >>>>>> onatkovich@yahoo.com
> > >> >>>>>>>>>>>> wrote:
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>>> It seems that for each Pig release we need to agree and
> > >> >>> clearly
> > >> >>>>>>>> state
> > >> >>>>>>>>>>>> which Hadoop versions it will support. I guess the main
> > >> >>>> question is
> > >> >>>>>>>>> how
> > >> >>>>>>>>>>> we
> > >> >>>>>>>>>>>> decide on this. Perhaps we should say that Pig no longer
> > >> >>>> supports
> > >> >>>>>>>>> older
> > >> >>>>>>>>>>>> Hadoop versions once the newer one is out for at least
> 6-12
> > >> >>>> month to
> > >> >>>>>>>>> make
> > >> >>>>>>>>>>>> sure it is stable. I don't think we can support old
> > versions
> > >> >>>>>>>>>>> indefinitely.
> > >> >>>>>>>>>>>> It is in everybody's interest to keep moving forward.
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> Olga
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> ________________________________
> > >> >>>>>>>>>>>> From: Prashant Kommireddi <pr...@gmail.com>
> > >> >>>>>>>>>>>> To: dev@pig.apache.org
> > >> >>>>>>>>>>>> Sent: Tuesday, February 19, 2013 10:57 AM
> > >> >>>>>>>>>>>> Subject: Re: pig 0.11 candidate 2 feedback: Several
> > problems
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> What do you guys feel about the JIRA to do with 0.20.2
> > >> >>>>>>>>>>>> compatibility (PIG-3194)? I am interested in discussing the
> > >> >>>>>>>>>>>> strategy around backward compatibility, as this is something
> > >> >>>>>>>>>>>> that would haunt us each time we move to the next Hadoop
> > >> >>>>>>>>>>>> version. For example, we might be in a similar situation while
> > >> >>>>>>>>>>>> moving to Hadoop 2.0, when some of the stuff might break for
> > >> >>>>>>>>>>>> 1.0.
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> I feel it would be good to get this JIRA fix in for 0.11, as
> > >> >>>>>>>>>>>> 0.20.2 users might be caught unaware. Of course, I must admit
> > >> >>>>>>>>>>>> there is selfish interest here, and it's probably easier for
> > >> >>>>>>>>>>>> us to have a workaround in Pig rather than upgrade Hadoop in
> > >> >>>>>>>>>>>> all our production DCs.
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> -Prashant
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> On Tue, Feb 19, 2013 at 9:54 AM, Russell Jurney
> > >> >>>>>>>>>>>> <russell.jurney@gmail.com> wrote:
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>>> I think someone should step up and fix the easy ones, if
> > >> >>>>>>>>>>>>> possible.
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> On Tue, Feb 19, 2013 at 9:51 AM, Bill Graham
> > >> >>>>>>>>>>>>> <billgraham@gmail.com> wrote:
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> Thanks Kai for reporting these.
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> What do people think about the severity of these issues
> > >> >>>>>>>>>>>>>> w.r.t. Pig 11? I see a few possible options:
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> 1. We include some or all of these patches in a new Pig 11
> > >> >>>>>>>>>>>>>> rc. We'd want to make sure that they don't destabilize the
> > >> >>>>>>>>>>>>>> current branch. This approach makes sense if we think Pig
> > >> >>>>>>>>>>>>>> 11 wouldn't be a good release without one or more of these
> > >> >>>>>>>>>>>>>> included.
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> 2. We continue with the Pig 11 release without these, but
> > >> >>>>>>>>>>>>>> then include one or more in a 0.11.1 release.
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> 3. We continue with the Pig 11 release without these, but
> > >> >>>>>>>>>>>>>> then include them in a 0.12 release.
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> Jon has a patch for the MAP issue
> > >> >>>>>>>>>>>>>> (PIG-3144 <https://issues.apache.org/jira/browse/PIG-3144>)
> > >> >>>>>>>>>>>>>> ready, which seems like the most pressing of the three to
> > >> >>>>>>>>>>>>>> me.
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> thanks,
> > >> >>>>>>>>>>>>>> Bill
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> On Mon, Feb 18, 2013 at 2:27 AM, Kai Londenberg
> > >> >>>>>>>>>>>>>> <kai.londenberg@googlemail.com> wrote:
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>> Hi,
> > >> >>>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>> I just subscribed to the dev mailing list in order to give
> > >> >>>>>>>>>>>>>>> you some feedback on pig 0.11 candidate 2.
> > >> >>>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>> The following three issues are currently present in 0.11
> > >> >>>>>>>>>>>>>>> candidate 2:
> > >> >>>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/PIG-3144 -
> > >> >>>>>>>>>>>>>>> 'Erroneous map entry alias resolution leading to
> > >> >>>>>>>>>>>>>>> "Duplicate schema alias" errors'
> > >> >>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/PIG-3194 - Changes
> > >> >>>>>>>>>>>>>>> to ObjectSerializer.java break compatibility with Hadoop
> > >> >>>>>>>>>>>>>>> 0.20.2
> > >> >>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/PIG-3195 - Race
> > >> >>>>>>>>>>>>>>> Condition in PhysicalOperator leads to ExecException
> > >> >>>>>>>>>>>>>>> "Error while trying to get next result in POStream"
> > >> >>>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>> The last two of these are easily solvable (see the tickets
> > >> >>>>>>>>>>>>>>> for details on that). The first one is a bit trickier I
> > >> >>>>>>>>>>>>>>> think, but at least there is a workaround for it (pass Map
> > >> >>>>>>>>>>>>>>> fields through a UDF)
> > >> >>>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>> In my personal opinion, each of these problems is pretty
> > >> >>>>>>>>>>>>>>> severe, but opinions about the importance of the MAP
> > >> >>>>>>>>>>>>>>> Datatype and STREAM Operator, as well as Hadoop 0.20.2
> > >> >>>>>>>>>>>>>>> compatibility might differ.
> > >> >>>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>> so far ..
> > >> >>>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>> Kai Londenberg
> > >> >>>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>> --
> > >> >>>>>>>>>>>>>> *Note that I'm no longer using my Yahoo! email address.
> > >> >>> Please
> > >> >>>>>>>>> email
> > >> >>>>>>>>>>> me
> > >> >>>>>>>>>>>>> at
> > >> >>>>>>>>>>>>>> billgraham@gmail.com going forward.*
> > >> >>>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> --
> > >> >>>>>>>>>>>>> Russell Jurney twitter.com/rjurney
> > >> >> russell.jurney@gmail.com
> > >> >>>>>>>>>>>>> datasyndrome.com
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>>> --
> > >> >>>>>>>> *Note that I'm no longer using my Yahoo! email address.
> Please
> > >> >>>> email me
> > >> >>>>>> at
> > >> >>>>>>>> billgraham@gmail.com going forward.*
> > >> >>>>>>>>
> > >> >>>>>>>
> > >> >>>>>>>
> > >> >>>>>>>
> > >> >>>>>>> --
> > >> >>>>>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> > >> >>>>>> datasyndrome.com
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>> --
> > >> >>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> > >> >>> datasyndrome.com
> > >> >>>
> > >> >>
> > >>
> > >
> > >
> >
>

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Looks like Lohit found a critical bug we should fix for 0.11.1:
https://issues.apache.org/jira/browse/PIG-3241 (only observed in Hadoop 2.0)

D


On Wed, Mar 6, 2013 at 12:57 PM, Prashant Kommireddi <pr...@gmail.com> wrote:

> Dmitriy, are the GC fixes all in for 0.11.1? PIG-3148 and PIG-3212 are the
> two JIRAs I know were fixed; are there any others?
>
> I have a patch up for PIG-3194. I think we should be good for a release once
> that makes it in.
>
> -Prashant

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Prashant Kommireddi <pr...@gmail.com>.
Dmitriy, are the GC fixes all in for 0.11.1? PIG-3148 and PIG-3212 are the
two JIRAs I know were fixed; are there any others?

I have a patch up for PIG-3194. I think we should be good for a release once
that makes it in.

-Prashant

On Sat, Mar 2, 2013 at 11:16 AM, Prashant Kommireddi <pr...@gmail.com> wrote:

> Great.
>
> I have commented on a possible approach for PIG-3194 (http://goo.gl/UQ3zs).
> Please take a look when you folks have a chance.
>
>
> On Fri, Mar 1, 2013 at 7:00 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:
>
>> I'd like to get the GC fix in as well, but it looks like Rohini is about to
>> commit it, so we are good there.
>>
>> On Mar 1, 2013, at 11:33 AM, Bill Graham <bi...@gmail.com> wrote:
>>
>> > +1 to releasing Pig 0.11.1 when this is addressed. I should be able to
>> > help with the release again.
>> >
>> >
>> >
>> > On Fri, Mar 1, 2013 at 11:25 AM, Prashant Kommireddi
>> > <prash1784@gmail.com> wrote:
>> >
>> >> Hey Guys,
>> >>
>> >> I wanted to start a conversation on this again. If Kai is not looking
>> >> at PIG-3194, I can start working on it to get 0.11 compatible with
>> >> Hadoop 0.20.2. If everyone agrees, we should roll out 0.11.1 sooner
>> >> than usual, and I volunteer to help with it in any way possible.
>> >>
>> >> Any objections to getting 0.11.1 out soon after PIG-3194 is fixed?
>> >>
>> >> -Prashant
>> >>
>> >> On Wed, Feb 20, 2013 at 3:34 PM, Russell Jurney
>> >> <russell.jurney@gmail.com> wrote:
>> >>
>> >>> I stand corrected. Cool, 0.11 is good!
>> >>>
>> >>>
>> >>> On Wed, Feb 20, 2013 at 1:15 PM, Jarek Jarcec Cecho
>> >>> <jarcec@apache.org> wrote:
>> >>>
>> >>>> Just an unrelated note: CDH3 is closer to Hadoop 1.x than to 0.20.
>> >>>>
>> >>>> Jarcec
>> >>>>
>> >>>> On Wed, Feb 20, 2013 at 12:04:51PM -0800, Dmitriy Ryaboy wrote:
>> >>>>> I agree -- this is a good release. The bugs Kai pointed out should
>> >>>>> be fixed, but as they are not critical regressions, we can fix them
>> >>>>> in 0.11.1 (if someone wants to roll 0.11.1 the minute these fixes
>> >>>>> are committed, I won't mind and will dutifully vote for the
>> >>>>> release).
>> >>>>>
>> >>>>> I think the Hadoop 0.20.2 incompatibility is unfortunate, but IIRC
>> >>>>> this is fixable by setting HADOOP_USER_CLASSPATH_FIRST=true (was
>> >>>>> that in 0.20.2?)
>> >>>>>
>> >>>>> FWIW Twitter's running CDH3 and this release works in our
>> >>>>> environment.
>> >>>>>
>> >>>>> At this point things that block a release are critical regressions
>> >>>>> in performance or correctness.
>> >>>>>
>> >>>>> D
>> >>>>>
>> >>>>>
>> >>>>> On Wed, Feb 20, 2013 at 11:52 AM, Alan Gates
>> >>>>> <gates@hortonworks.com> wrote:
>> >>>>>
>> >>>>>> No.  Bugs like these are supposed to be found and fixed after we
>> >>>>>> branch from trunk (which happened several months ago in the case
>> >>>>>> of 0.11). The point of RCs is to check that it's a good build,
>> >>>>>> licenses are right, etc. Any bugs found this late in the game have
>> >>>>>> to be seen as failures of earlier testing.
>> >>>>>>
>> >>>>>> Alan.
>> >>>>>>
>> >>>>>> On Feb 20, 2013, at 11:33 AM, Russell Jurney wrote:
>> >>>>>>
>> >>>>>>> Isn't the point of an RC to find and fix bugs like these?
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Wed, Feb 20, 2013 at 11:31 AM, Bill Graham
>> >>>>>>> <billgraham@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>>> Regarding Pig 11 rc2, I propose we continue with the current
>> >>>>>>>> vote as is (which closes today EOD). Patches for 0.20.2 issues
>> >>>>>>>> can be rolled into a Pig 0.11.1 release whenever they're
>> >>>>>>>> available and tested.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Wed, Feb 20, 2013 at 9:24 AM, Olga Natkovich
>> >>>>>>>> <onatkovich@yahoo.com> wrote:
>> >>>>>>>>
>> >>>>>>>>> I agree that supporting as much as we can is a good goal. The
>> >>>>>>>>> issue is who is going to be testing against all these versions.
>> >>>>>>>>> We found the issues under discussion because of a customer
>> >>>>>>>>> report, not because we consistently test against all versions.
>> >>>>>>>>> Perhaps when we decide which versions to support for the next
>> >>>>>>>>> release, we also need to agree on who is going to be testing
>> >>>>>>>>> and maintaining compatibility with a particular version.
>> >>>>>>>>>
>> >>>>>>>>> For instance, since Hadoop 0.23 compatibility is important for
>> >>>>>>>>> us at Yahoo, we have been maintaining compatibility with this
>> >>>>>>>>> version for 0.9 and 0.10, and will do the same for 0.11 and
>> >>>>>>>>> going forward. I think we would need others to step in and
>> >>>>>>>>> claim the versions of their interest.
>> >>>>>>>>>
>> >>>>>>>>> Olga
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> ________________________________
>> >>>>>>>>> From: Kai Londenberg <ka...@googlemail.com>
>> >>>>>>>>> To: dev@pig.apache.org
>> >>>>>>>>> Sent: Wednesday, February 20, 2013 1:51 AM
>> >>>>>>>>> Subject: Re: pig 0.11 candidate 2 feedback: Several problems
>> >>>>>>>>>
>> >>>>>>>>> Hi,
>> >>>>>>>>>
>> >>>>>>>>> I strongly agree with Jonathan here. If there are good reasons
>> >>>>>>>>> why you can't support an older version of Hadoop any more,
>> >>>>>>>>> that's one thing. But having to change 2 lines of code doesn't
>> >>>>>>>>> really qualify as such in my point of view ;)
>> >>>>>>>>>
>> >>>>>>>>> At least for me, Pig support for 0.20.2 is essential - without
>> >>>>>>>>> it, I can't use it. If it doesn't support it, I'll have to
>> >>>>>>>>> branch Pig and hack it myself, or stop using it.
>> >>>>>>>>>
>> >>>>>>>>> I guess there are a lot of people still running 0.20.2
>> >>>>>>>>> clusters. If you really have lots of data stored on HDFS and a
>> >>>>>>>>> continuously busy cluster, an upgrade is nothing you do "just
>> >>>>>>>>> because".
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> 2013/2/20 Jonathan Coveney <jc...@gmail.com>:
>> >>>>>>>>>> I agree that we shouldn't have to support old versions
>> >>>>>>>>>> forever. That said, I also don't think we should be too blase
>> >>>>>>>>>> about supporting older versions where it is not odious to do
>> >>>>>>>>>> so. We have a lot of competition in the language space, and
>> >>>>>>>>>> the broader the range of versions we can support, the better
>> >>>>>>>>>> (assuming it isn't too odious to do so). In this case, I don't
>> >>>>>>>>>> think it should be too hard to change ObjectSerializer so that
>> >>>>>>>>>> the commons-codec code used is compatible with both
>> >>>>>>>>>> versions... we could just in-line some of the Base64 code, and
>> >>>>>>>>>> comment accordingly.
>> >>>>>>>>>>
>> >>>>>>>>>> That said, we should also be clear about what versions we
>> >>>>>>>>>> support, but 6-12 months seems short. The upgrade cycles on
>> >>>>>>>>>> Hadoop are really, really long.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> 2013/2/20 Prashant Kommireddi <pr...@gmail.com>
>> >>>>>>>>>>
>> >>>>>>>>>>> Agreed, that makes sense. Probably supporting an older Hadoop
>> >>>>>>>>>>> version for 1 or 2 Pig releases before moving to a
>> >>>>>>>>>>> newer/stable version?
>> >>>>>>>>>>>
>> >>>>>>>>>>> Having said that, should we use the 0.11 period to communicate
>> >>>>>>>>>>> the same to the community and start moving on from 0.12
>> >>>>>>>>>>> onwards? I know we are way past the 6-12 month (1-2 release)
>> >>>>>>>>>>> time frame with 0.20.2, but we also need to make sure users
>> >>>>>>>>>>> are aware and plan accordingly.
>> >>>>>>>>>>>
>> >>>>>>>>>>> I'd also be interested to hear how other projects (Hive,
>> >>>>>>>>>>> Oozie) are handling this.
>> >>>>>>>>>>>
>> >>>>>>>>>>> -Prashant
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Tue, Feb 19, 2013 at 3:22 PM, Olga Natkovich
>> >>>>>>>>>>> <onatkovich@yahoo.com> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> It seems that for each Pig release we need to agree on and
>> >>>>>>>>>>>> clearly state which Hadoop versions it will support. I guess
>> >>>>>>>>>>>> the main question is how we decide on this. Perhaps we should
>> >>>>>>>>>>>> say that Pig no longer supports older Hadoop versions once
>> >>>>>>>>>>>> the newer one has been out for at least 6-12 months, to make
>> >>>>>>>>>>>> sure it is stable. I don't think we can support old versions
>> >>>>>>>>>>>> indefinitely. It is in everybody's interest to keep moving
>> >>>>>>>>>>>> forward.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Olga
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> ________________________________
>> >>>>>>>>>>>> From: Prashant Kommireddi <pr...@gmail.com>
>> >>>>>>>>>>>> To: dev@pig.apache.org
>> >>>>>>>>>>>> Sent: Tuesday, February 19, 2013 10:57 AM
>> >>>>>>>>>>>> Subject: Re: pig 0.11 candidate 2 feedback: Several problems
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> What do you guys feel about the JIRA to do with 0.20.2
>> >>>> compatibility
>> >>>>>>>>>>>> (PIG-3194)? I am interested in discussing the strategy
>> >> around
>> >>>>>>>> backward
>> >>>>>>>>>>>> compatibility as this is something that would haunt us each
>> >>>> time we
>> >>>>>>>>> move
>> >>>>>>>>>>> to
>> >>>>>>>>>>>> the next hadoop version. For eg, we might be in a similar
>> >>>> situation
>> >>>>>>>>> while
>> >>>>>>>>>>>> moving to Hadoop 2.0, when some of the stuff might break for
>> >>>> 1.0.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I feel it would be good to get this JIRA fix in for 0.11, as
>> >>>> 0.20.2
>> >>>>>>>>> users
>> >>>>>>>>>>>> might be caught unaware. Of course, I must admit there is
>> >>>> selfish
>> >>>>>>>>>>> interest
>> >>>>>>>>>>>> here and it's probably easier for us to have a workaround on
>> >>> Pig
>> >>>>>>>>> rather
>> >>>>>>>>>>>> than upgrade hadoop in all our production DCs.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> -Prashant
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Tue, Feb 19, 2013 at 9:54 AM, Russell Jurney <
>> >>>>>>>>>>> russell.jurney@gmail.com
>> >>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> I think someone should step up and fix the easy ones, if
>> >>>> possible.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Tue, Feb 19, 2013 at 9:51 AM, Bill Graham <
>> >>>>>>>> billgraham@gmail.com>
>> >>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Thanks Kai for reporting these.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> What do people think about the severity of these issues
>> >>> w.r.t.
>> >>>>>>>> Pig
>> >>>>>>>>>>> 11?
>> >>>>>>>>>>>> I
>> >>>>>>>>>>>>>> see a few possible options:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 1. We include some or all of these patches in a new Pig 11
>> >>> rc.
>> >>>>>>>>> We'd
>> >>>>>>>>>>>> want
>> >>>>>>>>>>>>> to
>> >>>>>>>>>>>>>> make sure that they don't destabilize the current branch.
>> >>> This
>> >>>>>>>>>>> approach
>> >>>>>>>>>>>>>> makes sense if we think Pig 11 wouldn't be a good release
>> >>>>>>>> without
>> >>>>>>>>> one
>> >>>>>>>>>>>> or
>> >>>>>>>>>>>>>> more of these included.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 2. We continue with the Pig 11 release without these, but
>> >>> then
>> >>>>>>>>>>> include
>> >>>>>>>>>>>>> one
>> >>>>>>>>>>>>>> or more in a 0.11.1 release.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 3. We continue with the Pig 11 release without these, but
>> >>> then
>> >>>>>>>>>>> include
>> >>>>>>>>>>>>> them
>> >>>>>>>>>>>>>> in a 0.12 release.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Jon has a patch for the MAP issue
>> >>>>>>>>>>>>>> (PIG-3144<https://issues.apache.org/jira/browse/PIG-3144
>> >>> )
>> >>>>>>>>>>>>>> ready, which seems like the most pressing of the three to
>> >>> me.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> thanks,
>> >>>>>>>>>>>>>> Bill
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Mon, Feb 18, 2013 at 2:27 AM, Kai Londenberg <
>> >>>>>>>>>>>>>> kai.londenberg@googlemail.com> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Hi,
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> I just subscribed to the dev mailing list in order to
>> >> give
>> >>>> you
>> >>>>>>>>> some
>> >>>>>>>>>>>>>>> feedback on pig 0.11 candidate 2.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> The following three issues are currently present in 0.11
>> >>>>>>>>> candidate
>> >>>>>>>>>>> 2:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/PIG-3144 -
>> >>> 'Erroneous
>> >>>>>>>> map
>> >>>>>>>>>>>> entry
>> >>>>>>>>>>>>>>> alias resolution leading to "Duplicate schema alias"
>> >>> errors'
>> >>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/PIG-3194 - Changes
>> >>> to
>> >>>>>>>>>>>>>>> ObjectSerializer.java break compatibility with Hadoop
>> >>> 0.20.2
>> >>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/PIG-3195 - Race
>> >>>>>>>>> Condition in
>> >>>>>>>>>>>>>>> PhysicalOperator leads to ExecException "Error while
>> >> trying
>> >>>> to
>> >>>>>>>>> get
>> >>>>>>>>>>>>>>> next result in POStream"
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> The last two of these are easily solveable (see the
>> >> tickets
>> >>>>>>>> for
>> >>>>>>>>>>>>>>> details on that). The first one is a bit trickier I
>> >> think,
>> >>>> but
>> >>>>>>>>> at
>> >>>>>>>>>>>>>>> least there is a workaround for it (pass Map fields
>> >> through
>> >>>> an
>> >>>>>>>>> UDF)
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> In my personal opinion, each of these problems is pretty
>> >>>>>>>> severe,
>> >>>>>>>>>>> but
>> >>>>>>>>>>>>>>> opinions about the importance of the MAP Datatype and
>> >>> STREAM
>> >>>>>>>>>>>> Operator,
>> >>>>>>>>>>>>>>> as well as Hadoop 0.20.2 compatibility might differ.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> so far ..
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Kai Londenberg
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> --
>> >>>>>>>>>>>>>> *Note that I'm no longer using my Yahoo! email address.
>> >>> Please
>> >>>>>>>>> email
>> >>>>>>>>>>> me
>> >>>>>>>>>>>>> at
>> >>>>>>>>>>>>>> billgraham@gmail.com going forward.*
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> --
>> >>>>>>>>>>>>> Russell Jurney twitter.com/rjurney
>> >> russell.jurney@gmail.com
>> >>>>>>>>>>>>> datasyndrome.com
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> --
>> >>>>>>>> *Note that I'm no longer using my Yahoo! email address. Please
>> >>>> email me
>> >>>>>> at
>> >>>>>>>> billgraham@gmail.com going forward.*
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
>> >>>>>> datasyndrome.com
>> >>>>>>
>> >>>>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
>> >>> datasyndrome.com
>> >>>
>> >>
>>
>
>

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Prashant Kommireddi <pr...@gmail.com>.
Great.

I have commented regarding a possible approach for PIG-3194
http://goo.gl/UQ3zs. Please take a look when you folks have a chance.
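
For readers following the PIG-3194 discussion, here is a minimal sketch of the "in-line some of the Base64 code" idea raised in this thread. It is an illustration only, not the patch or the approach described on the ticket. It assumes the breakage comes from ObjectSerializer relying on commons-codec Base64 methods that the older commons-codec jar on a Hadoop 0.20.2 classpath does not provide. The class name InlineBase64 and its two methods are hypothetical and do not exist in Pig.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Pig): a dependency-free Base64 codec,
 * so the result does not depend on which commons-codec version is on the
 * classpath. Standard alphabet with '=' padding.
 */
public final class InlineBase64 {
    private static final char[] ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".toCharArray();

    // Maps an ASCII character back to its 6-bit value, or -1 if it is not in the alphabet.
    private static final int[] REVERSE = new int[128];
    static {
        Arrays.fill(REVERSE, -1);
        for (int i = 0; i < ALPHABET.length; i++) {
            REVERSE[ALPHABET[i]] = i;
        }
    }

    private InlineBase64() {}

    /** Encodes bytes as Base64 text, three input bytes per four output characters. */
    public static String encode(byte[] data) {
        StringBuilder out = new StringBuilder((data.length + 2) / 3 * 4);
        for (int i = 0; i < data.length; i += 3) {
            int b0 = data[i] & 0xFF;
            int b1 = i + 1 < data.length ? data[i + 1] & 0xFF : 0;
            int b2 = i + 2 < data.length ? data[i + 2] & 0xFF : 0;
            out.append(ALPHABET[b0 >> 2]);
            out.append(ALPHABET[((b0 & 0x03) << 4) | (b1 >> 4)]);
            // Pad with '=' when the final group has fewer than three bytes.
            out.append(i + 1 < data.length ? ALPHABET[((b1 & 0x0F) << 2) | (b2 >> 6)] : '=');
            out.append(i + 2 < data.length ? ALPHABET[b2 & 0x3F] : '=');
        }
        return out.toString();
    }

    /** Decodes Base64 text; characters outside the alphabet (e.g. line breaks) are skipped. */
    public static byte[] decode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(text.length() * 3 / 4);
        int buffer = 0;
        int bits = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '=') {
                break; // padding marks the end of the data
            }
            int value = c < 128 ? REVERSE[c] : -1;
            if (value < 0) {
                continue;
            }
            buffer = (buffer << 6) | value;
            bits += 6;
            if (bits >= 8) {
                bits -= 8;
                out.write((buffer >> bits) & 0xFF);
            }
        }
        return out.toByteArray();
    }
}
```

A caller would serialize an object to a byte array as before, pass it through InlineBase64.encode to get a configuration-safe string, and reverse the steps with InlineBase64.decode. The trade-off is a few dozen lines of duplicated codec logic in exchange for not caring which commons-codec jar Hadoop ships.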

On Fri, Mar 1, 2013 at 7:00 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> I'd like to get the gc fix in as well, but looks like Rohini is about to
> commit it so we are good there.

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
I'd like to get the GC fix in as well, but it looks like Rohini is about to commit it, so we are good there.

On Mar 1, 2013, at 11:33 AM, Bill Graham <bi...@gmail.com> wrote:

> +1 to releasing Pig 0.11.1 when this is addressed. I should be able to help
> with the release again.
> 
> 
> 
> On Fri, Mar 1, 2013 at 11:25 AM, Prashant Kommireddi <pr...@gmail.com> wrote:
> 
>> Hey Guys,
>> 
>> I wanted to start a conversation on this again. If Kai is not looking at
>> PIG-3194 I can start working on it to get 0.11 compatible with 20.2. If
>> everyone agrees, we should roll out 0.11.1 sooner than usual and I
>> volunteer to help with it in any way possible.
>> 
>> Any objections to getting 0.11.1 out soon after 3194 is fixed?
>> 
>> -Prashant
>> 
>> On Wed, Feb 20, 2013 at 3:34 PM, Russell Jurney <russell.jurney@gmail.com> wrote:
>> 
>>> I stand corrected. Cool, 0.11 is good!
>>> 
>>> 
>>> On Wed, Feb 20, 2013 at 1:15 PM, Jarek Jarcec Cecho <jarcec@apache.org> wrote:
>>>
>>>> Just an unrelated note: CDH3 is closer to Hadoop 1.x than to 0.20.
>>>> 
>>>> Jarcec
>>>> 
>>>> On Wed, Feb 20, 2013 at 12:04:51PM -0800, Dmitriy Ryaboy wrote:
>>>>> I agree -- this is a good release. The bugs Kai pointed out should be
>>>>> fixed, but as they are not critical regressions, we can fix them in 0.11.1
>>>>> (if someone wants to roll 0.11.1 the minute these fixes are committed, I
>>>>> won't mind and will dutifully vote for the release).
>>>>>
>>>>> I think the Hadoop 20.2 incompatibility is unfortunate but iirc this is
>>>>> fixable by setting HADOOP_USER_CLASSPATH_FIRST=true (was that in 20.2?)
>>>>>
>>>>> FWIW Twitter's running CDH3 and this release works in our environment.
>>>>>
>>>>> At this point things that block a release are critical regressions in
>>>>> performance or correctness.
>>>>>
>>>>> D
>>>>>
>>>>>
>>>>> On Wed, Feb 20, 2013 at 11:52 AM, Alan Gates <ga...@hortonworks.com> wrote:
>>>>> 
>>>>>> No. Bugs like these are supposed to be found and fixed after we branch
>>>>>> from trunk (which happened several months ago in the case of 0.11). The
>>>>>> point of RCs is to check that it's a good build, licenses are right, etc.
>>>>>> Any bugs found this late in the game have to be seen as failures of
>>>>>> earlier testing.
>>>>>> 
>>>>>> Alan.
>>>>>> 
>>>>>> On Feb 20, 2013, at 11:33 AM, Russell Jurney wrote:
>>>>>> 
>>>>>>> Isn't the point of an RC to find and fix bugs like these?
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Feb 20, 2013 at 11:31 AM, Bill Graham <billgraham@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Regarding Pig 11 rc2, I propose we continue with the current vote as is
>>>>>>>> (which closes today EOD). Patches for 0.20.2 issues can be rolled into a
>>>>>>>> Pig 0.11.1 release whenever they're available and tested.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Feb 20, 2013 at 9:24 AM, Olga Natkovich <onatkovich@yahoo.com> wrote:
>>>>>>>> 
>>>>>>>>> I agree that supporting as much as we can is a good goal. The issue is who
>>>>>>>>> is going to be testing against all these versions? We found the issues
>>>>>>>>> under discussion because of a customer report, not because we consistently
>>>>>>>>> test against all versions. Perhaps when we decide which versions to support
>>>>>>>>> for next release we need also to agree who is going to be testing and
>>>>>>>>> maintaining compatibility with a particular version.
>>>>>>>>>
>>>>>>>>> For instance since Hadoop 23 compatibility is important for us at Yahoo we
>>>>>>>>> have been maintaining compatibility with this version for 0.9, 0.10 and
>>>>>>>>> will do the same for 0.11 and going forward. I think we would need others
>>>>>>>>> to step in and claim the versions of their interest.
>>>>>>>>> 
>>>>>>>>> Olga
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ________________________________
>>>>>>>>> From: Kai Londenberg <ka...@googlemail.com>
>>>>>>>>> To: dev@pig.apache.org
>>>>>>>>> Sent: Wednesday, February 20, 2013 1:51 AM
>>>>>>>>> Subject: Re: pig 0.11 candidate 2 feedback: Several problems
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> I strongly agree with Jonathan here. If there are good reasons why you
>>>>>>>>> can't support an older version of Hadoop any more, that's one thing.
>>>>>>>>> But having to change 2 lines of code doesn't really qualify as such in
>>>>>>>>> my point of view ;)
>>>>>>>>>
>>>>>>>>> At least for me, pig support for 0.20.2 is essential - without it, I
>>>>>>>>> can't use it. If it doesn't support it, I'll have to branch pig and
>>>>>>>>> hack it myself, or stop using it.
>>>>>>>>>
>>>>>>>>> I guess, there are a lot of people still running 0.20.2 Clusters. If
>>>>>>>>> you really have lots of data stored on HDFS and a continuously busy
>>>>>>>>> cluster, an upgrade is nothing you do "just because".
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 2013/2/20 Jonathan Coveney <jc...@gmail.com>:
>>>>>>>>>> I agree that we shouldn't have to support old versions forever. That said,
>>>>>>>>>> I also don't think we should be too blase about supporting older versions
>>>>>>>>>> where it is not odious to do so. We have a lot of competition in the
>>>>>>>>>> language space and the broader the versions we can support, the better
>>>>>>>>>> (assuming it isn't too odious to do so). In this case, I don't think it
>>>>>>>>>> should be too hard to change ObjectSerializer so that the commons-codec
>>>>>>>>>> code used is compatible with both versions...we could just in-line some of
>>>>>>>>>> the Base64 code, and comment accordingly.
>>>>>>>>>>
>>>>>>>>>> That said, we also should be clear about what versions we support, but 6-12
>>>>>>>>>> months seems short. The upgrade cycles on Hadoop are really, really long.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 2013/2/20 Prashant Kommireddi <pr...@gmail.com>
>>>>>>>>>> 
>>>>>>>>>>> Agreed, that makes sense. Probably supporting an older hadoop version for
>>>>>>>>>>> 1 or 2 pig releases before moving to a newer/stable version?
>>>>>>>>>>>
>>>>>>>>>>> Having said that, should we use the 0.11 period to communicate the same to
>>>>>>>>>>> the community and start moving on from 0.12 onwards? I know we are way past
>>>>>>>>>>> the 6-12 months (1-2 release) time frame with 0.20.2, but we also need to
>>>>>>>>>>> make sure users are aware and plan accordingly.
>>>>>>>>>>>
>>>>>>>>>>> I'd also be interested to hear how other projects (Hive, Oozie) are
>>>>>>>>>>> handling this.
>>>>>>>>>>>
>>>>>>>>>>> -Prashant
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 19, 2013 at 3:22 PM, Olga Natkovich <onatkovich@yahoo.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> It seems that for each Pig release we need to agree and clearly state
>>>>>>>>>>>> which Hadoop versions it will support. I guess the main question is how we
>>>>>>>>>>>> decide on this. Perhaps we should say that Pig no longer supports older
>>>>>>>>>>>> Hadoop versions once the newer one is out for at least 6-12 months to make
>>>>>>>>>>>> sure it is stable. I don't think we can support old versions indefinitely.
>>>>>>>>>>>> It is in everybody's interest to keep moving forward.
>>>>>>>>>>>> 
>>>>>>>>>>>> Olga
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> ________________________________
>>>>>>>>>>>> From: Prashant Kommireddi <pr...@gmail.com>
>>>>>>>>>>>> To: dev@pig.apache.org
>>>>>>>>>>>> Sent: Tuesday, February 19, 2013 10:57 AM
>>>>>>>>>>>> Subject: Re: pig 0.11 candidate 2 feedback: Several problems
>>>>>>>>>>>> 
>>>>>>>>>>>> What do you guys feel about the JIRA to do with 0.20.2 compatibility
>>>>>>>>>>>> (PIG-3194)? I am interested in discussing the strategy around backward
>>>>>>>>>>>> compatibility as this is something that would haunt us each time we move
>>>>>>>>>>>> to the next hadoop version. For example, we might be in a similar situation
>>>>>>>>>>>> while moving to Hadoop 2.0, when some of the stuff might break for 1.0.
>>>>>>>>>>>>
>>>>>>>>>>>> I feel it would be good to get this JIRA fix in for 0.11, as 0.20.2 users
>>>>>>>>>>>> might be caught unaware. Of course, I must admit there is selfish interest
>>>>>>>>>>>> here and it's probably easier for us to have a workaround on Pig rather
>>>>>>>>>>>> than upgrade hadoop in all our production DCs.
>>>>>>>>>>>>
>>>>>>>>>>>> -Prashant
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Feb 19, 2013 at 9:54 AM, Russell Jurney <russell.jurney@gmail.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> I think someone should step up and fix the easy ones, if possible.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Feb 19, 2013 at 9:51 AM, Bill Graham <billgraham@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks Kai for reporting these.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> What do people think about the severity of these issues w.r.t. Pig 11? I
>>>>>>>>>>>>>> see a few possible options:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. We include some or all of these patches in a new Pig 11 rc. We'd want to
>>>>>>>>>>>>>> make sure that they don't destabilize the current branch. This approach
>>>>>>>>>>>>>> makes sense if we think Pig 11 wouldn't be a good release without one or
>>>>>>>>>>>>>> more of these included.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2. We continue with the Pig 11 release without these, but then include one
>>>>>>>>>>>>>> or more in a 0.11.1 release.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3. We continue with the Pig 11 release without these, but then include them
>>>>>>>>>>>>>> in a 0.12 release.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jon has a patch for the MAP issue
>>>>>>>>>>>>>> (PIG-3144 <https://issues.apache.org/jira/browse/PIG-3144>)
>>>>>>>>>>>>>> ready, which seems like the most pressing of the three to me.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>> Bill
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Mon, Feb 18, 2013 at 2:27 AM, Kai Londenberg <kai.londenberg@googlemail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I just subscribed to the dev mailing list in order to give you some
>>>>>>>>>>>>>>> feedback on pig 0.11 candidate 2.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The following three issues are currently present in 0.11 candidate 2:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/PIG-3144 - 'Erroneous map entry
>>>>>>>>>>>>>>> alias resolution leading to "Duplicate schema alias" errors'
>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/PIG-3194 - Changes to
>>>>>>>>>>>>>>> ObjectSerializer.java break compatibility with Hadoop 0.20.2
>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/PIG-3195 - Race Condition in
>>>>>>>>>>>>>>> PhysicalOperator leads to ExecException "Error while trying to get
>>>>>>>>>>>>>>> next result in POStream"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The last two of these are easily solvable (see the tickets for
>>>>>>>>>>>>>>> details on that). The first one is a bit trickier I think, but at
>>>>>>>>>>>>>>> least there is a workaround for it (pass Map fields through a UDF)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In my personal opinion, each of these problems is pretty severe, but
>>>>>>>>>>>>>>> opinions about the importance of the MAP Datatype and STREAM Operator,
>>>>>>>>>>>>>>> as well as Hadoop 0.20.2 compatibility might differ.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> so far ..
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Kai Londenberg
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> *Note that I'm no longer using my Yahoo! email address.
>>> Please
>>>>>>>>> email
>>>>>>>>>>> me
>>>>>>>>>>>>> at
>>>>>>>>>>>>>> billgraham@gmail.com going forward.*
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Russell Jurney twitter.com/rjurney
>> russell.jurney@gmail.com
>>>>>>>>>>>>> datasyndrome.com
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> *Note that I'm no longer using my Yahoo! email address. Please
>>>> email me
>>>>>> at
>>>>>>>> billgraham@gmail.com going forward.*
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
>>>>>> datasyndrome.com
>>>>>> 
>>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
>>> datasyndrome.com
>>> 
>> 

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Julien Le Dem <ju...@ledem.net>.
Sounds good to me too.
Julien
On Mar 1, 2013, at 11:33 AM, Bill Graham wrote:

> +1 to releasing Pig 0.11.1 when this is addressed. I should be able to help
> with the release again.
> 
> 
> 


Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Bill Graham <bi...@gmail.com>.
+1 to releasing Pig 0.11.1 when this is addressed. I should be able to help
with the release again.



On Fri, Mar 1, 2013 at 11:25 AM, Prashant Kommireddi <pr...@gmail.com> wrote:

> Hey Guys,
>
> I wanted to restart the conversation on this. If Kai is not looking at
> PIG-3194, I can start working on it to make Pig 0.11 compatible with Hadoop
> 0.20.2. If everyone agrees, we should roll out 0.11.1 sooner than usual, and
> I volunteer to help with it in any way possible.
>
> Any objections to getting 0.11.1 out soon after PIG-3194 is fixed?
>
> -Prashant
>

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Prashant Kommireddi <pr...@gmail.com>.
Hey Guys,

I wanted to restart the conversation on this. If Kai is not looking at
PIG-3194, I can start working on it to make Pig 0.11 compatible with Hadoop
0.20.2. If everyone agrees, we should roll out 0.11.1 sooner than usual, and
I volunteer to help with it in any way possible.

Any objections to getting 0.11.1 out soon after PIG-3194 is fixed?

-Prashant

On Wed, Feb 20, 2013 at 3:34 PM, Russell Jurney <ru...@gmail.com> wrote:

> I stand corrected. Cool, 0.11 is good!
>
>
> On Wed, Feb 20, 2013 at 1:15 PM, Jarek Jarcec Cecho <jarcec@apache.org> wrote:
>
> > Just an unrelated note: CDH3 is closer to Hadoop 1.x than to 0.20.
> >
> > Jarcec
> >
> > On Wed, Feb 20, 2013 at 12:04:51PM -0800, Dmitriy Ryaboy wrote:
> > > I agree -- this is a good release. The bugs Kai pointed out should be
> > > fixed, but as they are not critical regressions, we can fix them in 0.11.1
> > > (if someone wants to roll 0.11.1 the minute these fixes are committed, I
> > > won't mind and will dutifully vote for the release).
> > >
> > > I think the Hadoop 0.20.2 incompatibility is unfortunate, but IIRC this is
> > > fixable by setting HADOOP_USER_CLASSPATH_FIRST=true (was that in 0.20.2?)
> > >
> > > FWIW Twitter's running CDH3 and this release works in our environment.
> > >
> > > At this point things that block a release are critical regressions in
> > > performance or correctness.
> > >
> > > D
> > >
> > >
> > > On Wed, Feb 20, 2013 at 11:52 AM, Alan Gates <ga...@hortonworks.com> wrote:
> > >
> > > > No. Bugs like these are supposed to be found and fixed after we branch
> > > > from trunk (which happened several months ago in the case of 0.11). The
> > > > point of RCs is to check that it's a good build, licenses are right, etc.
> > > > Any bugs found this late in the game have to be seen as failures of
> > > > earlier testing.
> > > >
> > > > Alan.
> > > >
> > > > On Feb 20, 2013, at 11:33 AM, Russell Jurney wrote:
> > > >
> > > > > Isn't the point of an RC to find and fix bugs like these?
> > > > >
> > > > >
> > > > > On Wed, Feb 20, 2013 at 11:31 AM, Bill Graham <billgraham@gmail.com> wrote:
> > > > >
> > > > >> Regarding Pig 11 rc2, I propose we continue with the current vote as is
> > > > >> (which closes today EOD). Patches for 0.20.2 issues can be rolled into a
> > > > >> Pig 0.11.1 release whenever they're available and tested.
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Wed, Feb 20, 2013 at 9:24 AM, Olga Natkovich <onatkovich@yahoo.com> wrote:
> > > > >>
> > > > >>> I agree that supporting as much as we can is a good goal. The issue is:
> > > > >>> who is going to be testing against all these versions? We found the issues
> > > > >>> under discussion because of a customer report, not because we consistently
> > > > >>> test against all versions. Perhaps when we decide which versions to support
> > > > >>> for the next release, we also need to agree on who is going to be testing
> > > > >>> and maintaining compatibility with a particular version.
> > > > >>>
> > > > >>> For instance since Hadoop 23 compatibility is important for us at
> > Yahoo
> > > > >> we
> > > > >>> have been maintaining compatibility with this version for 0.9,
> > 0.10 and
> > > > >>> will do the same for 0.11 and going forward. I think we would
> need
> > > > others
> > > > >>> to step in and claim the versions of their interest.
> > > > >>>
> > > > >>> Olga
> > > > >>>
> > > > >>>
> > > > >>> ________________________________
> > > > >>> From: Kai Londenberg <ka...@googlemail.com>
> > > > >>> To: dev@pig.apache.org
> > > > >>> Sent: Wednesday, February 20, 2013 1:51 AM
> > > > >>> Subject: Re: pig 0.11 candidate 2 feedback: Several problems
> > > > >>>
> > > > >>> Hi,
> > > > >>>
> > > > >>> I stronly agree with Jonathan here. If there are good reasons why
> > you
> > > > >>> can't support an older version of Hadoop any more, that's one
> > thing.
> > > > >>> But having to change 2 lines of code doesn't really qualify as
> > such in
> > > > >>> my point of view ;)
> > > > >>>
> > > > >>> At least for me, pig support for 0.20.2 is essential - without
> it,
> > I
> > > > >>> can't use it. If it doesn't support it, I'll have to branch pig
> and
> > > > >>> hack it myself, or stop using it.
> > > > >>>
> > > > >>> I guess, there are a lot of people still running 0.20.2 Clusters.
> > If
> > > > >>> you really have lots of data stored on HDFS and a continuously
> busy
> > > > >>> cluster, an upgrade is nothing you do "just because".
> > > > >>>
> > > > >>>
> > > > >>> 2013/2/20 Jonathan Coveney <jc...@gmail.com>:
> > > > >>>> I agree that we shouldn't have to support old versions forever.
> > That
> > > > >>> said,
> > > > >>>> I also don't think we should be too blase about supporting older
> > > > >> versions
> > > > >>>> where it is not odious to do so. We have a lot of competition in
> > the
> > > > >>>> language space and the broader the versions we can support, the
> > better
> > > > >>>> (assuming it isn't too odious to do so). In this case, I don't
> > think
> > > > it
> > > > >>>> should be too hard to change ObjectSerializer so that the
> > > > commons-codec
> > > > >>>> code used is compatible with both versions...we could just
> in-line
> > > > some
> > > > >>> of
> > > > >>>> the Base64 code, and comment accordingly.
> > > > >>>>
> > > > >>>> That said, we also should be clear about what versions we
> > support, but
> > > > >>> 6-12
> > > > >>>> months seems short. The upgrade cycles on Hadoop are really,
> > really
> > > > >> long.
> > > > >>>>
> > > > >>>>
> > > > >>>> 2013/2/20 Prashant Kommireddi <pr...@gmail.com>
> > > > >>>>
> > > > >>>>> Agreed, that makes sense. Probably supporting older hadoop
> > version
> > > > for
> > > > >>> a 1
> > > > >>>>> or 2 pig releases before moving to a newer/stable version?
> > > > >>>>>
> > > > >>>>> Having said that, should we use 0.11 period to communicate the
> > same
> > > > to
> > > > >>> the
> > > > >>>>> community and start moving on 0.12 onwards? I know we are way
> > past
> > > > >> 6-12
> > > > >>>>> months (1-2 release) time frame with 0.20.2, but we also need
> to
> > make
> > > > >>> sure
> > > > >>>>> users are aware and plan accordingly.
> > > > >>>>>
> > > > >>>>> I'd also be interested to hear how other projects (Hive, Oozie)
> > are
> > > > >>>>> handling this.
> > > > >>>>>
> > > > >>>>> -Prashant
> > > > >>>>>
> > > > >>>>> On Tue, Feb 19, 2013 at 3:22 PM, Olga Natkovich <
> > > > onatkovich@yahoo.com
> > > > >>>>>> wrote:
> > > > >>>>>
> > > > >>>>>> It seems that for each Pig release we need to agree and
> clearly
> > > > >> state
> > > > >>>>>> which Hadoop versions it will support. I guess the main
> > question is
> > > > >>> how
> > > > >>>>> we
> > > > >>>>>> decide on this. Perhaps we should say that Pig no longer
> > supports
> > > > >>> older
> > > > >>>>>> Hadoop versions once the newer one is out for at least 6-12
> > month to
> > > > >>> make
> > > > >>>>>> sure it is stable. I don't think we can support old versions
> > > > >>>>> indefinitely.
> > > > >>>>>> It is in everybody's interest to keep moving forward.
> > > > >>>>>>
> > > > >>>>>> Olga
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> ________________________________
> > > > >>>>>> From: Prashant Kommireddi <pr...@gmail.com>
> > > > >>>>>> To: dev@pig.apache.org
> > > > >>>>>> Sent: Tuesday, February 19, 2013 10:57 AM
> > > > >>>>>> Subject: Re: pig 0.11 candidate 2 feedback: Several problems
> > > > >>>>>>
> > > > >>>>>> What do you guys feel about the JIRA to do with 0.20.2
> > compatibility
> > > > >>>>>> (PIG-3194)? I am interested in discussing the strategy around
> > > > >> backward
> > > > >>>>>> compatibility as this is something that would haunt us each
> > time we
> > > > >>> move
> > > > >>>>> to
> > > > >>>>>> the next hadoop version. For eg, we might be in a similar
> > situation
> > > > >>> while
> > > > >>>>>> moving to Hadoop 2.0, when some of the stuff might break for
> > 1.0.
> > > > >>>>>>
> > > > >>>>>> I feel it would be good to get this JIRA fix in for 0.11, as
> > 0.20.2
> > > > >>> users
> > > > >>>>>> might be caught unaware. Of course, I must admit there is
> > selfish
> > > > >>>>> interest
> > > > >>>>>> here and it's probably easier for us to have a workaround on
> Pig
> > > > >>> rather
> > > > >>>>>> than upgrade hadoop in all our production DCs.
> > > > >>>>>>
> > > > >>>>>> -Prashant
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> On Tue, Feb 19, 2013 at 9:54 AM, Russell Jurney <
> > > > >>>>> russell.jurney@gmail.com
> > > > >>>>>>> wrote:
> > > > >>>>>>
> > > > >>>>>>> I think someone should step up and fix the easy ones, if
> > possible.
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> On Tue, Feb 19, 2013 at 9:51 AM, Bill Graham <
> > > > >> billgraham@gmail.com>
> > > > >>>>>> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Thanks Kai for reporting these.
> > > > >>>>>>>>
> > > > >>>>>>>> What do people think about the severity of these issues
> w.r.t.
> > > > >> Pig
> > > > >>>>> 11?
> > > > >>>>>> I
> > > > >>>>>>>> see a few possible options:
> > > > >>>>>>>>
> > > > >>>>>>>> 1. We include some or all of these patches in a new Pig 11
> rc.
> > > > >>> We'd
> > > > >>>>>> want
> > > > >>>>>>> to
> > > > >>>>>>>> make sure that they don't destabilize the current branch.
> This
> > > > >>>>> approach
> > > > >>>>>>>> makes sense if we think Pig 11 wouldn't be a good release
> > > > >> without
> > > > >>> one
> > > > >>>>>> or
> > > > >>>>>>>> more of these included.
> > > > >>>>>>>>
> > > > >>>>>>>> 2. We continue with the Pig 11 release without these, but
> then
> > > > >>>>> include
> > > > >>>>>>> one
> > > > >>>>>>>> or more in a 0.11.1 release.
> > > > >>>>>>>>
> > > > >>>>>>>> 3. We continue with the Pig 11 release without these, but
> then
> > > > >>>>> include
> > > > >>>>>>> them
> > > > >>>>>>>> in a 0.12 release.
> > > > >>>>>>>>
> > > > >>>>>>>> Jon has a patch for the MAP issue
> > > > >>>>>>>> (PIG-3144<https://issues.apache.org/jira/browse/PIG-3144>)
> > > > >>>>>>>> ready, which seems like the most pressing of the three to
> me.
> > > > >>>>>>>>
> > > > >>>>>>>> thanks,
> > > > >>>>>>>> Bill
> > > > >>>>>>>>
> > > > >>>>>>>> On Mon, Feb 18, 2013 at 2:27 AM, Kai Londenberg <
> > > > >>>>>>>> kai.londenberg@googlemail.com> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> Hi,
> > > > >>>>>>>>>
> > > > >>>>>>>>> I just subscribed to the dev mailing list in order to give
> > you
> > > > >>> some
> > > > >>>>>>>>> feedback on pig 0.11 candidate 2.
> > > > >>>>>>>>>
> > > > >>>>>>>>> The following three issues are currently present in 0.11
> > > > >>> candidate
> > > > >>>>> 2:
> > > > >>>>>>>>>
> > > > >>>>>>>>> https://issues.apache.org/jira/browse/PIG-3144 -
> 'Erroneous
> > > > >> map
> > > > >>>>>> entry
> > > > >>>>>>>>> alias resolution leading to "Duplicate schema alias"
> errors'
> > > > >>>>>>>>> https://issues.apache.org/jira/browse/PIG-3194 - Changes
> to
> > > > >>>>>>>>> ObjectSerializer.java break compatibility with Hadoop
> 0.20.2
> > > > >>>>>>>>> https://issues.apache.org/jira/browse/PIG-3195 - Race
> > > > >>> Condition in
> > > > >>>>>>>>> PhysicalOperator leads to ExecException "Error while trying
> > to
> > > > >>> get
> > > > >>>>>>>>> next result in POStream"
> > > > >>>>>>>>>
> > > > >>>>>>>>> The last two of these are easily solveable (see the tickets
> > > > >> for
> > > > >>>>>>>>> details on that). The first one is a bit trickier I think,
> > but
> > > > >>> at
> > > > >>>>>>>>> least there is a workaround for it (pass Map fields through
> > an
> > > > >>> UDF)
> > > > >>>>>>>>>
> > > > >>>>>>>>> In my personal opinion, each of these problems is pretty
> > > > >> severe,
> > > > >>>>> but
> > > > >>>>>>>>> opinions about the importance of the MAP Datatype and
> STREAM
> > > > >>>>>> Operator,
> > > > >>>>>>>>> as well as Hadoop 0.20.2 compatibility might differ.
> > > > >>>>>>>>>
> > > > >>>>>>>>> so far ..
> > > > >>>>>>>>>
> > > > >>>>>>>>> Kai Londenberg
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> --
> > > > >>>>>>>> *Note that I'm no longer using my Yahoo! email address.
> Please
> > > > >>> email
> > > > >>>>> me
> > > > >>>>>>> at
> > > > >>>>>>>> billgraham@gmail.com going forward.*
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> --
> > > > >>>>>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> > > > >>>>>>> datasyndrome.com
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> *Note that I'm no longer using my Yahoo! email address. Please
> > email me
> > > > at
> > > > >> billgraham@gmail.com going forward.*
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> > > > datasyndrome.com
> > > >
> > > >
> >
>
>
>
> --
> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> datasyndrome.com
>

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Russell Jurney <ru...@gmail.com>.
I stand corrected. Cool, 0.11 is good!


On Wed, Feb 20, 2013 at 1:15 PM, Jarek Jarcec Cecho <ja...@apache.org> wrote:

> Just an unrelated note: CDH3 is closer to Hadoop 1.x than to 0.20.
>
> Jarcec
>
> On Wed, Feb 20, 2013 at 12:04:51PM -0800, Dmitriy Ryaboy wrote:
> > I agree -- this is a good release. The bugs Kai pointed out should be
> > fixed, but as they are not critical regressions, we can fix them in 0.11.1
> > (if someone wants to roll 0.11.1 the minute these fixes are committed, I
> > won't mind and will dutifully vote for the release).
> >
> > I think the Hadoop 0.20.2 incompatibility is unfortunate, but IIRC it is
> > fixable by setting HADOOP_USER_CLASSPATH_FIRST=true (was that available in 0.20.2?)
> >
> > FWIW Twitter's running CDH3 and this release works in our environment.
> >
> > At this point things that block a release are critical regressions in
> > performance or correctness.
> >
> > D
>



-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Jarek Jarcec Cecho <ja...@apache.org>.
Just an unrelated note: CDH3 is closer to Hadoop 1.x than to 0.20.

Jarcec

On Wed, Feb 20, 2013 at 12:04:51PM -0800, Dmitriy Ryaboy wrote:
> I agree -- this is a good release. The bugs Kai pointed out should be
> fixed, but as they are not critical regressions, we can fix them in 0.11.1
> (if someone wants to roll 0.11.1 the minute these fixes are committed, I
> won't mind and will dutifully vote for the release).
> 
> I think the Hadoop 0.20.2 incompatibility is unfortunate, but IIRC it is
> fixable by setting HADOOP_USER_CLASSPATH_FIRST=true (was that available in 0.20.2?)
> 
> FWIW Twitter's running CDH3 and this release works in our environment.
> 
> At this point things that block a release are critical regressions in
> performance or correctness.
> 
> D

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
I agree -- this is a good release. The bugs Kai pointed out should be
fixed, but as they are not critical regressions, we can fix them in 0.11.1
(if someone wants to roll 0.11.1 the minute these fixes are committed, I
won't mind and will dutifully vote for the release).

I think the Hadoop 0.20.2 incompatibility is unfortunate, but IIRC it is
fixable by setting HADOOP_USER_CLASSPATH_FIRST=true (was that in 0.20.2?)
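
A minimal sketch of that workaround, assuming the Hadoop launcher script in
use honors HADOOP_USER_CLASSPATH_FIRST (unconfirmed for 0.20.2, as asked
above); the script name is hypothetical:

```shell
# Assumption: bin/hadoop on the cluster reads this variable; whether the
# 0.20.2 launcher does is not confirmed in this thread.
export HADOOP_USER_CLASSPATH_FIRST=true

# Hypothetical script name; launch Pig as usual once the variable is set.
# pig myscript.pig
```

If the installed Hadoop does not read the variable, it is simply ignored and
the incompatibility remains.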

FWIW Twitter's running CDH3 and this release works in our environment.

At this point things that block a release are critical regressions in
performance or correctness.

D


On Wed, Feb 20, 2013 at 11:52 AM, Alan Gates <ga...@hortonworks.com> wrote:

> No.  Bugs like these are supposed to be found and fixed after we branch
> from trunk (which happened several months ago in the case of 0.11).  The
> point of RCs is to check that it's a good build, licenses are right, etc.
>  Any bugs found this late in the game have to be seen as failures of
> earlier testing.
>
> Alan.
>
> On Feb 20, 2013, at 11:33 AM, Russell Jurney wrote:
>
> > Isn't the point of an RC to find and fix bugs like these?
> >
> >
> > On Wed, Feb 20, 2013 at 11:31 AM, Bill Graham <bi...@gmail.com>
> wrote:
> >
> >> Regarding Pig 11 rc2, I propose we continue with the current vote as is
> >> (which closes today EOD). Patches for 0.20.2 issues can be rolled into a
> >> Pig 0.11.1 release whenever they're available and tested.
> >>
> >>
> >>
> >> On Wed, Feb 20, 2013 at 9:24 AM, Olga Natkovich <onatkovich@yahoo.com
> >>> wrote:
> >>
> >>> I agree that supporting as much as we can is a good goal. The issue is
> >> who
> >>> is going to be testing against all these versions? We found the issues
> >>> under discussion because of a customer report, not because we
> >> consistently
> >>> test against all versions. Perhaps when we decide which versions to
> >> support
> >>> for next release we need also to agree who is going to be testing and
> >>> maintaining compatibility with a particular version.
> >>>
> >>> For instance since Hadoop 23 compatibility is important for us at Yahoo
> >> we
> >>> have been maintaining compatibility with this version for 0.9, 0.10 and
> >>> will do the same for 0.11 and going forward. I think we would need
> others
> >>> to step in and claim the versions of their interest.
> >>>
> >>> Olga
> >>>
> >>>
> >>> ________________________________
> >>> From: Kai Londenberg <ka...@googlemail.com>
> >>> To: dev@pig.apache.org
> >>> Sent: Wednesday, February 20, 2013 1:51 AM
> >>> Subject: Re: pig 0.11 candidate 2 feedback: Several problems
> >>>
> >>> Hi,
> >>>
> >>> I strongly agree with Jonathan here. If there are good reasons why you
> >>> can't support an older version of Hadoop any more, that's one thing.
> >>> But having to change 2 lines of code doesn't really qualify as such in
> >>> my point of view ;)
> >>>
> >>> At least for me, pig support for 0.20.2 is essential - without it, I
> >>> can't use it. If it doesn't support it, I'll have to branch pig and
> >>> hack it myself, or stop using it.
> >>>
> >>> I guess, there are a lot of people still running 0.20.2 Clusters. If
> >>> you really have lots of data stored on HDFS and a continuously busy
> >>> cluster, an upgrade is nothing you do "just because".
> >>>
> >>>
> >>> 2013/2/20 Jonathan Coveney <jc...@gmail.com>:
> >>>> I agree that we shouldn't have to support old versions forever. That
> >>> said,
> >>>> I also don't think we should be too blase about supporting older
> >> versions
> >>>> where it is not odious to do so. We have a lot of competition in the
> >>>> language space and the broader the versions we can support, the better
> >>>> (assuming it isn't too odious to do so). In this case, I don't think
> it
> >>>> should be too hard to change ObjectSerializer so that the
> commons-codec
> >>>> code used is compatible with both versions...we could just in-line
> some
> >>> of
> >>>> the Base64 code, and comment accordingly.
> >>>>
> >>>> That said, we also should be clear about what versions we support, but
> >>> 6-12
> >>>> months seems short. The upgrade cycles on Hadoop are really, really
> >> long.
> >>>>
> >>>>
> >>>> 2013/2/20 Prashant Kommireddi <pr...@gmail.com>
> >>>>
> >>>>> Agreed, that makes sense. Probably supporting older hadoop version
> for
> >>> a 1
> >>>>> or 2 pig releases before moving to a newer/stable version?
> >>>>>
> >>>>> Having said that, should we use 0.11 period to communicate the same
> to
> >>> the
> >>>>> community and start moving on 0.12 onwards? I know we are way past
> >> 6-12
> >>>>> months (1-2 release) time frame with 0.20.2, but we also need to make
> >>> sure
> >>>>> users are aware and plan accordingly.
> >>>>>
> >>>>> I'd also be interested to hear how other projects (Hive, Oozie) are
> >>>>> handling this.
> >>>>>
> >>>>> -Prashant
> >>>>>
> >>>>> On Tue, Feb 19, 2013 at 3:22 PM, Olga Natkovich <
> onatkovich@yahoo.com
> >>>>>> wrote:
> >>>>>
> >>>>>> It seems that for each Pig release we need to agree and clearly
> >> state
> >>>>>> which Hadoop versions it will support. I guess the main question is
> >>> how
> >>>>> we
> >>>>>> decide on this. Perhaps we should say that Pig no longer supports
> >>> older
> >>>>>> Hadoop versions once the newer one is out for at least 6-12 months to
> >>> make
> >>>>>> sure it is stable. I don't think we can support old versions
> >>>>> indefinitely.
> >>>>>> It is in everybody's interest to keep moving forward.
> >>>>>>
> >>>>>> Olga
> >>>>>>
> >>>>>>
> >>>>>> ________________________________
> >>>>>> From: Prashant Kommireddi <pr...@gmail.com>
> >>>>>> To: dev@pig.apache.org
> >>>>>> Sent: Tuesday, February 19, 2013 10:57 AM
> >>>>>> Subject: Re: pig 0.11 candidate 2 feedback: Several problems
> >>>>>>
> >>>>>> What do you guys feel about the JIRA to do with 0.20.2 compatibility
> >>>>>> (PIG-3194)? I am interested in discussing the strategy around
> >> backward
> >>>>>> compatibility as this is something that would haunt us each time we
> >>> move
> >>>>> to
> >>>>>> the next hadoop version. For eg, we might be in a similar situation
> >>> while
> >>>>>> moving to Hadoop 2.0, when some of the stuff might break for 1.0.
> >>>>>>
> >>>>>> I feel it would be good to get this JIRA fix in for 0.11, as 0.20.2
> >>> users
> >>>>>> might be caught unaware. Of course, I must admit there is selfish
> >>>>> interest
> >>>>>> here and it's probably easier for us to have a workaround on Pig
> >>> rather
> >>>>>> than upgrade hadoop in all our production DCs.
> >>>>>>
> >>>>>> -Prashant
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Feb 19, 2013 at 9:54 AM, Russell Jurney <
> >>>>> russell.jurney@gmail.com
> >>>>>>> wrote:
> >>>>>>
> >>>>>>> I think someone should step up and fix the easy ones, if possible.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Feb 19, 2013 at 9:51 AM, Bill Graham <
> >> billgraham@gmail.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>> Thanks Kai for reporting these.
> >>>>>>>>
> >>>>>>>> What do people think about the severity of these issues w.r.t.
> >> Pig
> >>>>> 11?
> >>>>>> I
> >>>>>>>> see a few possible options:
> >>>>>>>>
> >>>>>>>> 1. We include some or all of these patches in a new Pig 11 rc.
> >>> We'd
> >>>>>> want
> >>>>>>> to
> >>>>>>>> make sure that they don't destabilize the current branch. This
> >>>>> approach
> >>>>>>>> makes sense if we think Pig 11 wouldn't be a good release
> >> without
> >>> one
> >>>>>> or
> >>>>>>>> more of these included.
> >>>>>>>>
> >>>>>>>> 2. We continue with the Pig 11 release without these, but then
> >>>>> include
> >>>>>>> one
> >>>>>>>> or more in a 0.11.1 release.
> >>>>>>>>
> >>>>>>>> 3. We continue with the Pig 11 release without these, but then
> >>>>> include
> >>>>>>> them
> >>>>>>>> in a 0.12 release.
> >>>>>>>>
> >>>>>>>> Jon has a patch for the MAP issue
> >>>>>>>> (PIG-3144<https://issues.apache.org/jira/browse/PIG-3144>)
> >>>>>>>> ready, which seems like the most pressing of the three to me.
> >>>>>>>>
> >>>>>>>> thanks,
> >>>>>>>> Bill
> >>>>>>>>
> >>>>>>>> On Mon, Feb 18, 2013 at 2:27 AM, Kai Londenberg <
> >>>>>>>> kai.londenberg@googlemail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I just subscribed to the dev mailing list in order to give you
> >>> some
> >>>>>>>>> feedback on pig 0.11 candidate 2.
> >>>>>>>>>
> >>>>>>>>> The following three issues are currently present in 0.11
> >>> candidate
> >>>>> 2:
> >>>>>>>>>
> >>>>>>>>> https://issues.apache.org/jira/browse/PIG-3144 - 'Erroneous
> >> map
> >>>>>> entry
> >>>>>>>>> alias resolution leading to "Duplicate schema alias" errors'
> >>>>>>>>> https://issues.apache.org/jira/browse/PIG-3194 - Changes to
> >>>>>>>>> ObjectSerializer.java break compatibility with Hadoop 0.20.2
> >>>>>>>>> https://issues.apache.org/jira/browse/PIG-3195 - Race
> >>> Condition in
> >>>>>>>>> PhysicalOperator leads to ExecException "Error while trying to
> >>> get
> >>>>>>>>> next result in POStream"
> >>>>>>>>>
> >>>>>>>>> The last two of these are easily solvable (see the tickets
> >> for
> >>>>>>>>> details on that). The first one is a bit trickier I think, but
> >>> at
> >>>>>>>>> least there is a workaround for it (pass Map fields through an
> >>> UDF)
> >>>>>>>>>
> >>>>>>>>> In my personal opinion, each of these problems is pretty
> >> severe,
> >>>>> but
> >>>>>>>>> opinions about the importance of the MAP Datatype and STREAM
> >>>>>> Operator,
> >>>>>>>>> as well as Hadoop 0.20.2 compatibility might differ.
> >>>>>>>>>
> >>>>>>>>> so far ..
> >>>>>>>>>
> >>>>>>>>> Kai Londenberg
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> *Note that I'm no longer using my Yahoo! email address. Please
> >>> email
> >>>>> me
> >>>>>>> at
> >>>>>>>> billgraham@gmail.com going forward.*
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> >>>>>>> datasyndrome.com
> >>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>
> >>
> >>
> >> --
> >> *Note that I'm no longer using my Yahoo! email address. Please email me
> at
> >> billgraham@gmail.com going forward.*
> >>
> >
> >
> >
> > --
> > Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> datasyndrome.com
>
>

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Alan Gates <ga...@hortonworks.com>.
No.  Bugs like these are supposed to be found and fixed after we branch from trunk (which happened several months ago in the case of 0.11).  The point of RCs is to check that it's a good build, licenses are right, etc.  Any bugs found this late in the game have to be seen as failures of earlier testing.

Alan.

On Feb 20, 2013, at 11:33 AM, Russell Jurney wrote:

> Isn't the point of an RC to find and fix bugs like these?
> 
> 
> On Wed, Feb 20, 2013 at 11:31 AM, Bill Graham <bi...@gmail.com> wrote:
> 
>> Regarding Pig 11 rc2, I propose we continue with the current vote as is
>> (which closes today EOD). Patches for 0.20.2 issues can be rolled into a
>> Pig 0.11.1 release whenever they're available and tested.
>> 
>> 
>> 
>> On Wed, Feb 20, 2013 at 9:24 AM, Olga Natkovich <onatkovich@yahoo.com
>>> wrote:
>> 
>>> I agree that supporting as much as we can is a good goal. The issue is
>> who
>>> is going to be testing against all these versions? We found the issues
>>> under discussion because of a customer report, not because we
>> consistently
>>> test against all versions. Perhaps when we decide which versions to
>> support
>>> for next release we need also to agree who is going to be testing and
>>> maintaining compatibility with a particular version.
>>> 
>>> For instance since Hadoop 23 compatibility is important for us at Yahoo
>> we
>>> have been maintaining compatibility with this version for 0.9, 0.10 and
>>> will do the same for 0.11 and going forward. I think we would need others
>>> to step in and claim the versions of their interest.
>>> 
>>> Olga
>>> 
>>> 
>>> ________________________________
>>> From: Kai Londenberg <ka...@googlemail.com>
>>> To: dev@pig.apache.org
>>> Sent: Wednesday, February 20, 2013 1:51 AM
>>> Subject: Re: pig 0.11 candidate 2 feedback: Several problems
>>> 
>>> Hi,
>>> 
>>> I strongly agree with Jonathan here. If there are good reasons why you
>>> can't support an older version of Hadoop any more, that's one thing.
>>> But having to change 2 lines of code doesn't really qualify as such in
>>> my point of view ;)
>>> 
>>> At least for me, pig support for 0.20.2 is essential - without it, I
>>> can't use it. If it doesn't support it, I'll have to branch pig and
>>> hack it myself, or stop using it.
>>> 
>>> I guess, there are a lot of people still running 0.20.2 Clusters. If
>>> you really have lots of data stored on HDFS and a continuously busy
>>> cluster, an upgrade is nothing you do "just because".
>>> 
>>> 
>>> 2013/2/20 Jonathan Coveney <jc...@gmail.com>:
>>>> I agree that we shouldn't have to support old versions forever. That
>>> said,
>>>> I also don't think we should be too blase about supporting older
>> versions
>>>> where it is not odious to do so. We have a lot of competition in the
>>>> language space and the broader the versions we can support, the better
>>>> (assuming it isn't too odious to do so). In this case, I don't think it
>>>> should be too hard to change ObjectSerializer so that the commons-codec
>>>> code used is compatible with both versions...we could just in-line some
>>> of
>>>> the Base64 code, and comment accordingly.
>>>> 
>>>> That said, we also should be clear about what versions we support, but
>>> 6-12
>>>> months seems short. The upgrade cycles on Hadoop are really, really
>> long.
>>>> 
>>>> 
>>>> 2013/2/20 Prashant Kommireddi <pr...@gmail.com>
>>>> 
>>>>> Agreed, that makes sense. Probably supporting older hadoop version for
>>> a 1
>>>>> or 2 pig releases before moving to a newer/stable version?
>>>>> 
>>>>> Having said that, should we use 0.11 period to communicate the same to
>>> the
>>>>> community and start moving on 0.12 onwards? I know we are way past
>> 6-12
>>>>> months (1-2 release) time frame with 0.20.2, but we also need to make
>>> sure
>>>>> users are aware and plan accordingly.
>>>>> 
>>>>> I'd also be interested to hear how other projects (Hive, Oozie) are
>>>>> handling this.
>>>>> 
>>>>> -Prashant
>>>>> 
>>>>> On Tue, Feb 19, 2013 at 3:22 PM, Olga Natkovich <onatkovich@yahoo.com
>>>>>> wrote:
>>>>> 
>>>>>> It seems that for each Pig release we need to agree and clearly
>> state
>>>>>> which Hadoop versions it will support. I guess the main question is
>>> how
>>>>> we
>>>>>> decide on this. Perhaps we should say that Pig no longer supports
>>> older
>>>>>> Hadoop versions once the newer one is out for at least 6-12 months to
>>> make
>>>>>> sure it is stable. I don't think we can support old versions
>>>>> indefinitely.
>>>>>> It is in everybody's interest to keep moving forward.
>>>>>> 
>>>>>> Olga
>>>>>> 
>>>>>> 
>>>>>> ________________________________
>>>>>> From: Prashant Kommireddi <pr...@gmail.com>
>>>>>> To: dev@pig.apache.org
>>>>>> Sent: Tuesday, February 19, 2013 10:57 AM
>>>>>> Subject: Re: pig 0.11 candidate 2 feedback: Several problems
>>>>>> 
>>>>>> What do you guys feel about the JIRA to do with 0.20.2 compatibility
>>>>>> (PIG-3194)? I am interested in discussing the strategy around
>> backward
>>>>>> compatibility as this is something that would haunt us each time we
>>> move
>>>>> to
>>>>>> the next hadoop version. For eg, we might be in a similar situation
>>> while
>>>>>> moving to Hadoop 2.0, when some of the stuff might break for 1.0.
>>>>>> 
>>>>>> I feel it would be good to get this JIRA fix in for 0.11, as 0.20.2
>>> users
>>>>>> might be caught unaware. Of course, I must admit there is selfish
>>>>> interest
>>>>>> here and it's probably easier for us to have a workaround on Pig
>>> rather
>>>>>> than upgrade hadoop in all our production DCs.
>>>>>> 
>>>>>> -Prashant
>>>>>> 
>>>>>> 
>>>>>> On Tue, Feb 19, 2013 at 9:54 AM, Russell Jurney <
>>>>> russell.jurney@gmail.com
>>>>>>> wrote:
>>>>>> 
>>>>>>> I think someone should step up and fix the easy ones, if possible.
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Feb 19, 2013 at 9:51 AM, Bill Graham <
>> billgraham@gmail.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> Thanks Kai for reporting these.
>>>>>>>> 
>>>>>>>> What do people think about the severity of these issues w.r.t.
>> Pig
>>>>> 11?
>>>>>> I
>>>>>>>> see a few possible options:
>>>>>>>> 
>>>>>>>> 1. We include some or all of these patches in a new Pig 11 rc.
>>> We'd
>>>>>> want
>>>>>>> to
>>>>>>>> make sure that they don't destabilize the current branch. This
>>>>> approach
>>>>>>>> makes sense if we think Pig 11 wouldn't be a good release
>> without
>>> one
>>>>>> or
>>>>>>>> more of these included.
>>>>>>>> 
>>>>>>>> 2. We continue with the Pig 11 release without these, but then
>>>>> include
>>>>>>> one
>>>>>>>> or more in a 0.11.1 release.
>>>>>>>> 
>>>>>>>> 3. We continue with the Pig 11 release without these, but then
>>>>> include
>>>>>>> them
>>>>>>>> in a 0.12 release.
>>>>>>>> 
>>>>>>>> Jon has a patch for the MAP issue
>>>>>>>> (PIG-3144<https://issues.apache.org/jira/browse/PIG-3144>)
>>>>>>>> ready, which seems like the most pressing of the three to me.
>>>>>>>> 
>>>>>>>> thanks,
>>>>>>>> Bill
>>>>>>>> 
>>>>>>>> On Mon, Feb 18, 2013 at 2:27 AM, Kai Londenberg <
>>>>>>>> kai.londenberg@googlemail.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> I just subscribed to the dev mailing list in order to give you
>>> some
>>>>>>>>> feedback on pig 0.11 candidate 2.
>>>>>>>>> 
>>>>>>>>> The following three issues are currently present in 0.11
>>> candidate
>>>>> 2:
>>>>>>>>> 
>>>>>>>>> https://issues.apache.org/jira/browse/PIG-3144 - 'Erroneous
>> map
>>>>>> entry
>>>>>>>>> alias resolution leading to "Duplicate schema alias" errors'
>>>>>>>>> https://issues.apache.org/jira/browse/PIG-3194 - Changes to
>>>>>>>>> ObjectSerializer.java break compatibility with Hadoop 0.20.2
>>>>>>>>> https://issues.apache.org/jira/browse/PIG-3195 - Race
>>> Condition in
>>>>>>>>> PhysicalOperator leads to ExecException "Error while trying to
>>> get
>>>>>>>>> next result in POStream"
>>>>>>>>> 
>>>>>>>>> The last two of these are easily solvable (see the tickets
>> for
>>>>>>>>> details on that). The first one is a bit trickier I think, but
>>> at
>>>>>>>>> least there is a workaround for it (pass Map fields through an
>>> UDF)
>>>>>>>>> 
>>>>>>>>> In my personal opinion, each of these problems is pretty
>> severe,
>>>>> but
>>>>>>>>> opinions about the importance of the MAP Datatype and STREAM
>>>>>> Operator,
>>>>>>>>> as well as Hadoop 0.20.2 compatibility might differ.
>>>>>>>>> 
>>>>>>>>> so far ..
>>>>>>>>> 
>>>>>>>>> Kai Londenberg
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> *Note that I'm no longer using my Yahoo! email address. Please
>>> email
>>>>> me
>>>>>>> at
>>>>>>>> billgraham@gmail.com going forward.*
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
>>>>>>> datasyndrome.com
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> 
>> 
>> 
>> --
>> *Note that I'm no longer using my Yahoo! email address. Please email me at
>> billgraham@gmail.com going forward.*
>> 
> 
> 
> 
> -- 
> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com


Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Russell Jurney <ru...@gmail.com>.
Isn't the point of an RC to find and fix bugs like these?


On Wed, Feb 20, 2013 at 11:31 AM, Bill Graham <bi...@gmail.com> wrote:

> Regarding Pig 11 rc2, I propose we continue with the current vote as is
> (which closes today EOD). Patches for 0.20.2 issues can be rolled into a
> Pig 0.11.1 release whenever they're available and tested.
>
>
>
> On Wed, Feb 20, 2013 at 9:24 AM, Olga Natkovich <onatkovich@yahoo.com
> >wrote:
>
> > I agree that supporting as much as we can is a good goal. The issue is
> who
> > is going to be testing against all these versions? We found the issues
> > under discussion because of a customer report, not because we
> consistently
> > test against all versions. Perhaps when we decide which versions to
> support
> > for next release we need also to agree who is going to be testing and
> > maintaining compatibility with a particular version.
> >
> > For instance since Hadoop 23 compatibility is important for us at Yahoo
> we
> > have been maintaining compatibility with this version for 0.9, 0.10 and
> > will do the same for 0.11 and going forward. I think we would need others
> > to step in and claim the versions of their interest.
> >
> > Olga
> >
> >
> > ________________________________
> >  From: Kai Londenberg <ka...@googlemail.com>
> > To: dev@pig.apache.org
> > Sent: Wednesday, February 20, 2013 1:51 AM
> > Subject: Re: pig 0.11 candidate 2 feedback: Several problems
> >
> > Hi,
> >
> > I strongly agree with Jonathan here. If there are good reasons why you
> > can't support an older version of Hadoop any more, that's one thing.
> > But having to change 2 lines of code doesn't really qualify as such in
> > my point of view ;)
> >
> > At least for me, pig support for 0.20.2 is essential - without it, I
> > can't use it. If it doesn't support it, I'll have to branch pig and
> > hack it myself, or stop using it.
> >
> > I guess, there are a lot of people still running 0.20.2 Clusters. If
> > you really have lots of data stored on HDFS and a continuously busy
> > cluster, an upgrade is nothing you do "just because".
> >
> >
> > 2013/2/20 Jonathan Coveney <jc...@gmail.com>:
> > > I agree that we shouldn't have to support old versions forever. That
> > said,
> > > I also don't think we should be too blase about supporting older
> versions
> > > where it is not odious to do so. We have a lot of competition in the
> > > language space and the broader the versions we can support, the better
> > > (assuming it isn't too odious to do so). In this case, I don't think it
> > > should be too hard to change ObjectSerializer so that the commons-codec
> > > code used is compatible with both versions...we could just in-line some
> > of
> > > the Base64 code, and comment accordingly.
> > >
> > > That said, we also should be clear about what versions we support, but
> > 6-12
> > > months seems short. The upgrade cycles on Hadoop are really, really
> long.
> > >
> > >
> > > 2013/2/20 Prashant Kommireddi <pr...@gmail.com>
> > >
> > >> Agreed, that makes sense. Probably supporting older hadoop version for
> > a 1
> > >> or 2 pig releases before moving to a newer/stable version?
> > >>
> > >> Having said that, should we use 0.11 period to communicate the same to
> > the
> > >> community and start moving on 0.12 onwards? I know we are way past
> 6-12
> > >> months (1-2 release) time frame with 0.20.2, but we also need to make
> > sure
> > >> users are aware and plan accordingly.
> > >>
> > >> I'd also be interested to hear how other projects (Hive, Oozie) are
> > >> handling this.
> > >>
> > >> -Prashant
> > >>
> > >> On Tue, Feb 19, 2013 at 3:22 PM, Olga Natkovich <onatkovich@yahoo.com
> > >> >wrote:
> > >>
> > >> > It seems that for each Pig release we need to agree and clearly
> state
> > >> > which Hadoop versions it will support. I guess the main question is
> > how
> > >> we
> > >> > decide on this. Perhaps we should say that Pig no longer supports
> > older
> > >> > Hadoop versions once the newer one is out for at least 6-12 months to
> > make
> > >> > sure it is stable. I don't think we can support old versions
> > >> indefinitely.
> > >> > It is in everybody's interest to keep moving forward.
> > >> >
> > >> > Olga
> > >> >
> > >> >
> > >> > ________________________________
> > >> >  From: Prashant Kommireddi <pr...@gmail.com>
> > >> > To: dev@pig.apache.org
> > >> > Sent: Tuesday, February 19, 2013 10:57 AM
> > >> > Subject: Re: pig 0.11 candidate 2 feedback: Several problems
> > >> >
> > >> > What do you guys feel about the JIRA to do with 0.20.2 compatibility
> > >> > (PIG-3194)? I am interested in discussing the strategy around
> backward
> > >> > compatibility as this is something that would haunt us each time we
> > move
> > >> to
> > >> > the next hadoop version. For eg, we might be in a similar situation
> > while
> > >> > moving to Hadoop 2.0, when some of the stuff might break for 1.0.
> > >> >
> > >> > I feel it would be good to get this JIRA fix in for 0.11, as 0.20.2
> > users
> > >> > might be caught unaware. Of course, I must admit there is selfish
> > >> interest
> > >> > here and it's probably easier for us to have a workaround on Pig
> > rather
> > >> > than upgrade hadoop in all our production DCs.
> > >> >
> > >> > -Prashant
> > >> >
> > >> >
> > >> > On Tue, Feb 19, 2013 at 9:54 AM, Russell Jurney <
> > >> russell.jurney@gmail.com
> > >> > >wrote:
> > >> >
> > >> > > I think someone should step up and fix the easy ones, if possible.
> > >> > >
> > >> > >
> > >> > > On Tue, Feb 19, 2013 at 9:51 AM, Bill Graham <
> billgraham@gmail.com>
> > >> > wrote:
> > >> > >
> > >> > > > Thanks Kai for reporting these.
> > >> > > >
> > >> > > > What do people think about the severity of these issues w.r.t.
> Pig
> > >> 11?
> > >> > I
> > >> > > > see a few possible options:
> > >> > > >
> > >> > > > 1. We include some or all of these patches in a new Pig 11 rc.
> > We'd
> > >> > want
> > >> > > to
> > >> > > > make sure that they don't destabilize the current branch. This
> > >> approach
> > >> > > > makes sense if we think Pig 11 wouldn't be a good release
> without
> > one
> > >> > or
> > >> > > > more of these included.
> > >> > > >
> > >> > > > 2. We continue with the Pig 11 release without these, but then
> > >> include
> > >> > > one
> > >> > > > or more in a 0.11.1 release.
> > >> > > >
> > >> > > > 3. We continue with the Pig 11 release without these, but then
> > >> include
> > >> > > them
> > >> > > > in a 0.12 release.
> > >> > > >
> > >> > > > Jon has a patch for the MAP issue
> > >> > > > (PIG-3144<https://issues.apache.org/jira/browse/PIG-3144>)
> > >> > > > ready, which seems like the most pressing of the three to me.
> > >> > > >
> > >> > > > thanks,
> > >> > > > Bill
> > >> > > >
> > >> > > > On Mon, Feb 18, 2013 at 2:27 AM, Kai Londenberg <
> > >> > > > kai.londenberg@googlemail.com> wrote:
> > >> > > >
> > >> > > > > Hi,
> > >> > > > >
> > >> > > > > I just subscribed to the dev mailing list in order to give you
> > some
> > >> > > > > feedback on pig 0.11 candidate 2.
> > >> > > > >
> > >> > > > > The following three issues are currently present in 0.11
> > candidate
> > >> 2:
> > >> > > > >
> > >> > > > > https://issues.apache.org/jira/browse/PIG-3144 - 'Erroneous
> map
> > >> > entry
> > >> > > > > alias resolution leading to "Duplicate schema alias" errors'
> > >> > > > > https://issues.apache.org/jira/browse/PIG-3194 - Changes to
> > >> > > > > ObjectSerializer.java break compatibility with Hadoop 0.20.2
> > >> > > > > https://issues.apache.org/jira/browse/PIG-3195 - Race
> > Condition in
> > >> > > > > PhysicalOperator leads to ExecException "Error while trying to
> > get
> > >> > > > > next result in POStream"
> > >> > > > >
> > >> > > > > The last two of these are easily solvable (see the tickets
> for
> > >> > > > > details on that). The first one is a bit trickier I think, but
> > at
> > >> > > > > least there is a workaround for it (pass Map fields through an
> > UDF)
> > >> > > > >
> > >> > > > > In my personal opinion, each of these problems is pretty
> severe,
> > >> but
> > >> > > > > opinions about the importance of the MAP Datatype and STREAM
> > >> > Operator,
> > >> > > > > as well as Hadoop 0.20.2 compatibility might differ.
> > >> > > > >
> > >> > > > > so far ..
> > >> > > > >
> > >> > > > > Kai Londenberg
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > *Note that I'm no longer using my Yahoo! email address. Please
> > email
> > >> me
> > >> > > at
> > >> > > > billgraham@gmail.com going forward.*
> > >> > > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > > Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> > >> > > datasyndrome.com
> > >> > >
> > >> >
> > >>
> >
>
>
>
> --
> *Note that I'm no longer using my Yahoo! email address. Please email me at
> billgraham@gmail.com going forward.*
>



-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Bill Graham <bi...@gmail.com>.
Regarding Pig 11 rc2, I propose we continue with the current vote as is
(which closes today EOD). Patches for 0.20.2 issues can be rolled into a
Pig 0.11.1 release whenever they're available and tested.



On Wed, Feb 20, 2013 at 9:24 AM, Olga Natkovich <on...@yahoo.com> wrote:

> I agree that supporting as much as we can is a good goal. The issue is who
> is going to be testing against all these versions? We found the issues
> under discussion because of a customer report, not because we consistently
> test against all versions. Perhaps when we decide which versions to support
> for next release we need also to agree who is going to be testing and
> maintaining compatibility with a particular version.
>
> For instance since Hadoop 23 compatibility is important for us at Yahoo we
> have been maintaining compatibility with this version for 0.9, 0.10 and
> will do the same for 0.11 and going forward. I think we would need others
> to step in and claim the versions of their interest.
>
> Olga
>
>
> ________________________________
>  From: Kai Londenberg <ka...@googlemail.com>
> To: dev@pig.apache.org
> Sent: Wednesday, February 20, 2013 1:51 AM
> Subject: Re: pig 0.11 candidate 2 feedback: Several problems
>
> Hi,
>
> I strongly agree with Jonathan here. If there are good reasons why you
> can't support an older version of Hadoop any more, that's one thing.
> But having to change 2 lines of code doesn't really qualify as such in
> my point of view ;)
>
> At least for me, pig support for 0.20.2 is essential - without it, I
> can't use it. If it doesn't support it, I'll have to branch pig and
> hack it myself, or stop using it.
>
> I guess, there are a lot of people still running 0.20.2 Clusters. If
> you really have lots of data stored on HDFS and a continuously busy
> cluster, an upgrade is nothing you do "just because".
>
>
> 2013/2/20 Jonathan Coveney <jc...@gmail.com>:
> > I agree that we shouldn't have to support old versions forever. That
> said,
> > I also don't think we should be too blase about supporting older versions
> > where it is not odious to do so. We have a lot of competition in the
> > language space and the broader the versions we can support, the better
> > (assuming it isn't too odious to do so). In this case, I don't think it
> > should be too hard to change ObjectSerializer so that the commons-codec
> > code used is compatible with both versions...we could just in-line some
> of
> > the Base64 code, and comment accordingly.
> >
> > That said, we also should be clear about what versions we support, but
> 6-12
> > months seems short. The upgrade cycles on Hadoop are really, really long.
> >
> >
> > 2013/2/20 Prashant Kommireddi <pr...@gmail.com>
> >
> >> Agreed, that makes sense. Probably supporting older hadoop version for
> a 1
> >> or 2 pig releases before moving to a newer/stable version?
> >>
> >> Having said that, should we use 0.11 period to communicate the same to
> the
> >> community and start moving on 0.12 onwards? I know we are way past 6-12
> >> months (1-2 release) time frame with 0.20.2, but we also need to make
> sure
> >> users are aware and plan accordingly.
> >>
> >> I'd also be interested to hear how other projects (Hive, Oozie) are
> >> handling this.
> >>
> >> -Prashant
> >>
> >> On Tue, Feb 19, 2013 at 3:22 PM, Olga Natkovich <onatkovich@yahoo.com
> >> >wrote:
> >>
> >> > It seems that for each Pig release we need to agree and clearly state
> >> > which Hadoop versions it will support. I guess the main question is
> how
> >> we
> >> > decide on this. Perhaps we should say that Pig no longer supports
> older
> >> > Hadoop versions once the newer one is out for at least 6-12 month to
> make
> >> > sure it is stable. I don't think we can support old versions
> >> indefinitely.
> >> > It is in everybody's interest to keep moving forward.
> >> >
> >> > Olga
> >> >
> >> >
> >> > ________________________________
> >> >  From: Prashant Kommireddi <pr...@gmail.com>
> >> > To: dev@pig.apache.org
> >> > Sent: Tuesday, February 19, 2013 10:57 AM
> >> > Subject: Re: pig 0.11 candidate 2 feedback: Several problems
> >> >
> >> > What do you guys feel about the JIRA to do with 0.20.2 compatibility
> >> > (PIG-3194)? I am interested in discussing the strategy around backward
> >> > compatibility as this is something that would haunt us each time we
> move
> >> to
> >> > the next hadoop version. For eg, we might be in a similar situation
> while
> >> > moving to Hadoop 2.0, when some of the stuff might break for 1.0.
> >> >
> >> > I feel it would be good to get this JIRA fix in for 0.11, as 0.20.2
> users
> >> > might be caught unaware. Of course, I must admit there is selfish
> >> interest
> >> > here and it's probably easier for us to have a workaround on Pig
> rather
> >> > than upgrade hadoop in all our production DCs.
> >> >
> >> > -Prashant
> >> >
> >> >
> >> > On Tue, Feb 19, 2013 at 9:54 AM, Russell Jurney <
> >> russell.jurney@gmail.com
> >> > >wrote:
> >> >
> >> > > I think someone should step up and fix the easy ones, if possible.
> >> > >
> >> > >
> >> > > On Tue, Feb 19, 2013 at 9:51 AM, Bill Graham <bi...@gmail.com>
> >> > wrote:
> >> > >
> >> > > > Thanks Kai for reporting these.
> >> > > >
> >> > > > What do people think about the severity of these issues w.r.t. Pig
> >> 11?
> >> > I
> >> > > > see a few possible options:
> >> > > >
> >> > > > 1. We include some or all of these patches in a new Pig 11 rc.
> We'd
> >> > want
> >> > > to
> >> > > > make sure that they don't destabilize the current branch. This
> >> approach
> >> > > > makes sense if we think Pig 11 wouldn't be a good release without
> one
> >> > or
> >> > > > more of these included.
> >> > > >
> >> > > > 2. We continue with the Pig 11 release without these, but then
> >> include
> >> > > one
> >> > > > or more in a 0.11.1 release.
> >> > > >
> >> > > > 3. We continue with the Pig 11 release without these, but then
> >> include
> >> > > them
> >> > > > in a 0.12 release.
> >> > > >
> >> > > > Jon has a patch for the MAP issue
> >> > > > (PIG-3144<https://issues.apache.org/jira/browse/PIG-3144>)
> >> > > > ready, which seems like the most pressing of the three to me.
> >> > > >
> >> > > > thanks,
> >> > > > Bill
> >> > > >
> >> > > > On Mon, Feb 18, 2013 at 2:27 AM, Kai Londenberg <
> >> > > > kai.londenberg@googlemail.com> wrote:
> >> > > >
> >> > > > > Hi,
> >> > > > >
> >> > > > > I just subscribed to the dev mailing list in order to give you
> some
> >> > > > > feedback on pig 0.11 candidate 2.
> >> > > > >
> >> > > > > The following three issues are currently present in 0.11
> candidate
> >> 2:
> >> > > > >
> >> > > > > https://issues.apache.org/jira/browse/PIG-3144 - 'Erroneous map
> >> > entry
> >> > > > > alias resolution leading to "Duplicate schema alias" errors'
> >> > > > > https://issues.apache.org/jira/browse/PIG-3194 - Changes to
> >> > > > > ObjectSerializer.java break compatibility with Hadoop 0.20.2
> >> > > > > https://issues.apache.org/jira/browse/PIG-3195 - Race
> Condition in
> >> > > > > PhysicalOperator leads to ExecException "Error while trying to
> get
> >> > > > > next result in POStream"
> >> > > > >
> >> > > > > The last two of these are easily solveable (see the tickets for
> >> > > > > details on that). The first one is a bit trickier I think, but
> at
> >> > > > > least there is a workaround for it (pass Map fields through an
> UDF)
> >> > > > >
> >> > > > > In my personal opinion, each of these problems is pretty severe,
> >> but
> >> > > > > opinions about the importance of the MAP Datatype and STREAM
> >> > Operator,
> >> > > > > as well as Hadoop 0.20.2 compatibility might differ.
> >> > > > >
> >> > > > > so far ..
> >> > > > >
> >> > > > > Kai Londenberg
> >> > > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > *Note that I'm no longer using my Yahoo! email address. Please
> email
> >> me
> >> > > at
> >> > > > billgraham@gmail.com going forward.*
> >> > > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
> >> > > datasyndrome.com
> >> > >
> >> >
> >>
>



-- 
*Note that I'm no longer using my Yahoo! email address. Please email me at
billgraham@gmail.com going forward.*

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Olga Natkovich <on...@yahoo.com>.
I agree that supporting as much as we can is a good goal. The issue is: who is going to test against all these versions? We found the issues under discussion because of a customer report, not because we consistently test against all versions. Perhaps when we decide which versions to support for the next release, we also need to agree on who will test and maintain compatibility with each version.

For instance, since Hadoop 0.23 compatibility is important for us at Yahoo, we have been maintaining compatibility with this version for Pig 0.9 and 0.10, and will do the same for 0.11 and going forward. I think we would need others to step in and claim the versions they are interested in.

Olga


________________________________
 From: Kai Londenberg <ka...@googlemail.com>
To: dev@pig.apache.org 
Sent: Wednesday, February 20, 2013 1:51 AM
Subject: Re: pig 0.11 candidate 2 feedback: Several problems
 
Hi,

I strongly agree with Jonathan here. If there are good reasons why you
can't support an older version of Hadoop any more, that's one thing.
But having to change 2 lines of code doesn't really qualify as such in
my point of view ;)

At least for me, pig support for 0.20.2 is essential - without it, I
can't use it. If it doesn't support it, I'll have to branch pig and
hack it myself, or stop using it.

I guess, there are a lot of people still running 0.20.2 Clusters. If
you really have lots of data stored on HDFS and a continuously busy
cluster, an upgrade is nothing you do "just because".



Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Alan Gates <ga...@hortonworks.com>.
Time is not the right metric to determine how long we support a particular release of Hadoop.  Some versions of Hadoop are widely adopted and have a long life (0.20) and some have very low adoption and short life spans (0.19, 0.21).  We're an open source project driven by volunteers, so it makes sense that we support versions that we have people willing to test and verify releases against.  Those of us at Hortonworks test Pig against Hadoop 1.x and 2.x.  I assume Cheolsoo tests it against CDH 3 and 4.  The Yahoo members test it against 0.23.  I'm not sure what the Twitter guys are running but I'm sure they test it against their version(s) of Hadoop.  So if someone is willing to run the tests and submit the patches for 0.20.2, then let's support it.  I agree that at some point the additional code complexity required to support older versions is no longer justified even when people want it.  But as pointed out we're not there yet with 0.20.x.

Alan.

On Feb 20, 2013, at 1:51 AM, Kai Londenberg wrote:

> Hi,
> 
> I strongly agree with Jonathan here. If there are good reasons why you
> can't support an older version of Hadoop any more, that's one thing.
> But having to change 2 lines of code doesn't really qualify as such in
> my point of view ;)
> 
> At least for me, pig support for 0.20.2 is essential - without it, I
> can't use it. If it doesn't support it, I'll have to branch pig and
> hack it myself, or stop using it.
> 
> I guess, there are a lot of people still running 0.20.2 Clusters. If
> you really have lots of data stored on HDFS and a continuously busy
> cluster, an upgrade is nothing you do "just because".
> 
> 


Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Kai Londenberg <ka...@googlemail.com>.
Hi,

I strongly agree with Jonathan here. If there are good reasons why you
can't support an older version of Hadoop any more, that's one thing.
But having to change 2 lines of code doesn't really qualify as such in
my point of view ;)

At least for me, pig support for 0.20.2 is essential - without it, I
can't use it. If it doesn't support it, I'll have to branch pig and
hack it myself, or stop using it.

I guess, there are a lot of people still running 0.20.2 Clusters. If
you really have lots of data stored on HDFS and a continuously busy
cluster, an upgrade is nothing you do "just because".


2013/2/20 Jonathan Coveney <jc...@gmail.com>:
> I agree that we shouldn't have to support old versions forever. That said,
> I also don't think we should be too blase about supporting older versions
> where it is not odious to do so. We have a lot of competition in the
> language space and the broader the versions we can support, the better
> (assuming it isn't too odious to do so). In this case, I don't think it
> should be too hard to change ObjectSerializer so that the commons-codec
> code used is compatible with both versions...we could just in-line some of
> the Base64 code, and comment accordingly.
>
> That said, we also should be clear about what versions we support, but 6-12
> months seems short. The upgrade cycles on Hadoop are really, really long.
>
>

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Jonathan Coveney <jc...@gmail.com>.
I agree that we shouldn't have to support old versions forever. That said,
I also don't think we should be too blase about supporting older versions
where it is not odious to do so. We have a lot of competition in the
language space and the broader the versions we can support, the better
(assuming it isn't too odious to do so). In this case, I don't think it
should be too hard to change ObjectSerializer so that the commons-codec
code used is compatible with both versions...we could just in-line some of
the Base64 code, and comment accordingly.
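
To make the in-lining idea concrete, here is a rough sketch of a
self-contained Base64 codec that depends on no commons-codec version at
all. It is not taken from the PIG-3194 patch, and the class and method
names (InlineBase64, encode, decode) are hypothetical; the thread does not
say which commons-codec calls ObjectSerializer actually uses, so treat this
only as an illustration of the approach.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical helper: standard Base64 (RFC 4648 alphabet, '=' padding)
// implemented in-line so the caller needs no commons-codec classes.
final class InlineBase64 {
    private static final char[] ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".toCharArray();
    private static final int[] REVERSE = new int[128];

    static {
        // Build the reverse lookup table; -1 marks characters outside the alphabet.
        Arrays.fill(REVERSE, -1);
        for (int i = 0; i < ALPHABET.length; i++) {
            REVERSE[ALPHABET[i]] = i;
        }
    }

    private InlineBase64() {}

    static String encode(byte[] data) {
        StringBuilder out = new StringBuilder(((data.length + 2) / 3) * 4);
        for (int i = 0; i < data.length; i += 3) {
            // Pack up to three bytes into one 24-bit group, zero-filling the tail.
            int b0 = data[i] & 0xFF;
            int b1 = i + 1 < data.length ? data[i + 1] & 0xFF : 0;
            int b2 = i + 2 < data.length ? data[i + 2] & 0xFF : 0;
            int chunk = (b0 << 16) | (b1 << 8) | b2;
            out.append(ALPHABET[(chunk >> 18) & 0x3F]);
            out.append(ALPHABET[(chunk >> 12) & 0x3F]);
            out.append(i + 1 < data.length ? ALPHABET[(chunk >> 6) & 0x3F] : '=');
            out.append(i + 2 < data.length ? ALPHABET[chunk & 0x3F] : '=');
        }
        return out.toString();
    }

    static byte[] decode(String text) {
        if (text.length() % 4 != 0) {
            throw new IllegalArgumentException("input length must be a multiple of 4");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(text.length() / 4 * 3);
        for (int i = 0; i < text.length(); i += 4) {
            int chunk = 0;
            int pad = 0;
            for (int j = 0; j < 4; j++) {
                char c = text.charAt(i + j);
                int value;
                if (c == '=') {
                    pad++;
                    value = 0;
                } else {
                    value = c < 128 ? REVERSE[c] : -1;
                    // Reject unknown characters and data that follows padding.
                    if (value < 0 || pad > 0) {
                        throw new IllegalArgumentException("invalid Base64 input at index " + (i + j));
                    }
                }
                chunk = (chunk << 6) | value;
            }
            // Padding may only close the final group, and at most two '=' are legal.
            if (pad > 2 || (pad > 0 && i + 4 < text.length())) {
                throw new IllegalArgumentException("misplaced Base64 padding");
            }
            out.write((chunk >> 16) & 0xFF);
            if (pad < 2) out.write((chunk >> 8) & 0xFF);
            if (pad < 1) out.write(chunk & 0xFF);
        }
        return out.toByteArray();
    }
}
```

A caller would then use InlineBase64.encode(bytes) and
InlineBase64.decode(string) in place of the library calls, at the cost of
carrying a few dozen extra lines that need a comment explaining why they
exist.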

That said, we also should be clear about what versions we support, but 6-12
months seems short. The upgrade cycles on Hadoop are really, really long.


2013/2/20 Prashant Kommireddi <pr...@gmail.com>

> Agreed, that makes sense. Probably supporting older hadoop version for a 1
> or 2 pig releases before moving to a newer/stable version?
>
> Having said that, should we use 0.11 period to communicate the same to the
> community and start moving on 0.12 onwards? I know we are way past 6-12
> months (1-2 release) time frame with 0.20.2, but we also need to make sure
> users are aware and plan accordingly.
>
> I'd also be interested to hear how other projects (Hive, Oozie) are
> handling this.
>
> -Prashant
>
>

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Prashant Kommireddi <pr...@gmail.com>.
Agreed, that makes sense. Perhaps we could support an older Hadoop version for
one or two Pig releases before moving to a newer, stable version?

Having said that, should we use the 0.11 period to communicate this to the
community and start moving on from 0.12 onwards? I know we are way past the
6-12 month (1-2 release) time frame with 0.20.2, but we also need to make sure
users are aware and can plan accordingly.

I'd also be interested to hear how other projects (Hive, Oozie) are
handling this.

-Prashant

On Tue, Feb 19, 2013 at 3:22 PM, Olga Natkovich <on...@yahoo.com> wrote:

> It seems that for each Pig release we need to agree and clearly state
> which Hadoop versions it will support. I guess the main question is how we
> decide on this. Perhaps we should say that Pig no longer supports older
> Hadoop versions once the newer one is out for at least 6-12 months to make
> sure it is stable. I don't think we can support old versions indefinitely.
> It is in everybody's interest to keep moving forward.
>
> Olga

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Olga Natkovich <on...@yahoo.com>.
It seems that for each Pig release we need to agree on and clearly state which Hadoop versions it will support. I guess the main question is how we decide on this. Perhaps we should say that Pig no longer supports an older Hadoop version once the newer one has been out for at least 6-12 months, to make sure it is stable. I don't think we can support old versions indefinitely. It is in everybody's interest to keep moving forward.

Olga


________________________________
 From: Prashant Kommireddi <pr...@gmail.com>
To: dev@pig.apache.org 
Sent: Tuesday, February 19, 2013 10:57 AM
Subject: Re: pig 0.11 candidate 2 feedback: Several problems
 
What do you guys feel about the JIRA to do with 0.20.2 compatibility
(PIG-3194)? I am interested in discussing the strategy around backward
compatibility as this is something that would haunt us each time we move to
the next Hadoop version. For example, we might be in a similar situation while
moving to Hadoop 2.0, when some of the stuff might break for 1.0.

I feel it would be good to get this JIRA fix in for 0.11, as 0.20.2 users
might be caught unaware. Of course, I must admit there is selfish interest
here and it's probably easier for us to have a workaround on Pig rather
than upgrade hadoop in all our production DCs.

-Prashant

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Prashant Kommireddi <pr...@gmail.com>.
What do you all think about the JIRA on 0.20.2 compatibility
(PIG-3194)? I am interested in discussing the strategy around backward
compatibility, as this is something that will haunt us each time we move to
the next Hadoop version. For example, we might be in a similar situation when
moving to Hadoop 2.0, when some things might break for 1.0.

I feel it would be good to get this JIRA fix in for 0.11, as 0.20.2 users
might otherwise be caught unaware. Of course, I must admit there is some
self-interest here: it's probably easier for us to have a workaround in Pig
than to upgrade Hadoop in all our production DCs.

-Prashant


On Tue, Feb 19, 2013 at 9:54 AM, Russell Jurney <ru...@gmail.com> wrote:

> I think someone should step up and fix the easy ones, if possible.

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Russell Jurney <ru...@gmail.com>.
I think someone should step up and fix the easy ones, if possible.


On Tue, Feb 19, 2013 at 9:51 AM, Bill Graham <bi...@gmail.com> wrote:

> Thanks Kai for reporting these.
>
> What do people think about the severity of these issues w.r.t. Pig 11? I
> see a few possible options:
>
> 1. We include some or all of these patches in a new Pig 11 rc. We'd want to
> make sure that they don't destabilize the current branch. This approach
> makes sense if we think Pig 11 wouldn't be a good release without one or
> more of these included.
>
> 2. We continue with the Pig 11 release without these, but then include one
> or more in a 0.11.1 release.
>
> 3. We continue with the Pig 11 release without these, but then include them
> in a 0.12 release.
>
> Jon has a patch for the MAP issue
> (PIG-3144<https://issues.apache.org/jira/browse/PIG-3144>)
> ready, which seems like the most pressing of the three to me.
>
> thanks,
> Bill



-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: pig 0.11 candidate 2 feedback: Several problems

Posted by Bill Graham <bi...@gmail.com>.
Thanks Kai for reporting these.

What do people think about the severity of these issues w.r.t. Pig 11? I
see a few possible options:

1. We include some or all of these patches in a new Pig 11 rc. We'd want to
make sure that they don't destabilize the current branch. This approach
makes sense if we think Pig 11 wouldn't be a good release without one or
more of these included.

2. We continue with the Pig 11 release without these, but then include one
or more in a 0.11.1 release.

3. We continue with the Pig 11 release without these, but then include them
in a 0.12 release.

Jon has a patch for the MAP issue
(PIG-3144<https://issues.apache.org/jira/browse/PIG-3144>)
ready, which seems like the most pressing of the three to me.

thanks,
Bill

On Mon, Feb 18, 2013 at 2:27 AM, Kai Londenberg <
kai.londenberg@googlemail.com> wrote:

> Hi,
>
> I just subscribed to the dev mailing list in order to give you some
> feedback on pig 0.11 candidate 2.
>
> The following three issues are currently present in 0.11 candidate 2:
>
> https://issues.apache.org/jira/browse/PIG-3144 - 'Erroneous map entry
> alias resolution leading to "Duplicate schema alias" errors'
> https://issues.apache.org/jira/browse/PIG-3194 - Changes to
> ObjectSerializer.java break compatibility with Hadoop 0.20.2
> https://issues.apache.org/jira/browse/PIG-3195 - Race Condition in
> PhysicalOperator leads to ExecException "Error while trying to get
> next result in POStream"
>
> The last two of these are easily solvable (see the tickets for
> details on that). The first one is a bit trickier, I think, but at
> least there is a workaround for it (pass Map fields through a UDF).
>
> In my personal opinion, each of these problems is pretty severe, but
> opinions about the importance of the MAP Datatype and STREAM Operator,
> as well as Hadoop 0.20.2 compatibility might differ.
>
> so far ..
>
> Kai Londenberg
>



-- 
*Note that I'm no longer using my Yahoo! email address. Please email me at
billgraham@gmail.com going forward.*