Posted to dev@crunch.apache.org by Mārtiņš Kalvāns <ma...@gmail.com> on 2014/08/01 13:32:54 UTC

Re: Join.join(PTable, ?) return empty collection

Yes, I think at least documenting this known issue would help.
Thanks!


2014-07-31 17:09 GMT+02:00 Josh Wills <jw...@cloudera.com>:

> Understood. Anything I can do to help? Docfix, at least?
>
>
> On Thu, Jul 31, 2014 at 1:08 AM, Mārtiņš Kalvāns <
> martins.kalvans@gmail.com>
> wrote:
>
> > It is almost always avoidable. The problem is that our company's Crunch
> > user base is growing, and many of those users are not technical enough to
> > quickly and effectively catch problems like this and find workarounds. :(
> >
> >
> > --
> > Mārtiņš
> >
> >
> > 2014-07-30 18:45 GMT+02:00 Josh Wills <jw...@cloudera.com>:
> >
> > > My hypothesis is that we re-use null in joins to indicate the absence of
> > > a value, so if the value of an entry is null, we assume it's non-existent.
> > > I'm assuming there isn't an easy way to switch the Void out for a non-null
> > > but ignored value?
> > >
> > > J
> > >
> > >
> > > On Wed, Jul 30, 2014 at 9:35 AM, Mārtiņš Kalvāns <
> > > martins.kalvans@gmail.com>
> > > wrote:
> > >
> > > > Hi.
> > > >
> > > > I stumbled on weird behaviour (bug?) when joining a PTable<?, Void> on
> > > > the left side with any other PTable: the resulting collection is empty.
> > > > The attached example code demonstrates the unexpected behaviour.
> > > > The code in question is in org.apache.crunch.lib.join.InnerJoinFn, line
> > > > 59, where it checks for a null reference on the left dataset (the same
> > > > applies to the other join fn implementations).
> > > > Can anyone comment on this?
> > > >
> > > >
> > > > --
> > > > Mārtiņš Kalvāns
> > > >
> > >
> > >
> > >
> > > --
> > > Director of Data Science
> > > Cloudera <http://www.cloudera.com>
> > > Twitter: @josh_wills <http://twitter.com/josh_wills>
> > >
> >
>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>
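
Editorial sketch of the behaviour discussed above. This is a simplified, hypothetical model in plain Java, not Crunch's actual InnerJoinFn: it assumes only what the thread states, namely that the join treats a null left value as "no value present". Because a Void-typed value can only ever be null, every left entry is skipped and the result is empty. The class and method names (NullJoinSketch, innerJoin, withPlaceholder) are invented for illustration. The second method shows the kind of workaround hinted at in the thread: swapping the Void for a non-null but ignored value before joining.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a join that uses null to mean "value absent".
// It only illustrates the reported behaviour; it is not Crunch code.
final class NullJoinSketch {

    // Inner join of two key -> value maps. A null left value is read as
    // "nothing on the left side", so that key never reaches the output.
    static <K, U, V> Map<K, Map.Entry<U, V>> innerJoin(Map<K, U> left, Map<K, V> right) {
        Map<K, Map.Entry<U, V>> out = new LinkedHashMap<>();
        for (Map.Entry<K, U> l : left.entrySet()) {
            if (l.getValue() == null) {
                continue; // assumed null check: the entry is treated as missing
            }
            if (right.containsKey(l.getKey())) {
                Map.Entry<U, V> pair = new SimpleEntry<>(l.getValue(), right.get(l.getKey()));
                out.put(l.getKey(), pair);
            }
        }
        return out;
    }

    // Workaround sketch: replace each (always-null) Void value with a
    // non-null placeholder that downstream code can simply ignore.
    static <K> Map<K, Boolean> withPlaceholder(Map<K, Void> table) {
        Map<K, Boolean> out = new LinkedHashMap<>();
        for (K key : table.keySet()) {
            out.put(key, Boolean.TRUE);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Void> left = new LinkedHashMap<>();
        left.put("a", null); // the only value a Void can hold
        left.put("b", null);

        Map<String, Integer> right = new LinkedHashMap<>();
        right.put("a", 1);
        right.put("c", 3);

        System.out.println(innerJoin(left, right).size());                     // prints 0
        System.out.println(innerJoin(withPlaceholder(left), right).keySet());  // prints [a]
    }
}
```

In a real Crunch pipeline the analogous step would be to map the Void values of the left table to some non-null placeholder before calling Join.join; the exact API calls for that are not shown in this thread and are left out here.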

Re: Join.join(PTable, ?) return empty collection

Posted by Josh Wills <jo...@gmail.com>.
Posted a doc fix for this in CRUNCH-453, along with a few other updates to
the user guide.

