Posted to user@pig.apache.org by Russell Jurney <ru...@gmail.com> on 2012/03/02 02:19:40 UTC

mongo-hadoop Pig users must turn off speculative execution to avoid duplicate inserts

https://jira.mongodb.org/browse/HADOOP-26

-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com

Re: mongo-hadoop Pig users must turn off speculative execution to avoid duplicate inserts

Posted by Russell Jurney <ru...@gmail.com>.
I think I'm going to add a 'fire command' step before/after the store, to set up indexes.

Russell Jurney
twitter.com/rjurney
russell.jurney@gmail.com
datasyndrome.com

On Mar 2, 2012, at 1:48 PM, Bill Graham <bi...@gmail.com> wrote:

> Ah yes, perhaps a bug in the documentation, then, for not pointing this out.

Re: mongo-hadoop Pig users must turn off speculative execution to avoid duplicate inserts

Posted by Bill Graham <bi...@gmail.com>.
Ah yes, perhaps a bug in the documentation, then, for not pointing this out.

On Fri, Mar 2, 2012 at 1:44 PM, Russell Jurney <ru...@gmail.com> wrote:

> It is creating a new collection that has no keys, and inserting dupes.
>
> The docs don't say you need to turn off speculative execution, thus the bug.



-- 
*Note that I'm no longer using my Yahoo! email address. Please email me at
billgraham@gmail.com going forward.*

Re: mongo-hadoop Pig users must turn off speculative execution to avoid duplicate inserts

Posted by Russell Jurney <ru...@gmail.com>.
It is creating a new collection that has no keys, and inserting dupes.

The docs don't say you need to turn off speculative execution, thus the bug.

Russell Jurney http://datasyndrome.com

On Mar 2, 2012, at 1:00 PM, Jeremy Hanna <je...@gmail.com> wrote:

> Not sure what Mongo is doing (generating IDs, or triggers, or something), but if the writes are idempotent it should only be a problem of efficiency.

Re: mongo-hadoop Pig users must turn off speculative execution to avoid duplicate inserts

Posted by Jeremy Hanna <je...@gmail.com>.
Not sure what Mongo is doing (generating IDs, or triggers, or something), but if the writes are idempotent it should only be a problem of efficiency.
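
To make the idempotency point concrete, here is a minimal sketch in plain Python. Nothing in it talks to MongoDB: `FakeCollection` and its `insert`/`upsert` methods are hypothetical stand-ins, and the choice of `user` as the key is an assumption for illustration. It only shows why two attempts of the same task produce duplicates with generated IDs but not with a deterministic key.

```python
import itertools


class FakeCollection:
    """Hypothetical in-memory stand-in for a document store (not a real MongoDB API)."""

    def __init__(self):
        self.docs = {}
        self._next_id = itertools.count(1)

    def insert(self, doc):
        # Non-idempotent: every call is stored under a freshly generated ID.
        self.docs[next(self._next_id)] = dict(doc)

    def upsert(self, key, doc):
        # Idempotent: the same key always overwrites the same slot.
        self.docs[key] = dict(doc)


def run_task_attempt(records, write):
    """Simulate one task attempt writing all of its records."""
    for record in records:
        write(record)


records = [{"user": "a", "n": 1}, {"user": "b", "n": 2}]

# Speculative execution may run two attempts of the same task.
blind = FakeCollection()
for _attempt in range(2):
    run_task_attempt(records, blind.insert)

keyed = FakeCollection()
for _attempt in range(2):
    run_task_attempt(records, lambda r: keyed.upsert(r["user"], r))

print(len(blind.docs))  # 4: each record stored twice
print(len(keyed.docs))  # 2: the second attempt only repeated work
```

With keyed writes the duplicate attempt costs time but leaves the stored data unchanged, which is the "problem of efficiency" case.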

On Mar 2, 2012, at 3:39 AM, Jonathan Coveney wrote:

> I agree with Bill. Speculative execution is a feature of Hadoop that
> doesn't jibe nicely with storing data into non-Hadoop systems.

Re: mongo-hadoop Pig users must turn off speculative execution to avoid duplicate inserts

Posted by Jonathan Coveney <jc...@gmail.com>.
I agree with Bill. Speculative execution is a feature of Hadoop that
doesn't jibe nicely with storing data into non-Hadoop systems.
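
For anyone looking for the switch itself, the following is a rough sketch of one way to pass it to Pig from a wrapper script. It assumes the Hadoop 1.x-era property names `mapred.map.tasks.speculative.execution` and `mapred.reduce.tasks.speculative.execution`, and that `pig` accepts `-D` properties ahead of the script name; neither is stated in this thread, so check both against your versions. The script name `store_to_mongo.pig` is hypothetical.

```python
# Assumed Hadoop 1.x-era property names; newer Hadoop releases use
# different names, so verify these against your cluster's documentation.
SPECULATION_OFF = {
    "mapred.map.tasks.speculative.execution": "false",
    "mapred.reduce.tasks.speculative.execution": "false",
}


def pig_command(script, properties):
    """Build a `pig` argv list with -D properties placed before the script."""
    flags = ["-D{}={}".format(k, v) for k, v in sorted(properties.items())]
    return ["pig"] + flags + [script]


# "store_to_mongo.pig" is a hypothetical script name.
cmd = pig_command("store_to_mongo.pig", SPECULATION_OFF)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` on a machine where `pig` is installed.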

2012/3/1 Bill Graham <bi...@gmail.com>

> I don't think this is a bug. This is something that always needs to be done
> when writing to any DB.

Re: mongo-hadoop Pig users must turn off speculative execution to avoid duplicate inserts

Posted by Bill Graham <bi...@gmail.com>.
I don't think this is a bug. This is something that always needs to be done
when writing to any DB.

On Thu, Mar 1, 2012 at 5:19 PM, Russell Jurney <ru...@gmail.com> wrote:

> https://jira.mongodb.org/browse/HADOOP-26