You are viewing a plain text version of this content. The canonical link for it is here.
Posted to hdfs-user@hadoop.apache.org by Sathwik B P <sa...@apache.org> on 2013/08/08 08:53:24 UTC

Why LineRecordWriter.write(..) is synchronized

Hi,

LineRecordWriter.write(..) is synchronized. I did not find any other
RecordWriter implementations define the write as synchronized.
Any specific reason for this.

regards,
sathwik

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Niels Basjes <Ni...@basjes.nl>.
I may be nitpicking here but if "perhaps the answer is no" then I conclude:
Perhaps the other implementations of RecordWriter are a race condition/file
corruption ready to occur.


On Thu, Aug 8, 2013 at 12:50 PM, Harsh J <ha...@cloudera.com> wrote:

> While we don't fork by default, we do provide a MultithreadedMapper
> implementation that would require such synchronization. But if you are
> asking is it necessary, then perhaps the answer is no.
> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>
>> its not hadoop forked threads, we may create a line record writer, then
>> call this writer concurrently.
>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>
>>> Hi,
>>> Thanks for your reply.
>>> May I know where does hadoop fork multiple threads to use a single
>>> RecordWriter.
>>>
>>> regards,
>>> sathwik
>>>
>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>
>>>> because we may use multi-threads to write a single file.
>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>> RecordWriter implementations define the write as synchronized.
>>>>> Any specific reason for this.
>>>>>
>>>>> regards,
>>>>> sathwik
>>>>>
>>>>
>>>


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Niels Basjes <Ni...@basjes.nl>.
I expect the impact on the IO speed to be almost 0 because waiting for a
single disk seek is longer than many thousands of calls to a synchronized
method.

Niels
On Aug 11, 2013 3:00 PM, "Harsh J" <ha...@cloudera.com> wrote:

> Yes, I feel we could discuss this over a JIRA to remove it if it hurts
> perf. too much, but it would have to be a marked incompatible change,
> and we have to add a note about the lack of thread safety in the
> javadoc of base Mapper/Reducer classes.
>
> On Sun, Aug 11, 2013 at 1:26 PM, Sathwik B P <sa...@gmail.com> wrote:
> > Hi Harsh,
> >
> > Does it make any sense to keep the method in LRW still synchronized.
> Isn't
> > it creating unnecessary overhead for non multi threaded implementations.
> >
> > regards,
> > sathwik
> >
> >
> > On Fri, Aug 9, 2013 at 7:16 AM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> I suppose I should have been clearer. There's no problem out of box if
> >> people stick to the libraries we offer :)
> >>
> >> Yes the LRW was marked synchronized at some point over 8 years ago [1]
> >> in support for multi-threaded maps, but the framework has changed much
> >> since then. The MultithreadedMapper/etc. API we offer now
> >> automatically shields the devs away from having to think of output
> >> thread safety [2].
> >>
> >> I can imagine there can only be a problem if a user writes their own
> >> unsafe multi threaded task. I suppose we could document that in the
> >> Mapper/MapRunner and Reducer APIs.
> >>
> >> [1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
> >> Commit added a synchronized to the write call.
> >> [2] - MultiThreadedMapper/etc. synchronize over the collector -
> >>
> >>
> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup
> >>
> >> On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <az...@gmail.com> wrote:
> >> > sequence writer is also synchronized, I dont think this is bad.
> >> >
> >> > if you call HDFS api to write concurrently, then its necessary.
> >> >
> >> > On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:
> >> >>
> >> >> Then is this a bug?  Synchronization in absence of any race condition
> >> >> is
> >> >> normally considered "bad".
> >> >>
> >> >> In any case id like to know why this writer is synchronized whereas
> the
> >> >> other one are not.. That is, I think, then point at issue: either
> other
> >> >> writers should be synchronized or else this one shouldn't be -
> >> >> consistency
> >> >> across the write implementations is probably desirable so that
> changes
> >> >> to
> >> >> output formats or record writers don't lead to bugs in multithreaded
> >> >> environments .
> >> >>
> >> >> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
> >> >>
> >> >> While we don't fork by default, we do provide a MultithreadedMapper
> >> >> implementation that would require such synchronization. But if you
> are
> >> >> asking is it necessary, then perhaps the answer is no.
> >> >>
> >> >> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
> >> >>>
> >> >>> its not hadoop forked threads, we may create a line record writer,
> >> >>> then
> >> >>> call this writer concurrently.
> >> >>>
> >> >>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
> >> >>>>
> >> >>>> Hi,
> >> >>>> Thanks for your reply.
> >> >>>> May I know where does hadoop fork multiple threads to use a single
> >> >>>> RecordWriter.
> >> >>>>
> >> >>>> regards,
> >> >>>> sathwik
> >> >>>>
> >> >>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com>
> wrote:
> >> >>>>>
> >> >>>>> because we may use multi-threads to write a single file.
> >> >>>>>
> >> >>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
> >> >>>>>>
> >> >>>>>> Hi,
> >> >>>>>>
> >> >>>>>> LineRecordWriter.write(..) is synchronized. I did not find any
> >> >>>>>> other
> >> >>>>>> RecordWriter implementations define the write as synchronized.
> >> >>>>>> Any specific reason for this.
> >> >>>>>>
> >> >>>>>> regards,
> >> >>>>>> sathwik
> >> >>>>
> >> >>>>
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Niels Basjes <Ni...@basjes.nl>.
I expect the impact on the IO speed to be almost 0 because waiting for a
single disk seek is longer than many thousands of calls to a synchronized
method.

Niels
On Aug 11, 2013 3:00 PM, "Harsh J" <ha...@cloudera.com> wrote:

> Yes, I feel we could discuss this over a JIRA to remove it if it hurts
> perf. too much, but it would have to be a marked incompatible change,
> and we have to add a note about the lack of thread safety in the
> javadoc of base Mapper/Reducer classes.
>
> On Sun, Aug 11, 2013 at 1:26 PM, Sathwik B P <sa...@gmail.com> wrote:
> > Hi Harsh,
> >
> > Does it make any sense to keep the method in LRW still synchronized.
> Isn't
> > it creating unnecessary overhead for non multi threaded implementations.
> >
> > regards,
> > sathwik
> >
> >
> > On Fri, Aug 9, 2013 at 7:16 AM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> I suppose I should have been clearer. There's no problem out of box if
> >> people stick to the libraries we offer :)
> >>
> >> Yes the LRW was marked synchronized at some point over 8 years ago [1]
> >> in support for multi-threaded maps, but the framework has changed much
> >> since then. The MultithreadedMapper/etc. API we offer now
> >> automatically shields the devs away from having to think of output
> >> thread safety [2].
> >>
> >> I can imagine there can only be a problem if a user writes their own
> >> unsafe multi threaded task. I suppose we could document that in the
> >> Mapper/MapRunner and Reducer APIs.
> >>
> >> [1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
> >> Commit added a synchronized to the write call.
> >> [2] - MultiThreadedMapper/etc. synchronize over the collector -
> >>
> >>
> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup
> >>
> >> On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <az...@gmail.com> wrote:
> >> > sequence writer is also synchronized, I dont think this is bad.
> >> >
> >> > if you call HDFS api to write concurrently, then its necessary.
> >> >
> >> > On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:
> >> >>
> >> >> Then is this a bug?  Synchronization in absence of any race condition
> >> >> is
> >> >> normally considered "bad".
> >> >>
> >> >> In any case id like to know why this writer is synchronized whereas
> the
> >> >> other one are not.. That is, I think, then point at issue: either
> other
> >> >> writers should be synchronized or else this one shouldn't be -
> >> >> consistency
> >> >> across the write implementations is probably desirable so that
> changes
> >> >> to
> >> >> output formats or record writers don't lead to bugs in multithreaded
> >> >> environments .
> >> >>
> >> >> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
> >> >>
> >> >> While we don't fork by default, we do provide a MultithreadedMapper
> >> >> implementation that would require such synchronization. But if you
> are
> >> >> asking is it necessary, then perhaps the answer is no.
> >> >>
> >> >> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
> >> >>>
> >> >>> its not hadoop forked threads, we may create a line record writer,
> >> >>> then
> >> >>> call this writer concurrently.
> >> >>>
> >> >>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
> >> >>>>
> >> >>>> Hi,
> >> >>>> Thanks for your reply.
> >> >>>> May I know where does hadoop fork multiple threads to use a single
> >> >>>> RecordWriter.
> >> >>>>
> >> >>>> regards,
> >> >>>> sathwik
> >> >>>>
> >> >>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com>
> wrote:
> >> >>>>>
> >> >>>>> because we may use multi-threads to write a single file.
> >> >>>>>
> >> >>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
> >> >>>>>>
> >> >>>>>> Hi,
> >> >>>>>>
> >> >>>>>> LineRecordWriter.write(..) is synchronized. I did not find any
> >> >>>>>> other
> >> >>>>>> RecordWriter implementations define the write as synchronized.
> >> >>>>>> Any specific reason for this.
> >> >>>>>>
> >> >>>>>> regards,
> >> >>>>>> sathwik
> >> >>>>
> >> >>>>
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Niels Basjes <Ni...@basjes.nl>.
I expect the impact on the IO speed to be almost 0 because waiting for a
single disk seek is longer than many thousands of calls to a synchronized
method.

Niels
On Aug 11, 2013 3:00 PM, "Harsh J" <ha...@cloudera.com> wrote:

> Yes, I feel we could discuss this over a JIRA to remove it if it hurts
> perf. too much, but it would have to be a marked incompatible change,
> and we have to add a note about the lack of thread safety in the
> javadoc of base Mapper/Reducer classes.
>
> On Sun, Aug 11, 2013 at 1:26 PM, Sathwik B P <sa...@gmail.com> wrote:
> > Hi Harsh,
> >
> > Does it make any sense to keep the method in LRW still synchronized.
> Isn't
> > it creating unnecessary overhead for non multi threaded implementations.
> >
> > regards,
> > sathwik
> >
> >
> > On Fri, Aug 9, 2013 at 7:16 AM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> I suppose I should have been clearer. There's no problem out of box if
> >> people stick to the libraries we offer :)
> >>
> >> Yes the LRW was marked synchronized at some point over 8 years ago [1]
> >> in support for multi-threaded maps, but the framework has changed much
> >> since then. The MultithreadedMapper/etc. API we offer now
> >> automatically shields the devs away from having to think of output
> >> thread safety [2].
> >>
> >> I can imagine there can only be a problem if a user writes their own
> >> unsafe multi threaded task. I suppose we could document that in the
> >> Mapper/MapRunner and Reducer APIs.
> >>
> >> [1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
> >> Commit added a synchronized to the write call.
> >> [2] - MultiThreadedMapper/etc. synchronize over the collector -
> >>
> >>
> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup
> >>
> >> On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <az...@gmail.com> wrote:
> >> > sequence writer is also synchronized, I dont think this is bad.
> >> >
> >> > if you call HDFS api to write concurrently, then its necessary.
> >> >
> >> > On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:
> >> >>
> >> >> Then is this a bug?  Synchronization in absence of any race condition
> >> >> is
> >> >> normally considered "bad".
> >> >>
> >> >> In any case id like to know why this writer is synchronized whereas
> the
> >> >> other one are not.. That is, I think, then point at issue: either
> other
> >> >> writers should be synchronized or else this one shouldn't be -
> >> >> consistency
> >> >> across the write implementations is probably desirable so that
> changes
> >> >> to
> >> >> output formats or record writers don't lead to bugs in multithreaded
> >> >> environments .
> >> >>
> >> >> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
> >> >>
> >> >> While we don't fork by default, we do provide a MultithreadedMapper
> >> >> implementation that would require such synchronization. But if you
> are
> >> >> asking is it necessary, then perhaps the answer is no.
> >> >>
> >> >> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
> >> >>>
> >> >>> its not hadoop forked threads, we may create a line record writer,
> >> >>> then
> >> >>> call this writer concurrently.
> >> >>>
> >> >>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
> >> >>>>
> >> >>>> Hi,
> >> >>>> Thanks for your reply.
> >> >>>> May I know where does hadoop fork multiple threads to use a single
> >> >>>> RecordWriter.
> >> >>>>
> >> >>>> regards,
> >> >>>> sathwik
> >> >>>>
> >> >>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com>
> wrote:
> >> >>>>>
> >> >>>>> because we may use multi-threads to write a single file.
> >> >>>>>
> >> >>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
> >> >>>>>>
> >> >>>>>> Hi,
> >> >>>>>>
> >> >>>>>> LineRecordWriter.write(..) is synchronized. I did not find any
> >> >>>>>> other
> >> >>>>>> RecordWriter implementations define the write as synchronized.
> >> >>>>>> Any specific reason for this.
> >> >>>>>>
> >> >>>>>> regards,
> >> >>>>>> sathwik
> >> >>>>
> >> >>>>
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Niels Basjes <Ni...@basjes.nl>.
I expect the impact on the IO speed to be almost 0 because waiting for a
single disk seek is longer than many thousands of calls to a synchronized
method.

Niels
On Aug 11, 2013 3:00 PM, "Harsh J" <ha...@cloudera.com> wrote:

> Yes, I feel we could discuss this over a JIRA to remove it if it hurts
> perf. too much, but it would have to be a marked incompatible change,
> and we have to add a note about the lack of thread safety in the
> javadoc of base Mapper/Reducer classes.
>
> On Sun, Aug 11, 2013 at 1:26 PM, Sathwik B P <sa...@gmail.com> wrote:
> > Hi Harsh,
> >
> > Does it make any sense to keep the method in LRW still synchronized.
> Isn't
> > it creating unnecessary overhead for non multi threaded implementations.
> >
> > regards,
> > sathwik
> >
> >
> > On Fri, Aug 9, 2013 at 7:16 AM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> I suppose I should have been clearer. There's no problem out of box if
> >> people stick to the libraries we offer :)
> >>
> >> Yes the LRW was marked synchronized at some point over 8 years ago [1]
> >> in support for multi-threaded maps, but the framework has changed much
> >> since then. The MultithreadedMapper/etc. API we offer now
> >> automatically shields the devs away from having to think of output
> >> thread safety [2].
> >>
> >> I can imagine there can only be a problem if a user writes their own
> >> unsafe multi threaded task. I suppose we could document that in the
> >> Mapper/MapRunner and Reducer APIs.
> >>
> >> [1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
> >> Commit added a synchronized to the write call.
> >> [2] - MultiThreadedMapper/etc. synchronize over the collector -
> >>
> >>
> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup
> >>
> >> On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <az...@gmail.com> wrote:
> >> > sequence writer is also synchronized, I dont think this is bad.
> >> >
> >> > if you call HDFS api to write concurrently, then its necessary.
> >> >
> >> > On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:
> >> >>
> >> >> Then is this a bug?  Synchronization in absence of any race condition
> >> >> is
> >> >> normally considered "bad".
> >> >>
> >> >> In any case id like to know why this writer is synchronized whereas
> the
> >> >> other one are not.. That is, I think, then point at issue: either
> other
> >> >> writers should be synchronized or else this one shouldn't be -
> >> >> consistency
> >> >> across the write implementations is probably desirable so that
> changes
> >> >> to
> >> >> output formats or record writers don't lead to bugs in multithreaded
> >> >> environments .
> >> >>
> >> >> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
> >> >>
> >> >> While we don't fork by default, we do provide a MultithreadedMapper
> >> >> implementation that would require such synchronization. But if you
> are
> >> >> asking is it necessary, then perhaps the answer is no.
> >> >>
> >> >> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
> >> >>>
> >> >>> its not hadoop forked threads, we may create a line record writer,
> >> >>> then
> >> >>> call this writer concurrently.
> >> >>>
> >> >>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
> >> >>>>
> >> >>>> Hi,
> >> >>>> Thanks for your reply.
> >> >>>> May I know where does hadoop fork multiple threads to use a single
> >> >>>> RecordWriter.
> >> >>>>
> >> >>>> regards,
> >> >>>> sathwik
> >> >>>>
> >> >>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com>
> wrote:
> >> >>>>>
> >> >>>>> because we may use multi-threads to write a single file.
> >> >>>>>
> >> >>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
> >> >>>>>>
> >> >>>>>> Hi,
> >> >>>>>>
> >> >>>>>> LineRecordWriter.write(..) is synchronized. I did not find any
> >> >>>>>> other
> >> >>>>>> RecordWriter implementations define the write as synchronized.
> >> >>>>>> Any specific reason for this.
> >> >>>>>>
> >> >>>>>> regards,
> >> >>>>>> sathwik
> >> >>>>
> >> >>>>
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Harsh J <ha...@cloudera.com>.
Yes, I feel we could discuss this over a JIRA to remove it if it hurts
perf. too much, but it would have to be a marked incompatible change,
and we have to add a note about the lack of thread safety in the
javadoc of base Mapper/Reducer classes.

On Sun, Aug 11, 2013 at 1:26 PM, Sathwik B P <sa...@gmail.com> wrote:
> Hi Harsh,
>
> Does it make any sense to keep the method in LRW still synchronized. Isn't
> it creating unnecessary overhead for non multi threaded implementations.
>
> regards,
> sathwik
>
>
> On Fri, Aug 9, 2013 at 7:16 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> I suppose I should have been clearer. There's no problem out of box if
>> people stick to the libraries we offer :)
>>
>> Yes the LRW was marked synchronized at some point over 8 years ago [1]
>> in support for multi-threaded maps, but the framework has changed much
>> since then. The MultithreadedMapper/etc. API we offer now
>> automatically shields the devs away from having to think of output
>> thread safety [2].
>>
>> I can imagine there can only be a problem if a user writes their own
>> unsafe multi threaded task. I suppose we could document that in the
>> Mapper/MapRunner and Reducer APIs.
>>
>> [1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
>> Commit added a synchronized to the write call.
>> [2] - MultiThreadedMapper/etc. synchronize over the collector -
>>
>> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup
>>
>> On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <az...@gmail.com> wrote:
>> > sequence writer is also synchronized, I dont think this is bad.
>> >
>> > if you call HDFS api to write concurrently, then its necessary.
>> >
>> > On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:
>> >>
>> >> Then is this a bug?  Synchronization in absence of any race condition
>> >> is
>> >> normally considered "bad".
>> >>
>> >> In any case id like to know why this writer is synchronized whereas the
>> >> other one are not.. That is, I think, then point at issue: either other
>> >> writers should be synchronized or else this one shouldn't be -
>> >> consistency
>> >> across the write implementations is probably desirable so that changes
>> >> to
>> >> output formats or record writers don't lead to bugs in multithreaded
>> >> environments .
>> >>
>> >> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>> >>
>> >> While we don't fork by default, we do provide a MultithreadedMapper
>> >> implementation that would require such synchronization. But if you are
>> >> asking is it necessary, then perhaps the answer is no.
>> >>
>> >> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>> >>>
>> >>> its not hadoop forked threads, we may create a line record writer,
>> >>> then
>> >>> call this writer concurrently.
>> >>>
>> >>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>> >>>>
>> >>>> Hi,
>> >>>> Thanks for your reply.
>> >>>> May I know where does hadoop fork multiple threads to use a single
>> >>>> RecordWriter.
>> >>>>
>> >>>> regards,
>> >>>> sathwik
>> >>>>
>> >>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>> >>>>>
>> >>>>> because we may use multi-threads to write a single file.
>> >>>>>
>> >>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> LineRecordWriter.write(..) is synchronized. I did not find any
>> >>>>>> other
>> >>>>>> RecordWriter implementations define the write as synchronized.
>> >>>>>> Any specific reason for this.
>> >>>>>>
>> >>>>>> regards,
>> >>>>>> sathwik
>> >>>>
>> >>>>
>> >
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Harsh J <ha...@cloudera.com>.
Yes, I feel we could discuss this over a JIRA to remove it if it hurts
perf. too much, but it would have to be a marked incompatible change,
and we have to add a note about the lack of thread safety in the
javadoc of base Mapper/Reducer classes.

On Sun, Aug 11, 2013 at 1:26 PM, Sathwik B P <sa...@gmail.com> wrote:
> Hi Harsh,
>
> Does it make any sense to keep the method in LRW still synchronized. Isn't
> it creating unnecessary overhead for non multi threaded implementations.
>
> regards,
> sathwik
>
>
> On Fri, Aug 9, 2013 at 7:16 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> I suppose I should have been clearer. There's no problem out of box if
>> people stick to the libraries we offer :)
>>
>> Yes the LRW was marked synchronized at some point over 8 years ago [1]
>> in support for multi-threaded maps, but the framework has changed much
>> since then. The MultithreadedMapper/etc. API we offer now
>> automatically shields the devs away from having to think of output
>> thread safety [2].
>>
>> I can imagine there can only be a problem if a user writes their own
>> unsafe multi threaded task. I suppose we could document that in the
>> Mapper/MapRunner and Reducer APIs.
>>
>> [1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
>> Commit added a synchronized to the write call.
>> [2] - MultiThreadedMapper/etc. synchronize over the collector -
>>
>> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup
>>
>> On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <az...@gmail.com> wrote:
>> > sequence writer is also synchronized, I dont think this is bad.
>> >
>> > if you call HDFS api to write concurrently, then its necessary.
>> >
>> > On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:
>> >>
>> >> Then is this a bug?  Synchronization in absence of any race condition
>> >> is
>> >> normally considered "bad".
>> >>
>> >> In any case id like to know why this writer is synchronized whereas the
>> >> other one are not.. That is, I think, then point at issue: either other
>> >> writers should be synchronized or else this one shouldn't be -
>> >> consistency
>> >> across the write implementations is probably desirable so that changes
>> >> to
>> >> output formats or record writers don't lead to bugs in multithreaded
>> >> environments .
>> >>
>> >> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>> >>
>> >> While we don't fork by default, we do provide a MultithreadedMapper
>> >> implementation that would require such synchronization. But if you are
>> >> asking is it necessary, then perhaps the answer is no.
>> >>
>> >> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>> >>>
>> >>> its not hadoop forked threads, we may create a line record writer,
>> >>> then
>> >>> call this writer concurrently.
>> >>>
>> >>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>> >>>>
>> >>>> Hi,
>> >>>> Thanks for your reply.
>> >>>> May I know where does hadoop fork multiple threads to use a single
>> >>>> RecordWriter.
>> >>>>
>> >>>> regards,
>> >>>> sathwik
>> >>>>
>> >>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>> >>>>>
>> >>>>> because we may use multi-threads to write a single file.
>> >>>>>
>> >>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> LineRecordWriter.write(..) is synchronized. I did not find any
>> >>>>>> other
>> >>>>>> RecordWriter implementations define the write as synchronized.
>> >>>>>> Any specific reason for this.
>> >>>>>>
>> >>>>>> regards,
>> >>>>>> sathwik
>> >>>>
>> >>>>
>> >
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Harsh J <ha...@cloudera.com>.
Yes, I feel we could discuss this over a JIRA to remove it if it hurts
perf. too much, but it would have to be a marked incompatible change,
and we have to add a note about the lack of thread safety in the
javadoc of base Mapper/Reducer classes.

On Sun, Aug 11, 2013 at 1:26 PM, Sathwik B P <sa...@gmail.com> wrote:
> Hi Harsh,
>
> Does it make any sense to keep the method in LRW still synchronized. Isn't
> it creating unnecessary overhead for non multi threaded implementations.
>
> regards,
> sathwik
>
>
> On Fri, Aug 9, 2013 at 7:16 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> I suppose I should have been clearer. There's no problem out of box if
>> people stick to the libraries we offer :)
>>
>> Yes the LRW was marked synchronized at some point over 8 years ago [1]
>> in support for multi-threaded maps, but the framework has changed much
>> since then. The MultithreadedMapper/etc. API we offer now
>> automatically shields the devs away from having to think of output
>> thread safety [2].
>>
>> I can imagine there can only be a problem if a user writes their own
>> unsafe multi threaded task. I suppose we could document that in the
>> Mapper/MapRunner and Reducer APIs.
>>
>> [1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
>> Commit added a synchronized to the write call.
>> [2] - MultiThreadedMapper/etc. synchronize over the collector -
>>
>> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup
>>
>> On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <az...@gmail.com> wrote:
>> > sequence writer is also synchronized, I dont think this is bad.
>> >
>> > if you call HDFS api to write concurrently, then its necessary.
>> >
>> > On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:
>> >>
>> >> Then is this a bug?  Synchronization in absence of any race condition
>> >> is
>> >> normally considered "bad".
>> >>
>> >> In any case id like to know why this writer is synchronized whereas the
>> >> other one are not.. That is, I think, then point at issue: either other
>> >> writers should be synchronized or else this one shouldn't be -
>> >> consistency
>> >> across the write implementations is probably desirable so that changes
>> >> to
>> >> output formats or record writers don't lead to bugs in multithreaded
>> >> environments .
>> >>
>> >> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>> >>
>> >> While we don't fork by default, we do provide a MultithreadedMapper
>> >> implementation that would require such synchronization. But if you are
>> >> asking is it necessary, then perhaps the answer is no.
>> >>
>> >> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>> >>>
>> >>> its not hadoop forked threads, we may create a line record writer,
>> >>> then
>> >>> call this writer concurrently.
>> >>>
>> >>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>> >>>>
>> >>>> Hi,
>> >>>> Thanks for your reply.
>> >>>> May I know where does hadoop fork multiple threads to use a single
>> >>>> RecordWriter.
>> >>>>
>> >>>> regards,
>> >>>> sathwik
>> >>>>
>> >>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>> >>>>>
>> >>>>> because we may use multi-threads to write a single file.
>> >>>>>
>> >>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> LineRecordWriter.write(..) is synchronized. I did not find any
>> >>>>>> other
>> >>>>>> RecordWriter implementations define the write as synchronized.
>> >>>>>> Any specific reason for this.
>> >>>>>>
>> >>>>>> regards,
>> >>>>>> sathwik
>> >>>>
>> >>>>
>> >
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Harsh J <ha...@cloudera.com>.
Yes, I feel we could discuss this over a JIRA to remove it if it hurts
perf. too much, but it would have to be a marked incompatible change,
and we have to add a note about the lack of thread safety in the
javadoc of base Mapper/Reducer classes.

On Sun, Aug 11, 2013 at 1:26 PM, Sathwik B P <sa...@gmail.com> wrote:
> Hi Harsh,
>
> Does it make any sense to keep the method in LRW still synchronized. Isn't
> it creating unnecessary overhead for non multi threaded implementations.
>
> regards,
> sathwik
>
>
> On Fri, Aug 9, 2013 at 7:16 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> I suppose I should have been clearer. There's no problem out of box if
>> people stick to the libraries we offer :)
>>
>> Yes the LRW was marked synchronized at some point over 8 years ago [1]
>> in support for multi-threaded maps, but the framework has changed much
>> since then. The MultithreadedMapper/etc. API we offer now
>> automatically shields the devs away from having to think of output
>> thread safety [2].
>>
>> I can imagine there can only be a problem if a user writes their own
>> unsafe multi threaded task. I suppose we could document that in the
>> Mapper/MapRunner and Reducer APIs.
>>
>> [1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
>> Commit added a synchronized to the write call.
>> [2] - MultiThreadedMapper/etc. synchronize over the collector -
>>
>> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup
>>
>> On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <az...@gmail.com> wrote:
>> > sequence writer is also synchronized, I dont think this is bad.
>> >
>> > if you call HDFS api to write concurrently, then its necessary.
>> >
>> > On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:
>> >>
>> >> Then is this a bug?  Synchronization in absence of any race condition
>> >> is
>> >> normally considered "bad".
>> >>
>> >> In any case id like to know why this writer is synchronized whereas the
>> >> other one are not.. That is, I think, then point at issue: either other
>> >> writers should be synchronized or else this one shouldn't be -
>> >> consistency
>> >> across the write implementations is probably desirable so that changes
>> >> to
>> >> output formats or record writers don't lead to bugs in multithreaded
>> >> environments .
>> >>
>> >> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>> >>
>> >> While we don't fork by default, we do provide a MultithreadedMapper
>> >> implementation that would require such synchronization. But if you are
>> >> asking is it necessary, then perhaps the answer is no.
>> >>
>> >> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>> >>>
>> >>> its not hadoop forked threads, we may create a line record writer,
>> >>> then
>> >>> call this writer concurrently.
>> >>>
>> >>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>> >>>>
>> >>>> Hi,
>> >>>> Thanks for your reply.
>> >>>> May I know where does hadoop fork multiple threads to use a single
>> >>>> RecordWriter.
>> >>>>
>> >>>> regards,
>> >>>> sathwik
>> >>>>
>> >>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>> >>>>>
>> >>>>> because we may use multi-threads to write a single file.
>> >>>>>
>> >>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>> >>>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> LineRecordWriter.write(..) is synchronized. I did not find any
>> >>>>>> other
>> >>>>>> RecordWriter implementations define the write as synchronized.
>> >>>>>> Any specific reason for this.
>> >>>>>>
>> >>>>>> regards,
>> >>>>>> sathwik
>> >>>>
>> >>>>
>> >
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Sathwik B P <sa...@gmail.com>.
Hi Harsh,

Does it make any sense to keep the method in LRW still synchronized. Isn't
it creating unnecessary overhead for non multi threaded implementations.

regards,
sathwik

On Fri, Aug 9, 2013 at 7:16 AM, Harsh J <ha...@cloudera.com> wrote:

> I suppose I should have been clearer. There's no problem out of box if
> people stick to the libraries we offer :)
>
> Yes the LRW was marked synchronized at some point over 8 years ago [1]
> in support for multi-threaded maps, but the framework has changed much
> since then. The MultithreadedMapper/etc. API we offer now
> automatically shields the devs away from having to think of output
> thread safety [2].
>
> I can imagine there can only be a problem if a user writes their own
> unsafe multi threaded task. I suppose we could document that in the
> Mapper/MapRunner and Reducer APIs.
>
> [1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
> Commit added a synchronized to the write call.
> [2] - MultiThreadedMapper/etc. synchronize over the collector -
>
> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup
>
> On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <az...@gmail.com> wrote:
> > sequence writer is also synchronized, I dont think this is bad.
> >
> > if you call HDFS api to write concurrently, then its necessary.
> >
> > On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:
> >>
> >> Then is this a bug?  Synchronization in absence of any race condition is
> >> normally considered "bad".
> >>
> >> In any case id like to know why this writer is synchronized whereas the
> >> other one are not.. That is, I think, then point at issue: either other
> >> writers should be synchronized or else this one shouldn't be -
> consistency
> >> across the write implementations is probably desirable so that changes
> to
> >> output formats or record writers don't lead to bugs in multithreaded
> >> environments .
> >>
> >> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> While we don't fork by default, we do provide a MultithreadedMapper
> >> implementation that would require such synchronization. But if you are
> >> asking is it necessary, then perhaps the answer is no.
> >>
> >> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
> >>>
> >>> its not hadoop forked threads, we may create a line record writer, then
> >>> call this writer concurrently.
> >>>
> >>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
> >>>>
> >>>> Hi,
> >>>> Thanks for your reply.
> >>>> May I know where does hadoop fork multiple threads to use a single
> >>>> RecordWriter.
> >>>>
> >>>> regards,
> >>>> sathwik
> >>>>
> >>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
> >>>>>
> >>>>> because we may use multi-threads to write a single file.
> >>>>>
> >>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
> >>>>>> RecordWriter implementations define the write as synchronized.
> >>>>>> Any specific reason for this.
> >>>>>>
> >>>>>> regards,
> >>>>>> sathwik
> >>>>
> >>>>
> >
>
>
>
> --
> Harsh J
>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Sathwik B P <sa...@gmail.com>.
Hi Harsh,

Does it make any sense to keep the method in LRW still synchronized. Isn't
it creating unnecessary overhead for non multi threaded implementations.

regards,
sathwik

On Fri, Aug 9, 2013 at 7:16 AM, Harsh J <ha...@cloudera.com> wrote:

> I suppose I should have been clearer. There's no problem out of box if
> people stick to the libraries we offer :)
>
> Yes the LRW was marked synchronized at some point over 8 years ago [1]
> in support for multi-threaded maps, but the framework has changed much
> since then. The MultithreadedMapper/etc. API we offer now
> automatically shields the devs away from having to think of output
> thread safety [2].
>
> I can imagine there can only be a problem if a user writes their own
> unsafe multi threaded task. I suppose we could document that in the
> Mapper/MapRunner and Reducer APIs.
>
> [1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
> Commit added a synchronized to the write call.
> [2] - MultiThreadedMapper/etc. synchronize over the collector -
>
> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup
>
> On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <az...@gmail.com> wrote:
> > sequence writer is also synchronized, I dont think this is bad.
> >
> > if you call HDFS api to write concurrently, then its necessary.
> >
> > On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:
> >>
> >> Then is this a bug?  Synchronization in absence of any race condition is
> >> normally considered "bad".
> >>
> >> In any case id like to know why this writer is synchronized whereas the
> >> other one are not.. That is, I think, then point at issue: either other
> >> writers should be synchronized or else this one shouldn't be -
> consistency
> >> across the write implementations is probably desirable so that changes
> to
> >> output formats or record writers don't lead to bugs in multithreaded
> >> environments .
> >>
> >> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> While we don't fork by default, we do provide a MultithreadedMapper
> >> implementation that would require such synchronization. But if you are
> >> asking is it necessary, then perhaps the answer is no.
> >>
> >> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
> >>>
> >>> its not hadoop forked threads, we may create a line record writer, then
> >>> call this writer concurrently.
> >>>
> >>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
> >>>>
> >>>> Hi,
> >>>> Thanks for your reply.
> >>>> May I know where does hadoop fork multiple threads to use a single
> >>>> RecordWriter.
> >>>>
> >>>> regards,
> >>>> sathwik
> >>>>
> >>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
> >>>>>
> >>>>> because we may use multi-threads to write a single file.
> >>>>>
> >>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
> >>>>>> RecordWriter implementations define the write as synchronized.
> >>>>>> Any specific reason for this.
> >>>>>>
> >>>>>> regards,
> >>>>>> sathwik
> >>>>
> >>>>
> >
>
>
>
> --
> Harsh J
>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Sathwik B P <sa...@gmail.com>.
Hi Harsh,

Does it make any sense to keep the method in LRW still synchronized. Isn't
it creating unnecessary overhead for non multi threaded implementations.

regards,
sathwik

On Fri, Aug 9, 2013 at 7:16 AM, Harsh J <ha...@cloudera.com> wrote:

> I suppose I should have been clearer. There's no problem out of box if
> people stick to the libraries we offer :)
>
> Yes the LRW was marked synchronized at some point over 8 years ago [1]
> in support for multi-threaded maps, but the framework has changed much
> since then. The MultithreadedMapper/etc. API we offer now
> automatically shields the devs away from having to think of output
> thread safety [2].
>
> I can imagine there can only be a problem if a user writes their own
> unsafe multi threaded task. I suppose we could document that in the
> Mapper/MapRunner and Reducer APIs.
>
> [1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
> Commit added a synchronized to the write call.
> [2] - MultiThreadedMapper/etc. synchronize over the collector -
>
> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup
>
> On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <az...@gmail.com> wrote:
> > sequence writer is also synchronized, I dont think this is bad.
> >
> > if you call HDFS api to write concurrently, then its necessary.
> >
> > On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:
> >>
> >> Then is this a bug?  Synchronization in absence of any race condition is
> >> normally considered "bad".
> >>
> >> In any case id like to know why this writer is synchronized whereas the
> >> other one are not.. That is, I think, then point at issue: either other
> >> writers should be synchronized or else this one shouldn't be -
> consistency
> >> across the write implementations is probably desirable so that changes
> to
> >> output formats or record writers don't lead to bugs in multithreaded
> >> environments .
> >>
> >> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> While we don't fork by default, we do provide a MultithreadedMapper
> >> implementation that would require such synchronization. But if you are
> >> asking is it necessary, then perhaps the answer is no.
> >>
> >> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
> >>>
> >>> its not hadoop forked threads, we may create a line record writer, then
> >>> call this writer concurrently.
> >>>
> >>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
> >>>>
> >>>> Hi,
> >>>> Thanks for your reply.
> >>>> May I know where does hadoop fork multiple threads to use a single
> >>>> RecordWriter.
> >>>>
> >>>> regards,
> >>>> sathwik
> >>>>
> >>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
> >>>>>
> >>>>> because we may use multi-threads to write a single file.
> >>>>>
> >>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
> >>>>>> RecordWriter implementations define the write as synchronized.
> >>>>>> Any specific reason for this.
> >>>>>>
> >>>>>> regards,
> >>>>>> sathwik
> >>>>
> >>>>
> >
>
>
>
> --
> Harsh J
>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Sathwik B P <sa...@gmail.com>.
Hi Harsh,

Does it make any sense to keep the method in LRW still synchronized. Isn't
it creating unnecessary overhead for non multi threaded implementations.

regards,
sathwik

On Fri, Aug 9, 2013 at 7:16 AM, Harsh J <ha...@cloudera.com> wrote:

> I suppose I should have been clearer. There's no problem out of box if
> people stick to the libraries we offer :)
>
> Yes the LRW was marked synchronized at some point over 8 years ago [1]
> in support for multi-threaded maps, but the framework has changed much
> since then. The MultithreadedMapper/etc. API we offer now
> automatically shields the devs away from having to think of output
> thread safety [2].
>
> I can imagine there can only be a problem if a user writes their own
> unsafe multi threaded task. I suppose we could document that in the
> Mapper/MapRunner and Reducer APIs.
>
> [1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
> Commit added a synchronized to the write call.
> [2] - MultiThreadedMapper/etc. synchronize over the collector -
>
> http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup
>
> On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <az...@gmail.com> wrote:
> > sequence writer is also synchronized, I dont think this is bad.
> >
> > if you call HDFS api to write concurrently, then its necessary.
> >
> > On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:
> >>
> >> Then is this a bug?  Synchronization in absence of any race condition is
> >> normally considered "bad".
> >>
> >> In any case id like to know why this writer is synchronized whereas the
> >> other one are not.. That is, I think, then point at issue: either other
> >> writers should be synchronized or else this one shouldn't be -
> consistency
> >> across the write implementations is probably desirable so that changes
> to
> >> output formats or record writers don't lead to bugs in multithreaded
> >> environments .
> >>
> >> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> While we don't fork by default, we do provide a MultithreadedMapper
> >> implementation that would require such synchronization. But if you are
> >> asking is it necessary, then perhaps the answer is no.
> >>
> >> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
> >>>
> >>> its not hadoop forked threads, we may create a line record writer, then
> >>> call this writer concurrently.
> >>>
> >>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
> >>>>
> >>>> Hi,
> >>>> Thanks for your reply.
> >>>> May I know where does hadoop fork multiple threads to use a single
> >>>> RecordWriter.
> >>>>
> >>>> regards,
> >>>> sathwik
> >>>>
> >>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
> >>>>>
> >>>>> because we may use multi-threads to write a single file.
> >>>>>
> >>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
> >>>>>> RecordWriter implementations define the write as synchronized.
> >>>>>> Any specific reason for this.
> >>>>>>
> >>>>>> regards,
> >>>>>> sathwik
> >>>>
> >>>>
> >
>
>
>
> --
> Harsh J
>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Harsh J <ha...@cloudera.com>.
I suppose I should have been clearer. There's no problem out of box if
people stick to the libraries we offer :)

Yes the LRW was marked synchronized at some point over 8 years ago [1]
in support for multi-threaded maps, but the framework has changed much
since then. The MultithreadedMapper/etc. API we offer now
automatically shields the devs away from having to think of output
thread safety [2].

I can imagine there can only be a problem if a user writes their own
unsafe multi threaded task. I suppose we could document that in the
Mapper/MapRunner and Reducer APIs.

[1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
Commit added a synchronized to the write call.
[2] - MultiThreadedMapper/etc. synchronize over the collector -
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup

On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <az...@gmail.com> wrote:
> sequence writer is also synchronized, I dont think this is bad.
>
> if you call HDFS api to write concurrently, then its necessary.
>
> On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:
>>
>> Then is this a bug?  Synchronization in absence of any race condition is
>> normally considered "bad".
>>
>> In any case id like to know why this writer is synchronized whereas the
>> other one are not.. That is, I think, then point at issue: either other
>> writers should be synchronized or else this one shouldn't be - consistency
>> across the write implementations is probably desirable so that changes to
>> output formats or record writers don't lead to bugs in multithreaded
>> environments .
>>
>> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> While we don't fork by default, we do provide a MultithreadedMapper
>> implementation that would require such synchronization. But if you are
>> asking is it necessary, then perhaps the answer is no.
>>
>> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>>>
>>> its not hadoop forked threads, we may create a line record writer, then
>>> call this writer concurrently.
>>>
>>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> Thanks for your reply.
>>>> May I know where does hadoop fork multiple threads to use a single
>>>> RecordWriter.
>>>>
>>>> regards,
>>>> sathwik
>>>>
>>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>>>
>>>>> because we may use multi-threads to write a single file.
>>>>>
>>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>>> RecordWriter implementations define the write as synchronized.
>>>>>> Any specific reason for this.
>>>>>>
>>>>>> regards,
>>>>>> sathwik
>>>>
>>>>
>



-- 
Harsh J

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Harsh J <ha...@cloudera.com>.
I suppose I should have been clearer. There's no problem out of box if
people stick to the libraries we offer :)

Yes the LRW was marked synchronized at some point over 8 years ago [1]
in support for multi-threaded maps, but the framework has changed much
since then. The MultithreadedMapper/etc. API we offer now
automatically shields the devs away from having to think of output
thread safety [2].

I can imagine there can only be a problem if a user writes their own
unsafe multi threaded task. I suppose we could document that in the
Mapper/MapRunner and Reducer APIs.

[1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
Commit added a synchronized to the write call.
[2] - MultiThreadedMapper/etc. synchronize over the collector -
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup

On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <az...@gmail.com> wrote:
> sequence writer is also synchronized, I dont think this is bad.
>
> if you call HDFS api to write concurrently, then its necessary.
>
> On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:
>>
>> Then is this a bug?  Synchronization in absence of any race condition is
>> normally considered "bad".
>>
>> In any case id like to know why this writer is synchronized whereas the
>> other one are not.. That is, I think, then point at issue: either other
>> writers should be synchronized or else this one shouldn't be - consistency
>> across the write implementations is probably desirable so that changes to
>> output formats or record writers don't lead to bugs in multithreaded
>> environments .
>>
>> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> While we don't fork by default, we do provide a MultithreadedMapper
>> implementation that would require such synchronization. But if you are
>> asking is it necessary, then perhaps the answer is no.
>>
>> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>>>
>>> its not hadoop forked threads, we may create a line record writer, then
>>> call this writer concurrently.
>>>
>>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> Thanks for your reply.
>>>> May I know where does hadoop fork multiple threads to use a single
>>>> RecordWriter.
>>>>
>>>> regards,
>>>> sathwik
>>>>
>>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>>>
>>>>> because we may use multi-threads to write a single file.
>>>>>
>>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>>> RecordWriter implementations define the write as synchronized.
>>>>>> Any specific reason for this.
>>>>>>
>>>>>> regards,
>>>>>> sathwik
>>>>
>>>>
>



-- 
Harsh J

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Harsh J <ha...@cloudera.com>.
I suppose I should have been clearer. There's no problem out of box if
people stick to the libraries we offer :)

Yes the LRW was marked synchronized at some point over 8 years ago [1]
in support for multi-threaded maps, but the framework has changed much
since then. The MultithreadedMapper/etc. API we offer now
automatically shields the devs away from having to think of output
thread safety [2].

I can imagine there can only be a problem if a user writes their own
unsafe multi threaded task. I suppose we could document that in the
Mapper/MapRunner and Reducer APIs.

[1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
Commit added a synchronized to the write call.
[2] - MultiThreadedMapper/etc. synchronize over the collector -
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup

On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <az...@gmail.com> wrote:
> sequence writer is also synchronized, I dont think this is bad.
>
> if you call HDFS api to write concurrently, then its necessary.
>
> On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:
>>
>> Then is this a bug?  Synchronization in absence of any race condition is
>> normally considered "bad".
>>
>> In any case id like to know why this writer is synchronized whereas the
>> other one are not.. That is, I think, then point at issue: either other
>> writers should be synchronized or else this one shouldn't be - consistency
>> across the write implementations is probably desirable so that changes to
>> output formats or record writers don't lead to bugs in multithreaded
>> environments .
>>
>> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> While we don't fork by default, we do provide a MultithreadedMapper
>> implementation that would require such synchronization. But if you are
>> asking is it necessary, then perhaps the answer is no.
>>
>> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>>>
>>> its not hadoop forked threads, we may create a line record writer, then
>>> call this writer concurrently.
>>>
>>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> Thanks for your reply.
>>>> May I know where does hadoop fork multiple threads to use a single
>>>> RecordWriter.
>>>>
>>>> regards,
>>>> sathwik
>>>>
>>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>>>
>>>>> because we may use multi-threads to write a single file.
>>>>>
>>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>>> RecordWriter implementations define the write as synchronized.
>>>>>> Any specific reason for this.
>>>>>>
>>>>>> regards,
>>>>>> sathwik
>>>>
>>>>
>



-- 
Harsh J

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Harsh J <ha...@cloudera.com>.
I suppose I should have been clearer. There's no problem out of box if
people stick to the libraries we offer :)

Yes the LRW was marked synchronized at some point over 8 years ago [1]
in support for multi-threaded maps, but the framework has changed much
since then. The MultithreadedMapper/etc. API we offer now
automatically shields the devs away from having to think of output
thread safety [2].

I can imagine there can only be a problem if a user writes their own
unsafe multi threaded task. I suppose we could document that in the
Mapper/MapRunner and Reducer APIs.

[1] - http://svn.apache.org/viewvc?view=revision&revision=171186 -
Commit added a synchronized to the write call.
[2] - MultiThreadedMapper/etc. synchronize over the collector -
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.java?view=markup

On Thu, Aug 8, 2013 at 7:52 PM, Azuryy Yu <az...@gmail.com> wrote:
> sequence writer is also synchronized, I dont think this is bad.
>
> if you call HDFS api to write concurrently, then its necessary.
>
> On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:
>>
>> Then is this a bug?  Synchronization in absence of any race condition is
>> normally considered "bad".
>>
>> In any case id like to know why this writer is synchronized whereas the
>> other one are not.. That is, I think, then point at issue: either other
>> writers should be synchronized or else this one shouldn't be - consistency
>> across the write implementations is probably desirable so that changes to
>> output formats or record writers don't lead to bugs in multithreaded
>> environments .
>>
>> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> While we don't fork by default, we do provide a MultithreadedMapper
>> implementation that would require such synchronization. But if you are
>> asking is it necessary, then perhaps the answer is no.
>>
>> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>>>
>>> its not hadoop forked threads, we may create a line record writer, then
>>> call this writer concurrently.
>>>
>>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> Thanks for your reply.
>>>> May I know where does hadoop fork multiple threads to use a single
>>>> RecordWriter.
>>>>
>>>> regards,
>>>> sathwik
>>>>
>>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>>>
>>>>> because we may use multi-threads to write a single file.
>>>>>
>>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>>> RecordWriter implementations define the write as synchronized.
>>>>>> Any specific reason for this.
>>>>>>
>>>>>> regards,
>>>>>> sathwik
>>>>
>>>>
>



-- 
Harsh J

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Azuryy Yu <az...@gmail.com>.
sequence writer is also synchronized, I dont think this is bad.

if you call HDFS api to write concurrently, then its necessary.
On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:

> Then is this a bug?  Synchronization in absence of any race condition is
> normally considered "bad".
>
> In any case id like to know why this writer is synchronized whereas the
> other one are not.. That is, I think, then point at issue: either other
> writers should be synchronized or else this one shouldn't be - consistency
> across the write implementations is probably desirable so that changes to
> output formats or record writers don't lead to bugs in multithreaded
> environments .
>
> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>
> While we don't fork by default, we do provide a MultithreadedMapper
> implementation that would require such synchronization. But if you are
> asking is it necessary, then perhaps the answer is no.
> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>
>> its not hadoop forked threads, we may create a line record writer, then
>> call this writer concurrently.
>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>
>>> Hi,
>>> Thanks for your reply.
>>> May I know where does hadoop fork multiple threads to use a single
>>> RecordWriter.
>>>
>>> regards,
>>> sathwik
>>>
>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>
>>>> because we may use multi-threads to write a single file.
>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>> RecordWriter implementations define the write as synchronized.
>>>>> Any specific reason for this.
>>>>>
>>>>> regards,
>>>>> sathwik
>>>>>
>>>>
>>>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Azuryy Yu <az...@gmail.com>.
sequence writer is also synchronized, I dont think this is bad.

if you call HDFS api to write concurrently, then its necessary.
On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:

> Then is this a bug?  Synchronization in absence of any race condition is
> normally considered "bad".
>
> In any case id like to know why this writer is synchronized whereas the
> other one are not.. That is, I think, then point at issue: either other
> writers should be synchronized or else this one shouldn't be - consistency
> across the write implementations is probably desirable so that changes to
> output formats or record writers don't lead to bugs in multithreaded
> environments .
>
> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>
> While we don't fork by default, we do provide a MultithreadedMapper
> implementation that would require such synchronization. But if you are
> asking is it necessary, then perhaps the answer is no.
> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>
>> its not hadoop forked threads, we may create a line record writer, then
>> call this writer concurrently.
>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>
>>> Hi,
>>> Thanks for your reply.
>>> May I know where does hadoop fork multiple threads to use a single
>>> RecordWriter.
>>>
>>> regards,
>>> sathwik
>>>
>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>
>>>> because we may use multi-threads to write a single file.
>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>> RecordWriter implementations define the write as synchronized.
>>>>> Any specific reason for this.
>>>>>
>>>>> regards,
>>>>> sathwik
>>>>>
>>>>
>>>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Niels Basjes <Ni...@basjes.nl>.
I would say yes make this a Jira.
The actual change can fall (as proposed by Jay) in two directions: Put
in synchronization
in all implementations OR take it out of all implementations.

I think the first thing to determine is why the synchronization was put
into the  LineRecordWriter in the first place.

https://github.com/apache/hadoop-common/blame/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/TextOutputFormat.java

The oldest I have been able to find is a commit on 2009-05-18 for
HADOOP-4687 that is about moving stuff around (i.e. this code is even older
than that).

Niels



On Thu, Aug 8, 2013 at 2:21 PM, Sathwik B P <sa...@apache.org> wrote:

> Hi Harsh,
>
> Do you want me to raise a Jira on this.
>
> regards,
> sathwik
>
>
> On Thu, Aug 8, 2013 at 5:23 PM, Jay Vyas <ja...@gmail.com> wrote:
>
>> Then is this a bug?  Synchronization in absence of any race condition is
>> normally considered "bad".
>>
>> In any case id like to know why this writer is synchronized whereas the
>> other one are not.. That is, I think, then point at issue: either other
>> writers should be synchronized or else this one shouldn't be - consistency
>> across the write implementations is probably desirable so that changes to
>> output formats or record writers don't lead to bugs in multithreaded
>> environments .
>>
>> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> While we don't fork by default, we do provide a MultithreadedMapper
>> implementation that would require such synchronization. But if you are
>> asking is it necessary, then perhaps the answer is no.
>> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>>
>>> its not hadoop forked threads, we may create a line record writer, then
>>> call this writer concurrently.
>>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> Thanks for your reply.
>>>> May I know where does hadoop fork multiple threads to use a single
>>>> RecordWriter.
>>>>
>>>> regards,
>>>> sathwik
>>>>
>>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>>
>>>>> because we may use multi-threads to write a single file.
>>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>>> RecordWriter implementations define the write as synchronized.
>>>>>> Any specific reason for this.
>>>>>>
>>>>>> regards,
>>>>>> sathwik
>>>>>>
>>>>>
>>>>
>


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Niels Basjes <Ni...@basjes.nl>.
I would say yes make this a Jira.
The actual change can fall (as proposed by Jay) in two directions: Put
in synchronization
in all implementations OR take it out of all implementations.

I think the first thing to determine is why the synchronization was put
into the  LineRecordWriter in the first place.

https://github.com/apache/hadoop-common/blame/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/TextOutputFormat.java

The oldest I have been able to find is a commit on 2009-05-18 for
HADOOP-4687 that is about moving stuff around (i.e. this code is even older
than that).

Niels



On Thu, Aug 8, 2013 at 2:21 PM, Sathwik B P <sa...@apache.org> wrote:

> Hi Harsh,
>
> Do you want me to raise a Jira on this.
>
> regards,
> sathwik
>
>
> On Thu, Aug 8, 2013 at 5:23 PM, Jay Vyas <ja...@gmail.com> wrote:
>
>> Then is this a bug?  Synchronization in absence of any race condition is
>> normally considered "bad".
>>
>> In any case id like to know why this writer is synchronized whereas the
>> other one are not.. That is, I think, then point at issue: either other
>> writers should be synchronized or else this one shouldn't be - consistency
>> across the write implementations is probably desirable so that changes to
>> output formats or record writers don't lead to bugs in multithreaded
>> environments .
>>
>> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> While we don't fork by default, we do provide a MultithreadedMapper
>> implementation that would require such synchronization. But if you are
>> asking is it necessary, then perhaps the answer is no.
>> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>>
>>> its not hadoop forked threads, we may create a line record writer, then
>>> call this writer concurrently.
>>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> Thanks for your reply.
>>>> May I know where does hadoop fork multiple threads to use a single
>>>> RecordWriter.
>>>>
>>>> regards,
>>>> sathwik
>>>>
>>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>>
>>>>> because we may use multi-threads to write a single file.
>>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>>> RecordWriter implementations define the write as synchronized.
>>>>>> Any specific reason for this.
>>>>>>
>>>>>> regards,
>>>>>> sathwik
>>>>>>
>>>>>
>>>>
>


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Niels Basjes <Ni...@basjes.nl>.
I would say yes make this a Jira.
The actual change can fall (as proposed by Jay) in two directions: Put
in synchronization
in all implementations OR take it out of all implementations.

I think the first thing to determine is why the synchronization was put
into the  LineRecordWriter in the first place.

https://github.com/apache/hadoop-common/blame/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/TextOutputFormat.java

The oldest I have been able to find is a commit on 2009-05-18 for
HADOOP-4687 that is about moving stuff around (i.e. this code is even older
than that).

Niels



On Thu, Aug 8, 2013 at 2:21 PM, Sathwik B P <sa...@apache.org> wrote:

> Hi Harsh,
>
> Do you want me to raise a Jira on this.
>
> regards,
> sathwik
>
>
> On Thu, Aug 8, 2013 at 5:23 PM, Jay Vyas <ja...@gmail.com> wrote:
>
>> Then is this a bug?  Synchronization in absence of any race condition is
>> normally considered "bad".
>>
>> In any case id like to know why this writer is synchronized whereas the
>> other one are not.. That is, I think, then point at issue: either other
>> writers should be synchronized or else this one shouldn't be - consistency
>> across the write implementations is probably desirable so that changes to
>> output formats or record writers don't lead to bugs in multithreaded
>> environments .
>>
>> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> While we don't fork by default, we do provide a MultithreadedMapper
>> implementation that would require such synchronization. But if you are
>> asking is it necessary, then perhaps the answer is no.
>> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>>
>>> its not hadoop forked threads, we may create a line record writer, then
>>> call this writer concurrently.
>>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> Thanks for your reply.
>>>> May I know where does hadoop fork multiple threads to use a single
>>>> RecordWriter.
>>>>
>>>> regards,
>>>> sathwik
>>>>
>>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>>
>>>>> because we may use multi-threads to write a single file.
>>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>>> RecordWriter implementations define the write as synchronized.
>>>>>> Any specific reason for this.
>>>>>>
>>>>>> regards,
>>>>>> sathwik
>>>>>>
>>>>>
>>>>
>


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Niels Basjes <Ni...@basjes.nl>.
I would say yes make this a Jira.
The actual change can fall (as proposed by Jay) in two directions: Put
in synchronization
in all implementations OR take it out of all implementations.

I think the first thing to determine is why the synchronization was put
into the  LineRecordWriter in the first place.

https://github.com/apache/hadoop-common/blame/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/TextOutputFormat.java

The oldest I have been able to find is a commit on 2009-05-18 for
HADOOP-4687 that is about moving stuff around (i.e. this code is even older
than that).

Niels



On Thu, Aug 8, 2013 at 2:21 PM, Sathwik B P <sa...@apache.org> wrote:

> Hi Harsh,
>
> Do you want me to raise a Jira on this.
>
> regards,
> sathwik
>
>
> On Thu, Aug 8, 2013 at 5:23 PM, Jay Vyas <ja...@gmail.com> wrote:
>
>> Then is this a bug?  Synchronization in absence of any race condition is
>> normally considered "bad".
>>
>> In any case id like to know why this writer is synchronized whereas the
>> other one are not.. That is, I think, then point at issue: either other
>> writers should be synchronized or else this one shouldn't be - consistency
>> across the write implementations is probably desirable so that changes to
>> output formats or record writers don't lead to bugs in multithreaded
>> environments .
>>
>> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> While we don't fork by default, we do provide a MultithreadedMapper
>> implementation that would require such synchronization. But if you are
>> asking is it necessary, then perhaps the answer is no.
>> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>>
>>> its not hadoop forked threads, we may create a line record writer, then
>>> call this writer concurrently.
>>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> Thanks for your reply.
>>>> May I know where does hadoop fork multiple threads to use a single
>>>> RecordWriter.
>>>>
>>>> regards,
>>>> sathwik
>>>>
>>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>>
>>>>> because we may use multi-threads to write a single file.
>>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>>> RecordWriter implementations define the write as synchronized.
>>>>>> Any specific reason for this.
>>>>>>
>>>>>> regards,
>>>>>> sathwik
>>>>>>
>>>>>
>>>>
>


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Sathwik B P <sa...@apache.org>.
Hi Harsh,

Do you want me to raise a Jira on this.

regards,
sathwik

On Thu, Aug 8, 2013 at 5:23 PM, Jay Vyas <ja...@gmail.com> wrote:

> Then is this a bug?  Synchronization in absence of any race condition is
> normally considered "bad".
>
> In any case id like to know why this writer is synchronized whereas the
> other one are not.. That is, I think, then point at issue: either other
> writers should be synchronized or else this one shouldn't be - consistency
> across the write implementations is probably desirable so that changes to
> output formats or record writers don't lead to bugs in multithreaded
> environments .
>
> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>
> While we don't fork by default, we do provide a MultithreadedMapper
> implementation that would require such synchronization. But if you are
> asking is it necessary, then perhaps the answer is no.
> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>
>> its not hadoop forked threads, we may create a line record writer, then
>> call this writer concurrently.
>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>
>>> Hi,
>>> Thanks for your reply.
>>> May I know where does hadoop fork multiple threads to use a single
>>> RecordWriter.
>>>
>>> regards,
>>> sathwik
>>>
>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>
>>>> because we may use multi-threads to write a single file.
>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>> RecordWriter implementations define the write as synchronized.
>>>>> Any specific reason for this.
>>>>>
>>>>> regards,
>>>>> sathwik
>>>>>
>>>>
>>>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Azuryy Yu <az...@gmail.com>.
sequence writer is also synchronized, I dont think this is bad.

if you call HDFS api to write concurrently, then its necessary.
On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:

> Then is this a bug?  Synchronization in absence of any race condition is
> normally considered "bad".
>
> In any case id like to know why this writer is synchronized whereas the
> other one are not.. That is, I think, then point at issue: either other
> writers should be synchronized or else this one shouldn't be - consistency
> across the write implementations is probably desirable so that changes to
> output formats or record writers don't lead to bugs in multithreaded
> environments .
>
> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>
> While we don't fork by default, we do provide a MultithreadedMapper
> implementation that would require such synchronization. But if you are
> asking is it necessary, then perhaps the answer is no.
> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>
>> its not hadoop forked threads, we may create a line record writer, then
>> call this writer concurrently.
>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>
>>> Hi,
>>> Thanks for your reply.
>>> May I know where does hadoop fork multiple threads to use a single
>>> RecordWriter.
>>>
>>> regards,
>>> sathwik
>>>
>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>
>>>> because we may use multi-threads to write a single file.
>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>> RecordWriter implementations define the write as synchronized.
>>>>> Any specific reason for this.
>>>>>
>>>>> regards,
>>>>> sathwik
>>>>>
>>>>
>>>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Azuryy Yu <az...@gmail.com>.
sequence writer is also synchronized, I dont think this is bad.

if you call HDFS api to write concurrently, then its necessary.
On Aug 8, 2013 7:53 PM, "Jay Vyas" <ja...@gmail.com> wrote:

> Then is this a bug?  Synchronization in absence of any race condition is
> normally considered "bad".
>
> In any case id like to know why this writer is synchronized whereas the
> other one are not.. That is, I think, then point at issue: either other
> writers should be synchronized or else this one shouldn't be - consistency
> across the write implementations is probably desirable so that changes to
> output formats or record writers don't lead to bugs in multithreaded
> environments .
>
> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>
> While we don't fork by default, we do provide a MultithreadedMapper
> implementation that would require such synchronization. But if you are
> asking is it necessary, then perhaps the answer is no.
> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>
>> its not hadoop forked threads, we may create a line record writer, then
>> call this writer concurrently.
>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>
>>> Hi,
>>> Thanks for your reply.
>>> May I know where does hadoop fork multiple threads to use a single
>>> RecordWriter.
>>>
>>> regards,
>>> sathwik
>>>
>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>
>>>> because we may use multi-threads to write a single file.
>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>> RecordWriter implementations define the write as synchronized.
>>>>> Any specific reason for this.
>>>>>
>>>>> regards,
>>>>> sathwik
>>>>>
>>>>
>>>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Sathwik B P <sa...@apache.org>.
Hi Harsh,

Do you want me to raise a Jira on this.

regards,
sathwik

On Thu, Aug 8, 2013 at 5:23 PM, Jay Vyas <ja...@gmail.com> wrote:

> Then is this a bug?  Synchronization in absence of any race condition is
> normally considered "bad".
>
> In any case id like to know why this writer is synchronized whereas the
> other one are not.. That is, I think, then point at issue: either other
> writers should be synchronized or else this one shouldn't be - consistency
> across the write implementations is probably desirable so that changes to
> output formats or record writers don't lead to bugs in multithreaded
> environments .
>
> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>
> While we don't fork by default, we do provide a MultithreadedMapper
> implementation that would require such synchronization. But if you are
> asking is it necessary, then perhaps the answer is no.
> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>
>> its not hadoop forked threads, we may create a line record writer, then
>> call this writer concurrently.
>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>
>>> Hi,
>>> Thanks for your reply.
>>> May I know where does hadoop fork multiple threads to use a single
>>> RecordWriter.
>>>
>>> regards,
>>> sathwik
>>>
>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>
>>>> because we may use multi-threads to write a single file.
>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>> RecordWriter implementations define the write as synchronized.
>>>>> Any specific reason for this.
>>>>>
>>>>> regards,
>>>>> sathwik
>>>>>
>>>>
>>>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Sathwik B P <sa...@apache.org>.
Hi Harsh,

Do you want me to raise a Jira on this.

regards,
sathwik

On Thu, Aug 8, 2013 at 5:23 PM, Jay Vyas <ja...@gmail.com> wrote:

> Then is this a bug?  Synchronization in absence of any race condition is
> normally considered "bad".
>
> In any case id like to know why this writer is synchronized whereas the
> other one are not.. That is, I think, then point at issue: either other
> writers should be synchronized or else this one shouldn't be - consistency
> across the write implementations is probably desirable so that changes to
> output formats or record writers don't lead to bugs in multithreaded
> environments .
>
> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>
> While we don't fork by default, we do provide a MultithreadedMapper
> implementation that would require such synchronization. But if you are
> asking is it necessary, then perhaps the answer is no.
> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>
>> its not hadoop forked threads, we may create a line record writer, then
>> call this writer concurrently.
>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>
>>> Hi,
>>> Thanks for your reply.
>>> May I know where does hadoop fork multiple threads to use a single
>>> RecordWriter.
>>>
>>> regards,
>>> sathwik
>>>
>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>
>>>> because we may use multi-threads to write a single file.
>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>> RecordWriter implementations define the write as synchronized.
>>>>> Any specific reason for this.
>>>>>
>>>>> regards,
>>>>> sathwik
>>>>>
>>>>
>>>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Sathwik B P <sa...@apache.org>.
Hi Harsh,

Do you want me to raise a Jira on this.

regards,
sathwik

On Thu, Aug 8, 2013 at 5:23 PM, Jay Vyas <ja...@gmail.com> wrote:

> Then is this a bug?  Synchronization in absence of any race condition is
> normally considered "bad".
>
> In any case id like to know why this writer is synchronized whereas the
> other one are not.. That is, I think, then point at issue: either other
> writers should be synchronized or else this one shouldn't be - consistency
> across the write implementations is probably desirable so that changes to
> output formats or record writers don't lead to bugs in multithreaded
> environments .
>
> On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:
>
> While we don't fork by default, we do provide a MultithreadedMapper
> implementation that would require such synchronization. But if you are
> asking is it necessary, then perhaps the answer is no.
> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>
>> its not hadoop forked threads, we may create a line record writer, then
>> call this writer concurrently.
>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>
>>> Hi,
>>> Thanks for your reply.
>>> May I know where does hadoop fork multiple threads to use a single
>>> RecordWriter.
>>>
>>> regards,
>>> sathwik
>>>
>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>
>>>> because we may use multi-threads to write a single file.
>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>> RecordWriter implementations define the write as synchronized.
>>>>> Any specific reason for this.
>>>>>
>>>>> regards,
>>>>> sathwik
>>>>>
>>>>
>>>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Jay Vyas <ja...@gmail.com>.
Then is this a bug?  Synchronization in absence of any race condition is normally considered "bad".

In any case id like to know why this writer is synchronized whereas the other one are not.. That is, I think, then point at issue: either other writers should be synchronized or else this one shouldn't be - consistency across the write implementations is probably desirable so that changes to output formats or record writers don't lead to bugs in multithreaded environments .

On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:

> While we don't fork by default, we do provide a MultithreadedMapper implementation that would require such synchronization. But if you are asking is it necessary, then perhaps the answer is no.
> 
> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>> its not hadoop forked threads, we may create a line record writer, then call this writer concurrently.
>> 
>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>> Hi,
>>> Thanks for your reply.
>>> May I know where does hadoop fork multiple threads to use a single RecordWriter.
>>> 
>>> regards,
>>> sathwik
>>> 
>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>> because we may use multi-threads to write a single file.
>>>> 
>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>> Hi,
>>>>> 
>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other RecordWriter implementations define the write as synchronized.
>>>>> Any specific reason for this.
>>>>> 
>>>>> regards,
>>>>> sathwik

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Niels Basjes <Ni...@basjes.nl>.
I may be nitpicking here but if "perhaps the answer is no" then I conclude:
Perhaps the other implementations of RecordWriter are a race condition/file
corruption ready to occur.


On Thu, Aug 8, 2013 at 12:50 PM, Harsh J <ha...@cloudera.com> wrote:

> While we don't fork by default, we do provide a MultithreadedMapper
> implementation that would require such synchronization. But if you are
> asking is it necessary, then perhaps the answer is no.
> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>
>> its not hadoop forked threads, we may create a line record writer, then
>> call this writer concurrently.
>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>
>>> Hi,
>>> Thanks for your reply.
>>> May I know where does hadoop fork multiple threads to use a single
>>> RecordWriter.
>>>
>>> regards,
>>> sathwik
>>>
>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>
>>>> because we may use multi-threads to write a single file.
>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>> RecordWriter implementations define the write as synchronized.
>>>>> Any specific reason for this.
>>>>>
>>>>> regards,
>>>>> sathwik
>>>>>
>>>>
>>>


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Jay Vyas <ja...@gmail.com>.
Then is this a bug?  Synchronization in absence of any race condition is normally considered "bad".

In any case id like to know why this writer is synchronized whereas the other one are not.. That is, I think, then point at issue: either other writers should be synchronized or else this one shouldn't be - consistency across the write implementations is probably desirable so that changes to output formats or record writers don't lead to bugs in multithreaded environments .

On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:

> While we don't fork by default, we do provide a MultithreadedMapper implementation that would require such synchronization. But if you are asking is it necessary, then perhaps the answer is no.
> 
> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>> its not hadoop forked threads, we may create a line record writer, then call this writer concurrently.
>> 
>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>> Hi,
>>> Thanks for your reply.
>>> May I know where does hadoop fork multiple threads to use a single RecordWriter.
>>> 
>>> regards,
>>> sathwik
>>> 
>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>> because we may use multi-threads to write a single file.
>>>> 
>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>> Hi,
>>>>> 
>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other RecordWriter implementations define the write as synchronized.
>>>>> Any specific reason for this.
>>>>> 
>>>>> regards,
>>>>> sathwik

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Jay Vyas <ja...@gmail.com>.
Then is this a bug?  Synchronization in absence of any race condition is normally considered "bad".

In any case id like to know why this writer is synchronized whereas the other one are not.. That is, I think, then point at issue: either other writers should be synchronized or else this one shouldn't be - consistency across the write implementations is probably desirable so that changes to output formats or record writers don't lead to bugs in multithreaded environments .

On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:

> While we don't fork by default, we do provide a MultithreadedMapper implementation that would require such synchronization. But if you are asking is it necessary, then perhaps the answer is no.
> 
> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>> its not hadoop forked threads, we may create a line record writer, then call this writer concurrently.
>> 
>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>> Hi,
>>> Thanks for your reply.
>>> May I know where does hadoop fork multiple threads to use a single RecordWriter.
>>> 
>>> regards,
>>> sathwik
>>> 
>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>> because we may use multi-threads to write a single file.
>>>> 
>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>> Hi,
>>>>> 
>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other RecordWriter implementations define the write as synchronized.
>>>>> Any specific reason for this.
>>>>> 
>>>>> regards,
>>>>> sathwik

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Niels Basjes <Ni...@basjes.nl>.
I may be nitpicking here but if "perhaps the answer is no" then I conclude:
Perhaps the other implementations of RecordWriter are a race condition/file
corruption ready to occur.


On Thu, Aug 8, 2013 at 12:50 PM, Harsh J <ha...@cloudera.com> wrote:

> While we don't fork by default, we do provide a MultithreadedMapper
> implementation that would require such synchronization. But if you are
> asking is it necessary, then perhaps the answer is no.
> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>
>> its not hadoop forked threads, we may create a line record writer, then
>> call this writer concurrently.
>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>
>>> Hi,
>>> Thanks for your reply.
>>> May I know where does hadoop fork multiple threads to use a single
>>> RecordWriter.
>>>
>>> regards,
>>> sathwik
>>>
>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>
>>>> because we may use multi-threads to write a single file.
>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>> RecordWriter implementations define the write as synchronized.
>>>>> Any specific reason for this.
>>>>>
>>>>> regards,
>>>>> sathwik
>>>>>
>>>>
>>>


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Niels Basjes <Ni...@basjes.nl>.
I may be nitpicking here but if "perhaps the answer is no" then I conclude:
Perhaps the other implementations of RecordWriter are a race condition/file
corruption ready to occur.


On Thu, Aug 8, 2013 at 12:50 PM, Harsh J <ha...@cloudera.com> wrote:

> While we don't fork by default, we do provide a MultithreadedMapper
> implementation that would require such synchronization. But if you are
> asking is it necessary, then perhaps the answer is no.
> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>
>> its not hadoop forked threads, we may create a line record writer, then
>> call this writer concurrently.
>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>
>>> Hi,
>>> Thanks for your reply.
>>> May I know where does hadoop fork multiple threads to use a single
>>> RecordWriter.
>>>
>>> regards,
>>> sathwik
>>>
>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>
>>>> because we may use multi-threads to write a single file.
>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>>> RecordWriter implementations define the write as synchronized.
>>>>> Any specific reason for this.
>>>>>
>>>>> regards,
>>>>> sathwik
>>>>>
>>>>
>>>


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Jay Vyas <ja...@gmail.com>.
Then is this a bug?  Synchronization in absence of any race condition is normally considered "bad".

In any case id like to know why this writer is synchronized whereas the other one are not.. That is, I think, then point at issue: either other writers should be synchronized or else this one shouldn't be - consistency across the write implementations is probably desirable so that changes to output formats or record writers don't lead to bugs in multithreaded environments .

On Aug 8, 2013, at 6:50 AM, Harsh J <ha...@cloudera.com> wrote:

> While we don't fork by default, we do provide a MultithreadedMapper implementation that would require such synchronization. But if you are asking is it necessary, then perhaps the answer is no.
> 
> On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:
>> its not hadoop forked threads, we may create a line record writer, then call this writer concurrently.
>> 
>> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>>> Hi,
>>> Thanks for your reply.
>>> May I know where does hadoop fork multiple threads to use a single RecordWriter.
>>> 
>>> regards,
>>> sathwik
>>> 
>>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>>> because we may use multi-threads to write a single file.
>>>> 
>>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>>> Hi,
>>>>> 
>>>>> LineRecordWriter.write(..) is synchronized. I did not find any other RecordWriter implementations define the write as synchronized.
>>>>> Any specific reason for this.
>>>>> 
>>>>> regards,
>>>>> sathwik

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Harsh J <ha...@cloudera.com>.
While we don't fork by default, we do provide a MultithreadedMapper
implementation that would require such synchronization. But if you are
asking is it necessary, then perhaps the answer is no.
On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:

> its not hadoop forked threads, we may create a line record writer, then
> call this writer concurrently.
> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>
>> Hi,
>> Thanks for your reply.
>> May I know where does hadoop fork multiple threads to use a single
>> RecordWriter.
>>
>> regards,
>> sathwik
>>
>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>
>>> because we may use multi-threads to write a single file.
>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>> RecordWriter implementations define the write as synchronized.
>>>> Any specific reason for this.
>>>>
>>>> regards,
>>>> sathwik
>>>>
>>>
>>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Harsh J <ha...@cloudera.com>.
While we don't fork by default, we do provide a MultithreadedMapper
implementation that would require such synchronization. But if you are
asking is it necessary, then perhaps the answer is no.
On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:

> its not hadoop forked threads, we may create a line record writer, then
> call this writer concurrently.
> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>
>> Hi,
>> Thanks for your reply.
>> May I know where does hadoop fork multiple threads to use a single
>> RecordWriter.
>>
>> regards,
>> sathwik
>>
>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>
>>> because we may use multi-threads to write a single file.
>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>> RecordWriter implementations define the write as synchronized.
>>>> Any specific reason for this.
>>>>
>>>> regards,
>>>> sathwik
>>>>
>>>
>>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Harsh J <ha...@cloudera.com>.
While we don't fork by default, we do provide a MultithreadedMapper
implementation that would require such synchronization. But if you are
asking is it necessary, then perhaps the answer is no.
On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:

> its not hadoop forked threads, we may create a line record writer, then
> call this writer concurrently.
> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>
>> Hi,
>> Thanks for your reply.
>> May I know where does hadoop fork multiple threads to use a single
>> RecordWriter.
>>
>> regards,
>> sathwik
>>
>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>
>>> because we may use multi-threads to write a single file.
>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>> RecordWriter implementations define the write as synchronized.
>>>> Any specific reason for this.
>>>>
>>>> regards,
>>>> sathwik
>>>>
>>>
>>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Harsh J <ha...@cloudera.com>.
While we don't fork by default, we do provide a MultithreadedMapper
implementation that would require such synchronization. But if you are
asking is it necessary, then perhaps the answer is no.
On Aug 8, 2013 3:43 PM, "Azuryy Yu" <az...@gmail.com> wrote:

> its not hadoop forked threads, we may create a line record writer, then
> call this writer concurrently.
> On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:
>
>> Hi,
>> Thanks for your reply.
>> May I know where does hadoop fork multiple threads to use a single
>> RecordWriter.
>>
>> regards,
>> sathwik
>>
>> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>>
>>> because we may use multi-threads to write a single file.
>>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>>> RecordWriter implementations define the write as synchronized.
>>>> Any specific reason for this.
>>>>
>>>> regards,
>>>> sathwik
>>>>
>>>
>>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Azuryy Yu <az...@gmail.com>.
its not hadoop forked threads, we may create a line record writer, then
call this writer concurrently.
On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:

> Hi,
> Thanks for your reply.
> May I know where does hadoop fork multiple threads to use a single
> RecordWriter.
>
> regards,
> sathwik
>
> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>
>> because we may use multi-threads to write a single file.
>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>> RecordWriter implementations define the write as synchronized.
>>> Any specific reason for this.
>>>
>>> regards,
>>> sathwik
>>>
>>
>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Azuryy Yu <az...@gmail.com>.
its not hadoop forked threads, we may create a line record writer, then
call this writer concurrently.
On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:

> Hi,
> Thanks for your reply.
> May I know where does hadoop fork multiple threads to use a single
> RecordWriter.
>
> regards,
> sathwik
>
> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>
>> because we may use multi-threads to write a single file.
>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>> RecordWriter implementations define the write as synchronized.
>>> Any specific reason for this.
>>>
>>> regards,
>>> sathwik
>>>
>>
>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Azuryy Yu <az...@gmail.com>.
its not hadoop forked threads, we may create a line record writer, then
call this writer concurrently.
On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:

> Hi,
> Thanks for your reply.
> May I know where does hadoop fork multiple threads to use a single
> RecordWriter.
>
> regards,
> sathwik
>
> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>
>> because we may use multi-threads to write a single file.
>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>> RecordWriter implementations define the write as synchronized.
>>> Any specific reason for this.
>>>
>>> regards,
>>> sathwik
>>>
>>
>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Azuryy Yu <az...@gmail.com>.
its not hadoop forked threads, we may create a line record writer, then
call this writer concurrently.
On Aug 8, 2013 4:00 PM, "Sathwik B P" <sa...@gmail.com> wrote:

> Hi,
> Thanks for your reply.
> May I know where does hadoop fork multiple threads to use a single
> RecordWriter.
>
> regards,
> sathwik
>
> On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:
>
>> because we may use multi-threads to write a single file.
>> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> LineRecordWriter.write(..) is synchronized. I did not find any other
>>> RecordWriter implementations define the write as synchronized.
>>> Any specific reason for this.
>>>
>>> regards,
>>> sathwik
>>>
>>
>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Sathwik B P <sa...@gmail.com>.
Hi,
Thanks for your reply.
May I know where does hadoop fork multiple threads to use a single
RecordWriter.

regards,
sathwik

On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:

> because we may use multi-threads to write a single file.
> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>
>> Hi,
>>
>> LineRecordWriter.write(..) is synchronized. I did not find any other
>> RecordWriter implementations define the write as synchronized.
>> Any specific reason for this.
>>
>> regards,
>> sathwik
>>
>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Sathwik B P <sa...@gmail.com>.
Hi,
Thanks for your reply.
May I know where does hadoop fork multiple threads to use a single
RecordWriter.

regards,
sathwik

On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:

> because we may use multi-threads to write a single file.
> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>
>> Hi,
>>
>> LineRecordWriter.write(..) is synchronized. I did not find any other
>> RecordWriter implementations define the write as synchronized.
>> Any specific reason for this.
>>
>> regards,
>> sathwik
>>
>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Sathwik B P <sa...@gmail.com>.
Hi,
Thanks for your reply.
May I know where does hadoop fork multiple threads to use a single
RecordWriter.

regards,
sathwik

On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:

> because we may use multi-threads to write a single file.
> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>
>> Hi,
>>
>> LineRecordWriter.write(..) is synchronized. I did not find any other
>> RecordWriter implementations define the write as synchronized.
>> Any specific reason for this.
>>
>> regards,
>> sathwik
>>
>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Sathwik B P <sa...@gmail.com>.
Hi,
Thanks for your reply.
May I know where does hadoop fork multiple threads to use a single
RecordWriter.

regards,
sathwik

On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu <az...@gmail.com> wrote:

> because we may use multi-threads to write a single file.
> On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:
>
>> Hi,
>>
>> LineRecordWriter.write(..) is synchronized. I did not find any other
>> RecordWriter implementations define the write as synchronized.
>> Any specific reason for this.
>>
>> regards,
>> sathwik
>>
>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Azuryy Yu <az...@gmail.com>.
because we may use multi-threads to write a single file.
On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:

> Hi,
>
> LineRecordWriter.write(..) is synchronized. I did not find any other
> RecordWriter implementations define the write as synchronized.
> Any specific reason for this.
>
> regards,
> sathwik
>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Azuryy Yu <az...@gmail.com>.
because we may use multi-threads to write a single file.
On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:

> Hi,
>
> LineRecordWriter.write(..) is synchronized. I did not find any other
> RecordWriter implementations define the write as synchronized.
> Any specific reason for this.
>
> regards,
> sathwik
>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Azuryy Yu <az...@gmail.com>.
because we may use multi-threads to write a single file.
On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:

> Hi,
>
> LineRecordWriter.write(..) is synchronized. I did not find any other
> RecordWriter implementations define the write as synchronized.
> Any specific reason for this.
>
> regards,
> sathwik
>

Re: Why LineRecordWriter.write(..) is synchronized

Posted by Azuryy Yu <az...@gmail.com>.
because we may use multi-threads to write a single file.
On Aug 8, 2013 2:54 PM, "Sathwik B P" <sa...@apache.org> wrote:

> Hi,
>
> LineRecordWriter.write(..) is synchronized. I did not find any other
> RecordWriter implementations define the write as synchronized.
> Any specific reason for this.
>
> regards,
> sathwik
>