Posted to user@avro.apache.org by Nishanth S <ni...@gmail.com> on 2018/01/20 01:04:52 UTC

Concurrent Building of Avro Objects

Hi All,

We have a process that reads data from a local file share, serializes it,
and writes to HDFS in Avro format. Currently it runs as a single-threaded
process. When we converted it to a parallel process we did get some
performance improvement, but not as much as we wanted. Thread dumps show
that at any time only one thread has access to this method and the others
are blocked. I am just wondering if I am building the Avro objects correctly.

"pool-6-thread-5" #53 prio=5 os_prio=0 tid=0x00007fad896c7800 nid=0x4328 waiting for monitor entry [0x00007fad52833000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
        at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)

"pool-6-thread-4" #52 prio=5 os_prio=0 tid=0x00007fad896c6000 nid=0x4327 waiting for monitor entry [0x00007fad52934000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
        at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
        at com.model.avro.SEGMENT1B$Builder.build(SEGMENT1B.java:4362)

"pool-6-thread-2" #50 prio=5 os_prio=0 tid=0x00007fad8953a800 nid=0x4325 runnable [0x00007fad52b36000]
   java.lang.Thread.State: RUNNABLE
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - locked <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
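
The dumps above show one RUNNABLE thread holding a monitor while the other pool threads are BLOCKED on it. As a rough sketch, the same picture could be collected from inside the JVM with the standard java.lang.management API. The class and method names here (BlockedThreadReport, blockedThreads) are made up for illustration and are not part of Avro.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Avro): lists threads that are BLOCKED on a monitor.
class BlockedThreadReport {

    // Returns one line per BLOCKED thread: "<thread> waits for <lock> held by <owner>".
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // false/false: skip locked-monitor and synchronizer details, which are not needed here.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName() + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }
}
```

If most worker threads keep showing up in this list waiting for the same lock, the work is effectively serialized at that point, which matches what the dumps suggest.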

Re: Concurrent Building of Avro Objects

Posted by Nishanth S <ni...@gmail.com>.
We have a process that reads data from a local file share, serializes it,
and writes to HDFS in Avro format. Currently it runs as a single-threaded
process. When we converted it to a parallel process we did get some
performance improvement, but not as much as we wanted. Thread dumps show
that at any time only one thread has access to this method and the others
are blocked. I am just wondering if I am building the Avro objects
correctly. For every record that is read from the binary file we create an
equivalent Avro object in the format below.

Parent p = new Parent();
LOGHDR hdr = LOGHDR.newBuilder().build();
MSGHDR msg = MSGHDR.newBuilder().build();
p.setHdr(hdr);
p.setMsg(msg);

Then all fields in p, and in all the nested types that p holds (such as
LOGHDR and MSGHDR), are set.

"pool-6-thread-5" #53 prio=5 os_prio=0 tid=0x00007fad896c7800 nid=0x4328 waiting for monitor entry [0x00007fad52833000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
        at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)

"pool-6-thread-4" #52 prio=5 os_prio=0 tid=0x00007fad896c6000 nid=0x4327 waiting for monitor entry [0x00007fad52934000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
        at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
        at com.model.avro.SEGMENT1B$Builder.build(SEGMENT1B.java:4362)

"pool-6-thread-2" #50 prio=5 os_prio=0 tid=0x00007fad8953a800 nid=0x4325 runnable [0x00007fad52b36000]
   java.lang.Thread.State: RUNNABLE
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - locked <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)


Re: Concurrent Building of Avro Objects

Posted by Nishanth S <ni...@gmail.com>.
After upgrading to 1.8.2 I no longer see hotspots with parallel threads.
Thanks Doug!

On Tue, Jan 23, 2018 at 8:41 PM, Nishanth S <ni...@gmail.com> wrote:

> Thanks Doug. That sounds like it. We are using 1.7.6. I will upgrade our
> version and let everyone know. Thanks for jumping in.
>
> On Jan 23, 2018 5:19 PM, "Doug Cutting" <cu...@gmail.com> wrote:
>
>> This sounds like AVRO-1760, fixed since Avro 1.8.0.
>>
>> https://issues.apache.org/jira/browse/AVRO-1760
>>
>> What version of Avro are you using?
>>
>> Doug

Re: Concurrent Building of Avro Objects

Posted by Nishanth S <ni...@gmail.com>.
Thanks Doug. That sounds like it. We are using 1.7.6. I will upgrade our
version and let everyone know. Thanks for jumping in.

On Jan 23, 2018 5:19 PM, "Doug Cutting" <cu...@gmail.com> wrote:

> This sounds like AVRO-1760, fixed since Avro 1.8.0.
>
> https://issues.apache.org/jira/browse/AVRO-1760
>
> What version of Avro are you using?
>
> Doug

Re: Concurrent Building of Avro Objects

Posted by Doug Cutting <cu...@gmail.com>.
This sounds like AVRO-1760, fixed since Avro 1.8.0.

https://issues.apache.org/jira/browse/AVRO-1760

What version of Avro are you using?

Doug
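
Since the fix is reported as available from 1.8.0 onward, it may help to confirm which Avro version a job actually runs with. Below is a minimal sketch of a dotted-version comparison. The VersionCheck class is hypothetical and not part of Avro. In a real job the version string might be obtained with something like Schema.class.getPackage().getImplementationVersion(), which depends on the jar manifest and can return null, so treat that part as an assumption.

```java
// Hypothetical helper (not part of Avro): compares dotted numeric versions such as "1.7.6".
class VersionCheck {

    // Returns true when `version` is greater than or equal to `minimum`.
    // Assumes purely numeric components; a suffix like "-SNAPSHOT" is not handled.
    static boolean isAtLeast(String version, String minimum) {
        String[] a = version.split("\\.");
        String[] b = minimum.split("\\.");
        int n = Math.max(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // Missing trailing components count as 0, so "1.8" equals "1.8.0".
            int x = i < a.length ? Integer.parseInt(a[i]) : 0;
            int y = i < b.length ? Integer.parseInt(b[i]) : 0;
            if (x != y) {
                return x > y;
            }
        }
        return true; // all components equal
    }
}
```

With this helper, VersionCheck.isAtLeast("1.7.6", "1.8.0") is false and VersionCheck.isAtLeast("1.8.2", "1.8.0") is true.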

On Mon, Jan 22, 2018 at 9:45 AM, Nishanth S <ni...@gmail.com> wrote:

> Hi All,
>
> We have a process that reads data from a local file share, serializes it,
> and writes to HDFS in Avro format. Currently it runs as a single-threaded
> process. When we converted it to a parallel process we did get some
> performance improvement, but not as much as we wanted. Thread dumps are
> pasted below. I am just wondering if I am building the Avro objects
> correctly. For every record that is read from the binary file we create an
> equivalent Avro object in the format below. Our Avro schema is pretty big,
> around 1800 fields, and all of them have default values. After doing some
> profiling I could see that the most time-consuming method is
> org.apache.avro.generic.GenericData.getDefaultValue(). This is in fact
> taking more time than doing the actual reads/writes. Thanks for taking a
> look.
>
> Parent p = new Parent();
> LOGHDR hdr = LOGHDR.newBuilder().build();
> MSGHDR msg = MSGHDR.newBuilder().build();
> p.setHdr(hdr);
> p.setMsg(msg);
>
> Then all fields in p, and in all the nested types that p holds (such as
> LOGHDR and MSGHDR), are set.
>
>
>
>
> "pool-6-thread-5" #53 prio=5 os_prio=0 tid=0x00007fad896c7800 nid=0x4328 waiting for monitor entry [0x00007fad52833000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>         - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
>         at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
>
> "pool-6-thread-4" #52 prio=5 os_prio=0 tid=0x00007fad896c6000 nid=0x4327 waiting for monitor entry [0x00007fad52934000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>         - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
>         at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
>         at com.model.avro.SEGMENT1B$Builder.build(SEGMENT1B.java:4362)
>
> "pool-6-thread-2" #50 prio=5 os_prio=0 tid=0x00007fad8953a800 nid=0x4325 runnable [0x00007fad52b36000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
>         - locked <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
>         at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)

Re: Concurrent Building of Avro Objects

Posted by Nishanth S <ni...@gmail.com>.
Hi All,

We have a process that reads data from a local file share, serializes it,
and writes to HDFS in Avro format. Currently it runs as a single-threaded
process. When we converted it to a parallel process we did get some
performance improvement, but not as much as we wanted. Thread dumps are
pasted below. I am just wondering if I am building the Avro objects
correctly. For every record that is read from the binary file we create an
equivalent Avro object in the format below. Our Avro schema is pretty big,
around 1800 fields, and all of them have default values. After doing some
profiling I could see that the most time-consuming method is
org.apache.avro.generic.GenericData.getDefaultValue(). This is in fact
taking more time than doing the actual reads/writes. Thanks for taking a
look.

Parent p = new Parent();
LOGHDR hdr = LOGHDR.newBuilder().build();
MSGHDR msg = MSGHDR.newBuilder().build();
p.setHdr(hdr);
p.setMsg(msg);

Then all fields in p, and in all the nested types that p holds (such as
LOGHDR and MSGHDR), are set.




"pool-6-thread-5" #53 prio=5 os_prio=0 tid=0x00007fad896c7800 nid=0x4328 waiting for monitor entry [0x00007fad52833000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
        at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)

"pool-6-thread-4" #52 prio=5 os_prio=0 tid=0x00007fad896c6000 nid=0x4327 waiting for monitor entry [0x00007fad52934000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
        at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
        at com.model.avro.SEGMENT1B$Builder.build(SEGMENT1B.java:4362)

"pool-6-thread-2" #50 prio=5 os_prio=0 tid=0x00007fad8953a800 nid=0x4325 runnable [0x00007fad52b36000]
   java.lang.Thread.State: RUNNABLE
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - locked <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
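
The dumps point at Collections$SynchronizedMap.get() being called from getDefaultValue(), so every default lookup from every thread goes through one shared monitor. The sketch below is an illustration only, using a stand-in cache of made-up "fieldN" entries rather than Avro's own code. It shows the same read-mostly lookup pattern against a Collections.synchronizedMap and against a ConcurrentHashMap, whose reads generally do not block each other. How the Avro fix is actually implemented is not shown here. The names DefaultsCacheDemo, fill and lookupAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustration only: a stand-in for a shared cache of field defaults, not Avro's code.
class DefaultsCacheDemo {

    // Fills a cache with `fields` hypothetical entries ("field0" -> 0, ...).
    static void fill(Map<String, Integer> cache, int fields) {
        for (int i = 0; i < fields; i++) {
            cache.put("field" + i, i);
        }
    }

    // Runs `threads` workers that each look up every field `rounds` times.
    // Returns the total number of successful lookups across all workers.
    static long lookupAll(Map<String, Integer> cache, int fields, int threads, int rounds) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Long> task = () -> {
                long hits = 0;
                for (int r = 0; r < rounds; r++) {
                    for (int i = 0; i < fields; i++) {
                        if (cache.get("field" + i) != null) {
                            hits++;
                        }
                    }
                }
                return hits;
            };
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get();
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("lookup workers failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

To compare, one could fill a Collections.synchronizedMap(new HashMap<>()) and a new ConcurrentHashMap<>() with about 1800 entries each, call lookupAll on both with the same thread count, and time the two calls with System.nanoTime(). The numbers will vary by machine and JVM, so no expected timings are given here.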

