Posted to user@avro.apache.org by Nishanth S <ni...@gmail.com> on 2018/01/20 01:04:52 UTC
Concurrent Building of Avro Objects
Hi All,
We have a process that reads data from a local file share, serializes it, and
writes it to HDFS in Avro format. Currently it runs as a single-threaded
process. When we converted it to a parallel process we did get some
performance improvement, but not as much as we wanted. Thread dumps show that
at any time only one thread is inside the method shown below while the others
are blocked. I am just wondering if I am building the Avro objects correctly.

"pool-6-thread-5" #53 prio=5 os_prio=0 tid=0x00007fad896c7800 nid=0x4328 waiting for monitor entry [0x00007fad52833000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
        at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)

"pool-6-thread-4" #52 prio=5 os_prio=0 tid=0x00007fad896c6000 nid=0x4327 waiting for monitor entry [0x00007fad52934000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
        at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
        at com.model.avro.SEGMENT1B$Builder.build(SEGMENT1B.java:4362)

"pool-6-thread-2" #50 prio=5 os_prio=0 tid=0x00007fad8953a800 nid=0x4325 runnable [0x00007fad52b36000]
   java.lang.Thread.State: RUNNABLE
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - locked <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
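
For anyone reproducing this kind of diagnosis without reading raw dumps by eye, the blocked threads can also be counted per lock from inside the JVM. The following is a minimal sketch using the standard java.lang.management API. The class name BlockedThreadSummary, its methods, and the worker threads in the demo are hypothetical; the plain Object lock only stands in for the SynchronizedMap monitor seen in the dumps above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Map;
import java.util.TreeMap;

public class BlockedThreadSummary {

    /** Counts BLOCKED threads per contended lock, keyed by "className@identityHash". */
    public static Map<String, Integer> blockedByLock() {
        Map<String, Integer> counts = new TreeMap<>();
        // false, false: skip owned-monitor details; the lock a thread waits on is still reported.
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue;
            }
            if (info.getThreadState() == Thread.State.BLOCKED && info.getLockName() != null) {
                counts.merge(info.getLockName(), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Demo: holds a lock, lets workers pile up on it, and returns how many are reported blocked. */
    public static int demoBlockedCount(int workerCount) {
        final Object sharedLock = new Object(); // hypothetical stand-in for the contended map
        Thread[] workers = new Thread[workerCount];
        try {
            int blocked;
            synchronized (sharedLock) {
                for (int i = 0; i < workers.length; i++) {
                    workers[i] = new Thread(() -> {
                        synchronized (sharedLock) {
                            // nothing to do; the point is only to contend for the lock
                        }
                    }, "worker-" + i);
                    workers[i].start();
                }
                // Wait until every worker is parked on the monitor we are holding.
                for (Thread worker : workers) {
                    while (worker.getState() != Thread.State.BLOCKED) {
                        Thread.sleep(1);
                    }
                }
                String lockName = sharedLock.getClass().getName() + "@"
                        + Integer.toHexString(System.identityHashCode(sharedLock));
                blocked = blockedByLock().getOrDefault(lockName, 0);
            }
            for (Thread worker : workers) {
                worker.join();
            }
            return blocked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("blocked on shared lock: " + demoBlockedCount(3));
    }
}
```

In a real service, blockedByLock() could be called periodically from a monitoring thread; a lock whose count stays close to the pool size is the same signal as the dumps above.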
Re: Concurrent Building of Avro Objects
Posted by Nishanth S <ni...@gmail.com>.
We have a process that reads data from a local file share, serializes it, and
writes it to HDFS in Avro format. Currently it runs as a single-threaded
process. When we converted it to a parallel process we did get some
performance improvement, but not as much as we wanted. Thread dumps show that
at any time only one thread is inside the method shown below while the others
are blocked. I am just wondering if I am building the Avro objects correctly.
For every record read from the binary file we create an equivalent Avro
object as follows:

// Each build() call resolves defaults for the fields that were not set explicitly.
Parent p = new Parent();
LOGHDR hdr = LOGHDR.newBuilder().build();
MSGHDR msg = MSGHDR.newBuilder().build();
p.setHdr(hdr);
p.setMsg(msg);

Then all the fields in p, and in the nested types that p holds (such as
LOGHDR and MSGHDR), are set.

"pool-6-thread-5" #53 prio=5 os_prio=0 tid=0x00007fad896c7800 nid=0x4328 waiting for monitor entry [0x00007fad52833000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
        at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)

"pool-6-thread-4" #52 prio=5 os_prio=0 tid=0x00007fad896c6000 nid=0x4327 waiting for monitor entry [0x00007fad52934000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
        at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
        at com.model.avro.SEGMENT1B$Builder.build(SEGMENT1B.java:4362)

"pool-6-thread-2" #50 prio=5 os_prio=0 tid=0x00007fad8953a800 nid=0x4325 runnable [0x00007fad52b36000]
   java.lang.Thread.State: RUNNABLE
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - locked <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
Re: Concurrent Building of Avro Objects
Posted by Nishanth S <ni...@gmail.com>.
After upgrading to 1.8.2 I do not see hotspots with parallel threads.
Thanks Doug!
On Tue, Jan 23, 2018 at 8:41 PM, Nishanth S <ni...@gmail.com> wrote:
> Thanks Doug. That sounds like it. We are using 1.7.6. I will upgrade our
> version and let everyone know. Thanks for jumping in.
Re: Concurrent Building of Avro Objects
Posted by Nishanth S <ni...@gmail.com>.
Thanks Doug. That sounds like it. We are using 1.7.6. I will upgrade our
version and let everyone know. Thanks for jumping in.
On Jan 23, 2018 5:19 PM, "Doug Cutting" <cu...@gmail.com> wrote:
> This sounds like AVRO-1760, fixed since Avro 1.8.0.
>
> https://issues.apache.org/jira/browse/AVRO-1760
>
> What version of Avro are you using?
>
> Doug
Re: Concurrent Building of Avro Objects
Posted by Doug Cutting <cu...@gmail.com>.
This sounds like AVRO-1760, fixed since Avro 1.8.0.
https://issues.apache.org/jira/browse/AVRO-1760
What version of Avro are you using?
Doug
On Mon, Jan 22, 2018 at 9:45 AM, Nishanth S <ni...@gmail.com> wrote:
> Hi All,
>
> We have a process that reads data from a local file share, serializes it,
> and writes it to HDFS in Avro format. Currently it runs as a single-threaded
> process. When we converted it to a parallel process we did get some
> performance improvement, but not as much as we wanted. Thread dumps are
> pasted below. I am just wondering if I am building the Avro objects
> correctly. For every record read from the binary file we create an
> equivalent Avro object as shown below. Our Avro schema is pretty big,
> around 1800 fields, and all of them have default values. After doing some
> profiling I could see that the most time-consuming method is
> org.apache.avro.generic.GenericData.getDefaultValue(). It is in fact taking
> more time than the actual reads/writes. Thanks for taking a look.
Re: Concurrent Building of Avro Objects
Posted by Nishanth S <ni...@gmail.com>.
Hi All,
We have a process that reads data from a local file share, serializes it, and
writes it to HDFS in Avro format. Currently it runs as a single-threaded
process. When we converted it to a parallel process we did get some
performance improvement, but not as much as we wanted. Thread dumps are pasted
below. I am just wondering if I am building the Avro objects correctly. For
every record read from the binary file we create an equivalent Avro object as
shown below. Our Avro schema is pretty big, around 1800 fields, and all of
them have default values. After doing some profiling I could see that the most
time-consuming method is org.apache.avro.generic.GenericData.getDefaultValue().
It is in fact taking more time than the actual reads/writes. Thanks for taking
a look.

// Each build() call resolves defaults for the fields that were not set explicitly.
Parent p = new Parent();
LOGHDR hdr = LOGHDR.newBuilder().build();
MSGHDR msg = MSGHDR.newBuilder().build();
p.setHdr(hdr);
p.setMsg(msg);

Then all the fields in p, and in the nested types that p holds (such as
LOGHDR and MSGHDR), are set.

"pool-6-thread-5" #53 prio=5 os_prio=0 tid=0x00007fad896c7800 nid=0x4328 waiting for monitor entry [0x00007fad52833000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
        at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)

"pool-6-thread-4" #52 prio=5 os_prio=0 tid=0x00007fad896c6000 nid=0x4327 waiting for monitor entry [0x00007fad52934000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - waiting to lock <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
        at org.apache.avro.data.RecordBuilderBase.defaultValue(RecordBuilderBase.java:135)
        at com.model.avro.SEGMENT1B$Builder.build(SEGMENT1B.java:4362)

"pool-6-thread-2" #50 prio=5 os_prio=0 tid=0x00007fad8953a800 nid=0x4325 runnable [0x00007fad52b36000]
   java.lang.Thread.State: RUNNABLE
        at java.util.Collections$SynchronizedMap.get(Collections.java:2584)
        - locked <0x000000066a5e3460> (a java.util.Collections$SynchronizedMap)
        at org.apache.avro.generic.GenericData.getDefaultValue(GenericData.java:981)
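
The dumps above show every worker funnelling through one java.util.Collections$SynchronizedMap monitor on each default-value lookup. The sketch below illustrates that general pattern only; it is not Avro's code, and it makes no claim about how any Avro release implements its cache. The class DefaultValueCacheSketch, its methods, and the "fieldN" keys are hypothetical. It runs the same read-heavy workload (one lookup per field per simulated record) against a Collections.synchronizedMap and against a ConcurrentHashMap, whose reads do not take a shared monitor.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class DefaultValueCacheSketch {

    /** Stores one cached default per field, mimicking a schema where every field has a default. */
    public static void populate(Map<String, Object> cache, int fieldCount) {
        for (int i = 0; i < fieldCount; i++) {
            cache.put("field" + i, Integer.valueOf(i));
        }
    }

    /**
     * Simulates building records on several threads: each simulated record reads
     * the cached default of every field. Returns the total number of successful lookups.
     */
    public static long buildAll(final Map<String, Object> cache, final int fieldCount,
                                int threads, final int recordsPerThread) {
        final String[] keys = new String[fieldCount];
        for (int i = 0; i < fieldCount; i++) {
            keys[i] = "field" + i;
        }
        final AtomicLong lookups = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                long hits = 0;
                for (int r = 0; r < recordsPerThread; r++) {
                    for (String key : keys) {
                        if (cache.get(key) != null) {
                            hits++;
                        }
                    }
                }
                lookups.addAndGet(hits);
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return lookups.get();
    }

    private static void time(String label, Map<String, Object> cache) {
        populate(cache, 1800); // 1800 fields, as in the schema described above
        long start = System.nanoTime();
        long lookups = buildAll(cache, 1800, 4, 500);
        long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(label + ": " + lookups + " lookups in " + millis + " ms");
    }

    public static void main(String[] args) {
        time("synchronizedMap", Collections.synchronizedMap(new HashMap<String, Object>()));
        time("ConcurrentHashMap", new ConcurrentHashMap<String, Object>());
    }
}
```

The printed timings depend on hardware, JVM, and warm-up, so this is not a benchmark. A thread dump taken during the synchronizedMap run should show the same "waiting to lock" frames as above.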