avro object reuse
Posted to user@avro.apache.org by ey-chih chow <ey...@hotmail.com> on 2011/05/31 19:38:39 UTC
Hi,
We have several mapreduce jobs using Avro. They take too much memory when running in production. Can anybody suggest some object reuse techniques to cut down memory usage? Thanks.
Ey-Chih Chow
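A minimal sketch of the record-reuse pattern Avro's file-reading API supports, assuming the jobs iterate over Avro data files of GenericRecords (the file name is invented for illustration): passing the previous record back into next() lets Avro overwrite its fields in place instead of allocating a fresh record per row.

    import java.io.File;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    public class ReuseSketch {
        public static void main(String[] args) throws Exception {
            DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
                new File("input.avro"),                // hypothetical input file
                new GenericDatumReader<GenericRecord>());
            GenericRecord record = null;               // reused across iterations
            while (reader.hasNext()) {
                record = reader.next(record);          // refills the old record in place
                // ... process record here ...
            }
            reader.close();
        }
    }

The same reuse parameter is available as GenericDatumReader.read(reuse, decoder) when decoding raw streams.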
Re: avro object reuse
Posted by Scott Carey <sc...@richrelevance.com>.
No, that should not trigger Jackson parsing. Schema.parse() and Protocol.parse() do.
On 6/2/11 10:23 AM, "ey-chih chow" <ey...@hotmail.com> wrote:
We create GenericData.Record a lot in our code via new GenericData.Record(schema). Will this generate Jackson calls? Thanks.
Ey-Chih Chow
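To make that distinction concrete, a hedged sketch (the schema literal is invented): constructing a GenericData.Record only references an already-parsed Schema, while Schema.parse() is what runs Jackson over the JSON text, so the parse should happen once and the resulting Schema be shared.

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;

    public class SchemaCache {
        // Parsed once per JVM; this is the only line that invokes Jackson.
        static final Schema SCHEMA = Schema.parse(
            "{\"type\":\"record\",\"name\":\"Rec\","
            + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");

        static GenericData.Record newRecord() {
            // No JSON parsing here: the record just points at the cached Schema.
            return new GenericData.Record(SCHEMA);
        }
    }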
RE: avro object reuse
Posted by ey-chih chow <ey...@hotmail.com>.
We create GenericData.Record a lot in our code via new GenericData.Record(schema). Will this generate Jackson calls? Thanks.
Ey-Chih Chow
Re: avro object reuse
Posted by Scott Carey <sc...@richrelevance.com>.
One thing we do right now that might be related is the following:
We keep Avro default Schema values as JsonNode objects. While traversing
the JSON Avro schema representation using ObjectMapper.readTree() we
remember JsonNodes that are "default" properties on fields and keep them
on the Schema object.
If these keep references to the parent (and the whole JSON tree, or worse,
the ObjectMapper and input stream), it would be poor use of Jackson by us,
although we'd need a way to keep a detached JsonNode or equivalent.
However, even if that is the case (which it does not seem to be -- the
jmap output has no JsonNode instances), it doesn't explain why we would be
calling ObjectMapper frequently. We only call
ObjectMapper.readTree(JsonParser) when creating a Schema from JSON. We
call JsonNode methods from extracted fragments for everything else.
This brings me to the following suspicion based on the data:
Somewhere, Schema objects are being created frequently via one of the
Schema.parse() or Protocol.parse() static methods.
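If that suspicion holds, the fix in a MapReduce job is to hoist the parse out of the per-record path. A hypothetical old-API mapper sketch (the job property name and key/value types are invented for illustration):

    import java.io.IOException;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class CachedSchemaMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {
        private Schema schema;   // parsed once per task, not once per record

        public void configure(JobConf job) {
            // One Schema.parse() -- one Jackson pass -- per task attempt.
            schema = Schema.parse(job.get("my.schema.json"));
        }

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            // Calling Schema.parse() here instead would run Jackson on every record.
            GenericData.Record record = new GenericData.Record(schema);
            // ... fill record from value and emit ...
        }
    }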
Re: avro object reuse
Posted by Tatu Saloranta <ts...@gmail.com>.
On Wed, Jun 1, 2011 at 5:45 PM, Scott Carey <sc...@richrelevance.com> wrote:
> It would be useful to get a 'jmap -histo:live' report as well, which will
> only have items that remain after a full GC.
>
> However, a high churn of short lived Jackson objects is not expected here
> unless the user is reading Json serialized files and not Avro binary.
> Avro Data Files only contain binary encoded Avro content.
>
> It would be surprising to see many Jackson objects here if reading Avro
> Data Files, because we expect to use Jackson to parse an Avro schema from
> json only once or twice per file. After the schema is parsed, Jackson
> shouldn't be used. A hundred thousand DeserializationConfig instances
> means that isn't the case.
Right -- it indicates that something (else) is using Jackson; and
there will typically be one instance of DeserializationConfig for each
data-binding call (ObjectMapper.readValue()), as a read-only copy is
made for each operation.
... or if something is reading the schema that many times, that sounds
like a problem in itself.
-+ Tatu +-
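A hedged illustration of that point, using the Jackson 1.x package names that appear in the jmap output (the JSON literal is invented): every readTree()/readValue() call snapshots a read-only DeserializationConfig, so running data binding once per record produces exactly this churn, while one shared ObjectMapper bound once per document keeps it bounded.

    import org.codehaus.jackson.JsonNode;
    import org.codehaus.jackson.map.ObjectMapper;

    public class MapperReuse {
        // One ObjectMapper per process; reuse it rather than constructing per call.
        static final ObjectMapper MAPPER = new ObjectMapper();

        public static void main(String[] args) throws Exception {
            // Each readTree() still copies a DeserializationConfig internally,
            // so invoking it once per record multiplies short-lived objects.
            JsonNode root = MAPPER.readTree("{\"default\":42}");
            System.out.println(root.get("default"));
        }
    }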
Re: avro object reuse
Posted by Scott Carey <sc...@richrelevance.com>.
It would be useful to get a 'jmap -histo:live' report as well, which will
only have items that remain after a full GC.
However, a high churn of short-lived Jackson objects is not expected here
unless the user is reading JSON-serialized files and not Avro binary.
Avro Data Files only contain binary-encoded Avro content.
It would be surprising to see many Jackson objects here if reading Avro
Data Files, because we expect to use Jackson to parse an Avro schema from
JSON only once or twice per file. After the schema is parsed, Jackson
shouldn't be used. A hundred thousand DeserializationConfig instances
mean that isn't the case.
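For reference, both views come from the same tool (the pid is invented):

    jmap -histo 12345        # all objects, live and dead, no collection first
    jmap -histo:live 12345   # forces a full GC, then counts only live objects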
Re: avro object reuse
Posted by Tatu Saloranta <ts...@gmail.com>.
On Wed, Jun 1, 2011 at 1:45 PM, Scott Carey <sc...@richrelevance.com> wrote:
> Lower down this list of object counts, what are the top org.apache.avro.**
> object counts?
> How many AvroSerialization objects? How many AvroMapper, AvroWrapper, etc?
> What about org.apache.hadoop.** objects?
Also: is this jmap view of live objects, or just a dump of ALL objects,
live and dead?
It seems like a dump of the latter, as most Jackson objects are short-term
things created for per-invocation purposes and discarded after
processing is complete. A high count is not necessarily surprising for
high-throughput systems; it is only odd if these are actual live
objects.
-+ Tatu +-
RE: avro object reuse
Posted by ey-chih chow <ey...@hotmail.com>.
What follows is the whole output of our jmap. Hope this can help you identify the problem.
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator
19: 24001 3429704 * ConstMethodKlass
20: 139087 3338088 java.lang.Long
21: 115338 3215280 java.lang.Object[]
22: 24001 2887768 * MethodKlass
23: 2147 2414896 * ConstantPoolKlass
24: 39532 2017320 * SymbolKlass
25: 102735 1643760 java.util.HashMap$KeySet
26: 2147 1596304 * InstanceKlassKlass
27: 1865 1482184 * ConstantPoolCacheKlass
28: 15780 1136160 com.sun.org.apache.xerces.internal.dom.DeferredElementNSImpl
29: 27860 1114400 java.util.HashMap$EntryIterator
30: 27585 1103400 com.sun.org.apache.xerces.internal.dom.DeferredTextImpl
31: 1025 535536 * MethodDataKlass
32: 5140 331816 short[]
33: 5814 316424 java.lang.String[]
34: 13135 315240 java.lang.StringBuilder
35: 7723 247136 java.util.AbstractList$ListItr
36: 1321 245632 org.apache.avro.io.parsing.Symbol[]
37: 2332 242528 java.lang.Class
38: 4712 226176 org.apache.avro.Schema$Props
39: 6848 219136 java.util.AbstractList$Itr
40: 12793 204688 java.lang.Integer
41: 6033 193056 com.sun.org.apache.xerces.internal.xni.QName
42: 4710 188400 java.util.LinkedHashMap$Entry
43: 3190 171896 * System ObjArray
44: 5228 167296 java.util.Hashtable$Entry
45: 1789 114496 java.net.URL
46: 777 100592 java.util.Hashtable$Entry[]
47: 156 91104 * ObjArrayKlassKlass
48: 3408 81792 java.util.ArrayList
49: 450 64800 int[][]
50: 90 64080 com.sun.org.apache.xerces.internal.util.SymbolTable$Entry[]
51: 2513 60312 org.apache.avro.util.Utf8
52: 681 59928 java.lang.reflect.Method
53: 1060 59360 java.util.LinkedHashMap
54: 2160 51840 com.sun.org.apache.xerces.internal.util.XMLStringBuffer
55: 1034 49632 org.apache.avro.Schema$Field
56: 772 49408 org.codehaus.jackson.impl.WriterBasedGenerator
57: 1980 47520 com.sun.org.apache.xerces.internal.xni.XMLString
58: 775 43400 org.codehaus.jackson.map.ser.StdSerializerProvider
59: 775 43400 org.codehaus.jackson.map.SerializationConfig
60: 2596 41536 org.codehaus.jackson.node.TextNode
61: 271 39128 java.lang.Object[][]
62: 1564 37536 org.apache.avro.generic.GenericData$Record
63: 900 36000 com.sun.org.apache.xerces.internal.xni.parser.XMLConfigurationException
64: 360 34560 com.sun.org.apache.xerces.internal.xni.QName[]
65: 720 34560 com.sun.org.apache.xerces.internal.util.XMLAttributesImpl$Attribute
66: 1035 33120 java.util.LinkedHashMap$KeyIterator
67: 2064 33024 java.util.HashMap$EntrySet
68: 673 32304 java.util.Hashtable
69: 90 30960 com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl
70: 772 30880 org.codehaus.jackson.impl.ObjectWContext
71: 949 30368 org.apache.avro.Schema$LockableArrayList
72: 462 29568 java.util.regex.Matcher
73: 1217 29208 java.lang.Double
74: 900 28800
com.sun.org.apache.xerces.internal.util.AugmentationsImpl$SmallContainer75: 1077 25848 java.io.File76: 1035 24840 org.codehaus.jackson.node.ObjectNode77: 773 24736 org.codehaus.jackson.map.ser.ReadOnlyClassToSerializerMap78: 772 24704 org.codehaus.jackson.io.SegmentedStringWriter79: 772 24704 org.apache.avro.generic.GenericData$Array80: 772 24704 org.codehaus.jackson.impl.RootWContext81: 916 21984 org.apache.avro.Schema$ArraySchema82: 838 20112 org.apache.avro.Schema$StringSchema83: 620 19840 java.util.Vector84: 619 19808 org.apache.avro.io.parsing.Symbol$UnionAdjustAction85: 615 19680 com.sun.org.apache.xerces.internal.util.SymbolTable$Entry86: 180 18720 sun.net.www.protocol.file.FileURLConnection87: 776 18624 org.codehaus.jackson.map.ser.SerializerCache$UntypedKeyRaw88: 774 18576 org.apache.avro.Schema$UnionSchema89: 774 18576 org.codehaus.jackson.map.ser.SerializerCache$TypedKeyRaw90: 772 18528 org.apache.avro.mapred.Pair91: 772 18528 org.apache.avro.io.parsing.Symbol$Sequence92: 772 18528 org.apache.avro.Schema$SeenPair93: 770 18480 org.apache.avro.Schema$NullSchema94: 90 18000 com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl95: 544 17408 java.util.Stack96: 720 17280 com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl$RefCount97: 90 17280 com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl98: 90 17280 com.sun.org.apache.xerces.internal.parsers.XIncludeAwareParserConfiguration99: 707 16968 org.codehaus.jackson.sym.CharsToNameCanonicalizer$Bucket100: 690 16560 org.codehaus.jackson.node.ArrayNode101: 754 16192 java.lang.Class[]102: 90 15840 com.sun.org.apache.xerces.internal.impl.dtd.XMLNSDTDValidator103: 90 15120 com.sun.org.apache.xerces.internal.xinclude.XIncludeHandler104: 605 14520 java.lang.StringBuffer105: 389 14472 boolean[]106: 450 14400 com.sun.org.apache.xerces.internal.util.XMLResourceIdentifierImpl107: 900 14400 com.sun.org.apache.xerces.internal.util.AugmentationsImpl108: 570 13680 java.net.URLClassLoader$2109: 184 13248 java.lang.reflect.Field110: 92 13248 org.codehaus.jackson.sym.CharsToNameCanonicalizer$Bucket[]111: 90 12960 com.sun.org.apache.xerces.internal.parsers.DOMParser112: 773 12368 org.codehaus.jackson.map.ser.SerializerCache$TypedKeyFull113: 171 12312 java.lang.reflect.Constructor114: 307 12280 java.lang.ref.SoftReference115: 293 11720 java.lang.ref.Finalizer116: 284 11360 java.util.concurrent.ConcurrentHashMap$Segment117: 90 10800 com.sun.org.apache.xerces.internal.impl.XMLEntityManager118: 131 10480 java.util.jar.JarFile$JarFileEntry119: 262 10480 org.apache.avro.util.WeakIdentityHashMap$IdentityWeakReference120: 161 10304 java.util.regex.Pattern121: 299 9568 org.apache.avro.io.parsing.Symbol$Alternative122: 232 9280 sun.misc.FloatingDecimal123: 288 9216 java.util.concurrent.locks.ReentrantLock$NonfairSync124: 90 8640 com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDProcessor125: 90 8640 com.sun.xml.internal.stream.Entity$ScannedEntity126: 152 8512 java.util.regex.Pattern$GroupHead[]127: 288 8472 java.util.concurrent.ConcurrentHashMap$HashEntry[]128: 175 8400 org.apache.avro.Schema$RecordSchema129: 498 7968 java.util.HashSet130: 98 7840 java.net.URI131: 180 7200 com.sun.org.apache.xerces.internal.impl.dtd.XMLSimpleType132: 180 7200 com.sun.org.apache.xerces.internal.impl.dtd.XMLEntityDecl133: 295 7080 org.apache.avro.Schema$Name134: 93 5952 java.util.zip.ZipEntry135: 180 5760 com.sun.org.apache.xerces.internal.util.NamespaceSupport136: 180 5760 com.sun.org.apache.xerces.internal.impl.XMLEntityManager$CharacterBuffer[]137: 180 5760 
com.sun.org.apache.xerces.internal.dom.NodeListCache138: 180 5760 com.sun.org.apache.xerces.internal.util.XMLAttributesImpl$Attribute[]139: 239 5736 org.apache.avro.io.parsing.Symbol$WriterUnionAction140: 80 5408 java.lang.reflect.Method[]141: 168 5376 java.lang.ref.WeakReference142: 90 5040 com.sun.org.apache.xerces.internal.impl.XMLEntityScanner143: 90 5040 org.apache.avro.Schema$Names144: 100 4800 org.apache.avro.io.DirectBinaryDecoder145: 117 4680 org.apache.hadoop.io.DataInputBuffer146: 8 4672 * TypeArrayKlassKlass147: 82 4592 java.lang.Package148: 187 4488 java.util.LinkedList$Entry149: 138 4416 java.lang.ThreadLocal$ThreadLocalMap$Entry150: 181 4344 com.sun.org.apache.xerces.internal.impl.Constants$ArrayEnumeration151: 180 4320 com.sun.org.apache.xerces.internal.impl.dv.SecuritySupport$3152: 180 4320 javax.xml.parsers.SecuritySupport$4153: 90 4320 com.sun.org.apache.xerces.internal.impl.XMLEntityManager$RewindableInputStream154: 90 4320 com.sun.org.apache.xerces.internal.util.URI155: 180 4320 com.sun.org.apache.xerces.internal.parsers.SecuritySupport$3156: 90 4320 com.sun.org.apache.xerces.internal.impl.io.UTF8Reader157: 180 4320 sun.net.www.MessageHeader158: 90 4320 com.sun.org.apache.xerces.internal.util.XMLAttributesIteratorImpl159: 90 4320 com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$ElementStack160: 90 4320 com.sun.org.apache.xerces.internal.xinclude.XIncludeNamespaceSupport161: 90 4320 com.sun.org.apache.xerces.internal.dom.DeferredProcessingInstructionImpl162: 180 4320 com.sun.org.apache.xerces.internal.util.IntStack163: 134 4288 org.apache.hadoop.io.DataInputBuffer$Buffer164: 178 4272 java.io.FileInputStream165: 256 4096 java.lang.Byte166: 256 4096 java.lang.Short167: 45 3960 sun.net.www.protocol.jar.JarURLConnection168: 161 3864 java.util.regex.Pattern$Start169: 68 3808 java.beans.MethodDescriptor170: 155 3720 org.apache.avro.Schema$LongSchema171: 116 3712 java.lang.ref.ReferenceQueue172: 154 3696 java.util.regex.Pattern$Slice173: 91 3640 com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl174: 151 3624 java.util.regex.Pattern$TreeInfo175: 45 3600 sun.net.www.protocol.jar.URLJarFile$URLJarFileEntry176: 90 3600 com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$ElementStack2177: 90 3600 com.sun.org.apache.xerces.internal.util.XMLAttributesImpl178: 90 3600 com.sun.org.apache.xerces.internal.xni.parser.XMLInputSource179: 90 3600 com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDDescription180: 90 3600 com.sun.org.apache.xerces.internal.impl.validation.ValidationState181: 90 3600 com.sun.org.apache.xerces.internal.impl.XMLVersionDetector182: 90 3600 com.sun.org.apache.xerces.internal.impl.XMLErrorReporter183: 90 3600 com.sun.xml.internal.stream.XMLEntityStorage184: 90 3600 short[][]185: 90 3600 com.sun.org.apache.xerces.internal.impl.XMLEntityManager$CharacterBufferPool186: 90 3600 com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl187: 88 3520 java.math.BigInteger188: 107 3424 * CompilerICHolderKlass189: 142 3408 java.util.jar.Attributes$Name190: 85 3400 java.util.WeakHashMap$Entry191: 101 3232 org.apache.avro.io.parsing.SkipParser192: 100 3200 java.util.concurrent.ConcurrentHashMap$HashEntry193: 196 3136 java.io.FileDescriptor194: 90 2880 com.sun.org.apache.xerces.internal.util.ParserConfigurationSettings195: 90 2880 org.xml.sax.InputSource196: 180 2880 javax.xml.parsers.SecuritySupport$1197: 90 2880 com.sun.org.apache.xerces.internal.impl.dtd.XMLElementDecl198: 90 2880 
com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl$NSContentDriver199: 117 2808 org.apache.log4j.CategoryKey200: 48 2688 java.util.zip.ZipFile$1201: 65 2600 org.apache.log4j.Logger202: 102 2448 org.apache.avro.util.WeakIdentityHashMap203: 76 2432 java.net.URI$Parser204: 101 2424 org.apache.avro.io.ResolvingDecoder205: 101 2424 org.apache.avro.io.parsing.Symbol$Root206: 60 2400 org.codehaus.jackson.map.type.SimpleType207: 42 2352 java.util.jar.JarFile208: 98 2352 com.sun.org.apache.xml.internal.serializer.EncodingInfo209: 12 2304 * KlassKlass210: 48 2304 java.util.zip.ZipFile$ZipFileInputStream211: 51 2192 org.apache.avro.Schema$Field[]212: 90 2160 com.sun.org.apache.xerces.internal.parsers.SecuritySupport$4213: 90 2160 com.sun.org.apache.xerces.internal.impl.dtd.XMLAttributeDecl214: 90 2160 javax.xml.parsers.SecuritySupport$2215: 90 2160 com.sun.org.apache.xerces.internal.dom.SecuritySupport$4216: 90 2160 com.sun.org.apache.xerces.internal.impl.msg.XMLMessageFormatter217: 90 2160 com.sun.org.apache.xerces.internal.impl.dtd.DTDGrammarBucket218: 90 2160 com.sun.org.apache.xerces.internal.util.SecurityManager219: 90 2160 com.sun.org.apache.xerces.internal.util.SymbolTable220: 90 2160 com.sun.org.apache.xerces.internal.xinclude.XIncludeMessageFormatter221: 90 2160 com.sun.org.apache.xerces.internal.impl.validation.ValidationManager222: 128 2048 java.lang.Character223: 49 1960 java.io.BufferedInputStream224: 48 1920 sun.misc.URLClassPath$JarLoader225: 118 1888 java.lang.ref.ReferenceQueue$Lock226: 76 1864 java.lang.reflect.Constructor[]227: 16 1792 java.lang.ThreadLocal$ThreadLocalMap$Entry[]228: 55 1760 java.io.FilePermission229: 51 1632 org.apache.avro.io.parsing.Symbol$FieldOrderAction230: 100 1600 org.apache.avro.io.DirectBinaryDecoder$ByteReader231: 11 1584 java.text.DecimalFormat232: 22 1584 java.beans.PropertyDescriptor233: 16 1536 org.apache.hadoop.mapred.IFile$Writer234: 48 1536 sun.misc.URLClassPath$JarLoader$2235: 48 1536 org.apache.log4j.ProvisionNode236: 23 1504 java.util.concurrent.ConcurrentHashMap$Segment[]237: 61 1464 org.apache.commons.logging.impl.Log4JLogger238: 90 1440 com.sun.org.apache.xerces.internal.parsers.SecuritySupport$2239: 90 1440 com.sun.org.apache.xerces.internal.parsers.SecuritySupport$1240: 45 1440 sun.misc.URLClassPath$FileLoader$1241: 90 1440 com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$XMLDeclDriver242: 90 1440 com.sun.org.apache.xerces.internal.impl.dv.dtd.DTDDVFactoryImpl243: 90 1440 com.sun.org.apache.xerces.internal.impl.dv.SecuritySupport$2244: 90 1440 com.sun.org.apache.xerces.internal.impl.dv.SecuritySupport$1245: 90 1440 com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver246: 30 1440 java.util.StringTokenizer247: 90 1440 com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$TrailingMiscDriver248: 25 1400 java.util.ResourceBundle$CacheKey249: 58 1392 java.util.LinkedList250: 29 1392 java.util.Properties251: 56 1344 sun.reflect.NativeConstructorAccessorImpl252: 42 1344 java.util.zip.Inflater253: 78 1248 java.lang.Object254: 25 1200 java.util.ResourceBundle$BundleReference255: 30 1200 java.math.BigDecimal256: 8 1192 long[]257: 4 1112 java.lang.Long[]258: 23 1104 java.util.concurrent.ConcurrentHashMap259: 34 1088 java.util.concurrent.locks.AbstractQueuedSynchronizer$Node260: 34 1088 org.apache.avro.mapred.AvroSerialization$AvroWrapperSerializer261: 45 1080 sun.net.www.protocol.jar.JarURLConnection$JarURLInputStream262: 67 1072 org.apache.hadoop.fs.Path263: 11 1072 
java.util.WeakHashMap$Entry[]264: 2 1064 java.lang.Integer[]265: 33 1056 org.apache.hadoop.io.DataOutputBuffer266: 33 1056 java.util.concurrent.SynchronousQueue$TransferStack$SNode267: 1 1040 java.lang.Byte[]268: 18 1040 java.lang.reflect.Field[]269: 1 1040 java.lang.Short[]270: 43 1032 java.lang.ProcessEnvironment$Variable271: 43 1032 java.lang.ProcessEnvironment$Value272: 43 1032 com.hadoop.compression.lzo.LzoCompressor$CompressionStrategy273: 16 1024 org.apache.hadoop.mapred.Task$CombineValuesIterator274: 32 1024 org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer275: 42 1008 java.util.zip.ZStreamRef276: 63 1008 sun.reflect.DelegatingConstructorAccessorImpl277: 17 952 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$InMemValBytes278: 7 904 java.beans.MethodDescriptor[]279: 22 880 java.io.ObjectStreamField280: 35 840 org.apache.hadoop.io.nativeio.Errno281: 34 816 org.apache.avro.io.BinaryEncoder282: 3 816 org.codehaus.jackson.sym.Name[]283: 34 816 org.apache.avro.specific.SpecificDatumWriter284: 25 800 java.util.LinkedList$ListItr285: 25 800 java.util.ResourceBundle$LoaderReference286: 33 792 org.apache.hadoop.io.DataOutputBuffer$Buffer287: 33 792 org.apache.avro.specific.SpecificDatumReader288: 7 784 java.lang.Thread289: 49 784 org.apache.avro.mapred.AvroKey290: 16 768 java.util.concurrent.FutureTask$Sync291: 48 768 sun.net.www.ParseUtil292: 31 744 org.apache.hadoop.io.serializer.SerializationFactory293: 23 736 java.security.AccessControlContext294: 45 720 java.io.FilePermission$1295: 30 720 sun.reflect.generics.tree.SimpleClassTypeSignature296: 11 704 java.text.DecimalFormatSymbols297: 12 672 sun.reflect.DelegatingClassLoader298: 42 672 java.lang.ThreadLocal299: 16 640 org.apache.hadoop.ipc.Client$Call300: 16 640 org.apache.hadoop.conf.Configuration301: 16 640 org.apache.hadoop.io.compress.BlockCompressorStream302: 20 640 java.util.regex.Pattern$Curly303: 20 640 org.apache.hadoop.mapred.Counters$Counter304: 19 608 java.util.Locale305: 25 600 java.util.regex.Pattern$GroupHead306: 5 600 java.net.SocksSocketImpl307: 15 600 sun.nio.ch.SelectionKeyImpl308: 25 600 java.util.regex.Pattern$GroupTail309: 9 576 java.nio.DirectByteBuffer310: 18 576 org.apache.hadoop.fs.FSDataOutputStream311: 18 576 org.apache.hadoop.fs.FSDataOutputStream$PositionCache312: 5 560 java.util.GregorianCalendar313: 5 560 sun.nio.ch.SocketChannelImpl314: 34 544 org.apache.avro.io.BinaryEncoder$SimpleByteWriter315: 17 544 org.apache.hadoop.util.DataChecksum316: 33 528 org.apache.avro.mapred.AvroSerialization317: 11 528 sun.nio.cs.UTF_8$Encoder318: 22 528 sun.reflect.NativeMethodAccessorImpl319: 1 528 java.lang.Character[]320: 33 528 org.apache.avro.mapred.AvroValue321: 16 512 org.apache.hadoop.ipc.RPC$Invocation322: 16 512 org.apache.hadoop.mapred.IFileOutputStream323: 16 512 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$MRResultIterator324: 16 512 com.ngmoco.ngpipes.sourcing.NgBucketingEventCountingReducer325: 16 512 com.ngmoco.ngpipes.sourcing.NgBucketingEventCountingCombiner326: 16 512 org.apache.avro.mapred.HadoopCombiner$PairCollector327: 30 504 sun.reflect.generics.tree.TypeArgument[]328: 31 496 org.apache.hadoop.io.serializer.WritableSerialization329: 6 480 org.apache.hadoop.fs.DF330: 30 480 org.apache.avro.io.parsing.ResolvingGrammarGenerator331: 5 480 sun.util.calendar.Gregorian$Date332: 12 480 java.security.ProtectionDomain333: 19 456 com.ngmoco.ngpipes.utils.NgPipesGlobals$EventClassCounter334: 4 440 java.math.BigInteger[]335: 11 440 java.text.DigitList336: 18 432 
java.security.ProtectionDomain[]337: 18 432 java.text.DateFormat$Field338: 18 432 org.apache.avro.io.parsing.Symbol$Terminal339: 13 416 java.security.CodeSource340: 13 416 org.codehaus.jackson.JsonToken341: 17 408 java.util.regex.Pattern$Single342: 17 408 java.util.regex.Pattern$BitClass343: 1 408 com.sun.org.apache.xml.internal.serializer.EncodingInfo[]344: 2 400 org.apache.hadoop.ipc.Client$Connection345: 16 384 java.util.concurrent.Executors$RunnableAdapter346: 12 384 java.io.FileNotFoundException347: 8 384 java.util.TreeMap348: 16 384 org.apache.avro.mapred.HadoopReducerBase$ReduceIterable349: 8 384 java.util.WeakHashMap350: 12 384 java.net.Inet4Address351: 12 384 java.util.regex.Pattern$Branch352: 16 384 org.apache.hadoop.ipc.Client$Connection$3353: 16 384 org.apache.avro.mapred.HadoopCombiner354: 16 384 org.apache.hadoop.io.ObjectWritable355: 2 384 com.hadoop.compression.lzo.LzoCompressor$CompressionStrategy[]356: 23 368 sun.reflect.DelegatingMethodAccessorImpl357: 15 360 org.apache.hadoop.mapred.Task$Counter358: 9 360 sun.misc.Cleaner359: 15 360 java.io.Closeable[]360: 15 360 org.apache.avro.io.parsing.ResolvingGrammarGenerator$LitS2361: 15 360 sun.nio.ch.EPollArrayWrapper$Updator362: 15 360 java.lang.ThreadLocal$ThreadLocalMap363: 7 360 java.beans.PropertyDescriptor[]364: 11 352 java.security.Permissions365: 7 336 java.beans.BeanDescriptor366: 7 336 org.apache.hadoop.fs.permission.FsAction[]367: 14 336 org.apache.avro.Schema$Type368: 14 336 org.apache.hadoop.mapred.JvmTask369: 4 320 java.nio.ByteBuffer[]370: 10 320 java.security.BasicPermissionCollection371: 5 320 org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus372: 13 312 java.util.concurrent.atomic.AtomicLong373: 3 312 double[]374: 13 312 java.lang.RuntimePermission375: 13 312 org.apache.log4j.Level376: 12 288 org.codehaus.jackson.map.SerializationConfig$Feature377: 18 288 org.apache.hadoop.util.PureJavaCrc32378: 12 288 java.util.Arrays$ArrayList379: 12 288 java.io.ExpiringCache$Entry380: 12 288 java.util.regex.Pattern$Node[]381: 12 288 java.util.regex.Pattern$CharProperty$1382: 11 288 java.io.ObjectStreamField[]383: 12 288 sun.reflect.annotation.AnnotationInvocationHandler384: 4 288 org.apache.log4j.spi.LoggingEvent385: 7 280 java.beans.GenericBeanInfo386: 11 264 org.apache.hadoop.security.UserGroupInformation387: 11 264 java.io.FileOutputStream388: 11 264 sun.misc.MetaIndex389: 11 264 org.apache.avro.Schema$FloatSchema390: 3 264 org.apache.hadoop.hdfs.protocol.DatanodeInfo391: 11 264 org.apache.avro.Schema$IntSchema392: 8 256 java.lang.OutOfMemoryError393: 16 256 java.util.concurrent.FutureTask394: 8 256 sun.misc.ProxyGenerator$PrimitiveTypeInfo395: 8 256 javax.security.auth.Subject$ClassSet396: 14 256 java.security.Principal[]397: 8 256 sun.reflect.UnsafeQualifiedStaticObjectFieldAccessorImpl398: 4 256 java.text.SimpleDateFormat399: 8 248 java.lang.Boolean[]400: 10 240 java.util.jar.Manifest401: 5 240 sun.nio.ch.SocketAdaptor402: 10 240 javax.security.auth.Subject$SecureSet$1403: 10 240 java.net.InetSocketAddress404: 10 240 java.io.FilePermissionCollection405: 6 240 sun.nio.cs.UTF_8$Decoder406: 10 240 java.util.Collections$SynchronizedSet407: 6 240 java.util.IdentityHashMap408: 10 240 org.codehaus.jackson.map.DeserializationConfig$Feature409: 7 224 java.util.Collections$UnmodifiableMap410: 4 224 sun.util.calendar.ZoneInfo411: 4 224 java.text.DateFormatSymbols412: 7 224 org.apache.avro.io.BinaryDecoder$BufferAccessor413: 7 224 java.lang.ClassLoader$NativeLibrary414: 9 216 java.util.logging.Level415: 9 216 
org.apache.avro.Schema$BytesSchema416: 9 216 org.apache.hadoop.io.Text417: 3 216 sun.net.www.protocol.jar.URLJarFile418: 6 216 org.apache.hadoop.mapred.TaskLog$LogName[]419: 9 216 javax.security.auth.Subject$SecureSet420: 9 216 java.nio.DirectByteBuffer$Deallocator421: 13 208 java.util.jar.Attributes422: 5 200 java.util.HashMap$ValueIterator423: 5 200 java.util.TreeMap$Entry424: 6 192 java.util.Random425: 6 192 org.apache.hadoop.fs.permission.FsPermission$2426: 8 192 org.apache.hadoop.mapred.TaskStatus$State427: 4 192 org.apache.hadoop.mapred.JobConf428: 8 192 java.lang.annotation.ElementType429: 8 192 org.apache.hadoop.fs.permission.FsAction430: 8 192 java.util.regex.Pattern$8431: 8 192 com.ngmoco.ngpipes.sourcing.NgBucketingEventCountingReducer$TYPE_COUNTERS432: 8 192 java.math.RoundingMode433: 6 192 java.lang.annotation.ElementType[]434: 12 192 java.security.ProtectionDomain$Key435: 12 192 java.util.regex.Pattern$BranchConn436: 12 192 java.util.Formatter$Flags437: 2 184 java.text.DateFormat$Field[]438: 1 184 org.apache.hadoop.mapred.MapTask$MapOutputBuffer439: 11 176 java.text.NumberFormat$Field440: 7 168 org.codehaus.jackson.JsonParser$Feature441: 3 168 org.codehaus.jackson.sym.BytesToNameCanonicalizer442: 7 168 org.apache.avro.io.parsing.Symbol$Kind443: 7 168 java.io.BufferedOutputStream444: 7 168 org.codehaus.jackson.annotate.JsonMethod445: 3 168 org.codehaus.jackson.map.ObjectMapper446: 7 168 org.apache.avro.Schema$DoubleSchema447: 3 168 sun.nio.cs.StreamEncoder448: 5 160 org.apache.hadoop.mapred.TaskLog$LogFileDetail449: 5 160 org.codehaus.jackson.JsonGenerator$Feature450: 10 160 sun.reflect.BootstrapConstructorAccessorImpl451: 5 160 org.apache.hadoop.fs.FileSystem$Cache$Key452: 10 160 sun.reflect.generics.tree.ClassTypeSignature453: 1 160 org.apache.hadoop.io.nativeio.Errno[]454: 10 160 java.util.concurrent.atomic.AtomicInteger455: 5 160 java.nio.channels.SelectionKey[]456: 5 160 org.apache.hadoop.mapred.Counters$Group457: 5 160 sun.reflect.annotation.AnnotationType458: 5 160 org.apache.hadoop.fs.permission.FsPermission459: 2 144 java.math.BigDecimal[]460: 6 144 java.util.regex.Pattern$Ctype461: 1 144 sun.reflect.MethodAccessorGenerator462: 6 144 org.apache.avro.Schema$BooleanSchema463: 6 144 javax.security.auth.login.AppConfigurationEntry464: 6 144 org.codehaus.jackson.annotate.JsonAutoDetect$Visibility465: 6 144 java.net.URLClassLoader$1466: 6 144 org.apache.avro.Schema$MapSchema467: 6 144 org.apache.hadoop.security.UserGroupInformation$AuthenticationMethod468: 6 144 java.lang.StringCoding$StringEncoder469: 6 144 org.apache.hadoop.mapred.TaskStatus$Phase470: 4 128 sun.util.LocaleServiceProviderPool471: 2 128 java.util.logging.Logger472: 4 128 org.apache.avro.io.BinaryDecoder$ByteArrayByteSource473: 4 128 org.apache.log4j.helpers.PatternParser$LiteralPatternConverter474: 4 128 org.apache.avro.io.BinaryDecoder475: 4 128 sun.reflect.generics.reflectiveObjects.TypeVariableImpl476: 1 128 org.apache.hadoop.hdfs.DFSClient$BlockReader477: 1 128 org.apache.hadoop.mapred.MapTask478: 4 128 java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl479: 4 128 sun.reflect.ClassFileAssembler480: 5 120 org.apache.hadoop.mapred.TaskLog$LogName481: 1 120 org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer482: 1 120 org.apache.hadoop.mapred.Child$3483: 1 120 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread484: 5 120 java.util.logging.LogManager$LogNode485: 3 120 org.apache.hadoop.security.User486: 5 120 java.util.Date487: 1 120 
org.apache.hadoop.mapred.Child$2488: 3 120 org.codehaus.jackson.annotate.JsonMethod[]489: 1 120 java.util.logging.LogManager$Cleaner490: 2 112 java.io.ExpiringCache$1491: 1 112 java.lang.ref.Finalizer$FinalizerThread492: 1 112 java.lang.ref.Reference$ReferenceHandler493: 4 96 org.codehaus.jackson.util.BufferRecycler$CharBufferType494: 2 96 org.apache.hadoop.fs.LocalFileSystem495: 4 96 org.apache.avro.io.parsing.Symbol$ImplicitAction496: 4 96 org.apache.hadoop.metrics.util.MetricsTimeVaryingRate$Metrics497: 2 96 org.apache.hadoop.mapred.Task$FileSystemStatisticUpdater498: 4 96 sun.reflect.generics.tree.FormalTypeParameter499: 3 96 org.apache.hadoop.security.SaslRpcServer$AuthMethod500: 1 96 com.ngmoco.ngpipes.utils.NgPipesGlobals$EventClassCounter[]501: 4 96 java.util.regex.Pattern$2502: 3 96 sun.misc.URLClassPath503: 4 96 sun.reflect.generics.tree.FieldTypeSignature[]504: 2 96 javax.security.auth.SubjectDomainCombiner$WeakKeyValueMap505: 4 96 com.ngmoco.ngpipes.sourcing.NgBucketingEventCountingCombiner$JobType506: 2 96 java.lang.ThreadGroup507: 3 96 java.util.RandomAccessSubList508: 2 96 org.apache.hadoop.ipc.Client$ConnectionId509: 6 96 java.util.TreeSet510: 3 96 java.security.PrivilegedActionException511: 6 96 sun.reflect.generics.tree.TypeVariableSignature512: 4 96 java.util.Formatter$FixedString513: 4 96 java.util.Formatter$FormatString[]514: 2 96 java.util.Formatter$FormatSpecifier515: 4 96 sun.reflect.ByteVectorImpl516: 6 96 java.util.concurrent.atomic.AtomicBoolean517: 4 96 sun.nio.ch.Util$BufferCache518: 3 96 java.io.OutputStreamWriter519: 2 96 org.apache.hadoop.metrics.spi.AbstractMetricsContext$TagMap520: 2 96 org.apache.hadoop.mapred.TaskStatus$State[]521: 1 96 org.apache.avro.file.DataFileReader522: 4 96 javax.security.auth.Subject$ClassSet$1523: 3 96 java.io.DataInputStream524: 3 96 java.lang.ClassNotFoundException525: 3 96 javax.security.auth.Subject526: 2 96 org.apache.hadoop.metrics.spi.AbstractMetricsContext$RecordMap527: 2 96 java.io.BufferedWriter528: 3 96 org.apache.hadoop.net.SocketInputStream$Reader529: 3 96 java.util.Collections$SynchronizedMap530: 3 96 java.io.DataOutputStream531: 1 88 org.apache.hadoop.hdfs.DFSClient$DFSInputStream532: 5 80 java.nio.channels.spi.AbstractInterruptibleChannel$1533: 2 80 org.apache.hadoop.fs.FileSystem$Statistics534: 1 80 org.apache.hadoop.hdfs.DFSClient535: 2 80 java.util.Formatter$Flags[]536: 2 80 java.util.PropertyResourceBundle537: 2 80 org.apache.hadoop.mapred.TaskStatus$Phase[]538: 1 80 sun.misc.Launcher$ExtClassLoader539: 1 80 com.hadoop.compression.lzo.LzoCompressor540: 2 80 java.io.ExpiringCache541: 2 80 org.codehaus.jackson.map.type.MapType542: 1 80 org.apache.hadoop.mapred.Task$Counter[]543: 2 80 org.apache.hadoop.metrics.spi.NullContext544: 1 80 java.util.concurrent.ThreadPoolExecutor545: 2 80 org.codehaus.jackson.annotate.JsonAutoDetect$Visibility[]546: 2 80 com.sun.xml.internal.stream.util.BufferAllocator547: 5 80 java.util.HashMap$Values548: 1 80 org.apache.hadoop.mapred.TaskLogAppender549: 1 80 org.apache.hadoop.mapred.MapTaskStatus550: 3 80 javax.security.auth.login.AppConfigurationEntry[]551: 2 80 java.lang.Thread[]552: 1 72 sun.misc.Launcher$AppClassLoader553: 3 72 java.util.Collections$UnmodifiableRandomAccessList554: 3 72 org.apache.hadoop.hdfs.protocol.FSConstants$SafeModeAction555: 1 72 java.util.logging.LogManager$RootLogger556: 3 72 org.codehaus.jackson.map.ser.BasicSerializerFactory$SerializerMapping557: 3 72 org.codehaus.jackson.map.ser.SerializerCache558: 3 72 org.apache.avro.io.parsing.Symbol$Repeater559: 
3 72 org.codehaus.jackson.util.BufferRecycler$ByteBufferType560: 3 72 org.codehaus.jackson.map.deser.StdDeserializerProvider561: 3 72 org.apache.hadoop.mapred.JobID562: 1 72 org.apache.avro.Schema$Type[]563: 3 72 java.net.InetAddress[]564: 1 72 sun.nio.ch.EPollSelectorImpl565: 3 72 org.apache.hadoop.mapred.TaskID566: 3 72 org.apache.hadoop.ipc.Status567: 3 72 sun.misc.Signal568: 3 72 org.apache.hadoop.mapred.TaskAttemptID569: 3 72 org.apache.hadoop.hdfs.protocol.FSConstants$DatanodeReportType570: 3 72 java.net.InetAddress$CacheEntry571: 3 72 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject572: 3 72 org.apache.hadoop.hdfs.protocol.FSConstants$UpgradeAction573: 3 72 org.apache.avro.Schema$Field$Order574: 3 72 java.lang.annotation.RetentionPolicy575: 3 72 java.util.SubList$1576: 3 72 org.apache.hadoop.hdfs.protocol.DatanodeInfo$AdminStates577: 3 72 sun.misc.URLClassPath$FileLoader578: 1 72 org.codehaus.jackson.JsonToken[]579: 2 64 org.apache.hadoop.net.SocketOutputStream$Writer580: 2 64 java.util.Formatter581: 1 64 org.apache.hadoop.metrics.jvm.JvmMetrics582: 4 64 sun.net.www.protocol.jar.Handler583: 1 64 float[]584: 2 64 org.apache.hadoop.security.token.Token585: 2 64 sun.reflect.generics.repository.ClassRepository586: 2 64 java.lang.ref.ReferenceQueue$Null587: 2 64 java.io.PrintStream588: 2 64 org.apache.avro.file.DataFileStream$DataBlock589: 2 64 java.lang.annotation.RetentionPolicy[]590: 2 64 org.apache.avro.Schema$Field$Order[]591: 2 64 org.apache.hadoop.fs.RawLocalFileSystem592: 2 64 javax.security.auth.SubjectDomainCombiner593: 2 64 org.apache.hadoop.metrics.util.MetricsTimeVaryingRate$MinMax594: 2 64 org.apache.hadoop.mapred.SortedRanges$Range595: 4 64 java.util.LinkedHashSet596: 1 64 com.ngmoco.ngpipes.utils.bucketingeventcounting.BucketingEventHandler[]597: 2 64 org.apache.hadoop.hdfs.protocol.DatanodeInfo$AdminStates[]598: 2 64 org.apache.log4j.helpers.PatternParser$BasicPatternConverter599: 4 64 $Proxy4600: 2 64 org.codehaus.jackson.map.MappingJsonFactory601: 4 64 java.util.concurrent.locks.ReentrantLock602: 4 64 com.sun.org.apache.xml.internal.serializer.CharInfo$CharKey603: 1 64 org.codehaus.jackson.map.SerializationConfig$Feature[]604: 2 64 org.apache.hadoop.metrics.spi.MetricsRecordImpl605: 2 64 org.apache.hadoop.metrics.util.MetricsTimeVaryingRate606: 4 64 javax.security.auth.login.AppConfigurationEntry$LoginModuleControlFlag607: 4 64 $Proxy3608: 1 56 java.lang.Runnable[]609: 1 56 com.sun.security.auth.module.UnixLoginModule610: 2 56 sun.reflect.generics.tree.ClassTypeSignature[]611: 1 56 java.nio.ByteBufferAsLongBufferB612: 1 56 sun.nio.ch.EPollArrayWrapper613: 1 56 sun.awt.AppContext614: 1 56 org.codehaus.jackson.util.InternCache615: 1 56 java.util.ResourceBundle$RBClassLoader616: 1 56 javax.security.auth.login.LoginContext617: 1 56 org.codehaus.jackson.map.DeserializationConfig$Feature[]618: 1 48 java.util.concurrent.TimeUnit[]619: 1 48 org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext620: 2 48 org.apache.log4j.helpers.OnlyOnceErrorHandler621: 2 48 org.codehaus.jackson.map.ser.StdSerializers$BooleanSerializer622: 3 48 org.apache.hadoop.fs.LocalDirAllocator623: 2 48 java.net.InetAddress$Cache$Type624: 2 48 org.codehaus.jackson.map.deser.StdDeserializer$CharacterDeserializer625: 2 48 sun.misc.NativeSignalHandler626: 2 48 javax.security.auth.login.LoginContext$ModuleInfo627: 1 48 org.codehaus.jackson.JsonParser$Feature[]628: 2 48 sun.reflect.generics.tree.ClassSignature629: 2 48 org.apache.avro.mapred.AvroKeyComparator630: 1 48 
org.apache.hadoop.hdfs.DistributedFileSystem631: 2 48 org.codehaus.jackson.map.deser.StdDeserializer$LongDeserializer632: 1 48 sun.nio.cs.StreamDecoder633: 1 48 org.apache.log4j.Hierarchy634: 2 48 org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream635: 3 48 org.apache.hadoop.net.SocketInputStream636: 3 48 com.sun.org.apache.xerces.internal.impl.dv.dtd.ListDatatypeValidator637: 2 48 sun.reflect.generics.scope.ClassScope638: 2 48 sun.reflect.generics.tree.FormalTypeParameter[]639: 2 48 sun.misc.JarIndex640: 2 48 org.codehaus.jackson.map.deser.StdDeserializer$CalendarDeserializer641: 2 48 org.apache.hadoop.mapred.Counters642: 3 48 java.text.AttributedCharacterIterator$Attribute643: 2 48 org.apache.hadoop.ipc.ConnectionHeader644: 1 48 org.apache.log4j.helpers.PatternParser645: 2 48 org.codehaus.jackson.map.deser.StdDeserializer$FloatDeserializer646: 2 48 sun.awt.MostRecentKeyValue647: 2 48 java.lang.management.ManagementPermission648: 1 48 com.ngmoco.ngpipes.sourcing.NgBucketingEventCountingReducer$TYPE_COUNTERS[]649: 3 48 java.nio.charset.CodingErrorAction650: 2 48 java.nio.charset.CoderResult651: 2 48 java.lang.reflect.TypeVariable[]652: 2 48 com.sun.org.apache.xerces.internal.impl.RevalidationHandler[]653: 2 48 org.codehaus.jackson.map.deser.StdDeserializer$DoubleDeserializer654: 2 48 java.net.InetAddress$Cache655: 2 48 sun.reflect.Label$PatchInfo656: 2 48 java.util.Currency657: 2 48 org.codehaus.jackson.map.deser.StdDeserializer$ShortDeserializer658: 1 48 org.apache.avro.io.parsing.Symbol$Kind[]659: 2 48 sun.reflect.generics.factory.CoreReflectionFactory660: 1 48 java.io.BufferedReader661: 1 48 org.apache.hadoop.mapred.MapTask$TrackedRecordReader662: 2 48 org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream663: 1 48 java.util.Hashtable$Enumerator664: 2 48 org.codehaus.jackson.map.deser.StdDeserializer$IntegerDeserializer665: 2 48 org.apache.hadoop.ipc.RPC$Invoker666: 1 48 java.math.RoundingMode[]667: 2 48 org.codehaus.jackson.map.deser.StdDeserializer$BooleanDeserializer668: 2 48 org.apache.hadoop.ipc.Client$Connection$PingInputStream669: 2 48 java.util.regex.Pattern$6670: 1 48 org.codehaus.jackson.map.deser.MapDeserializer671: 2 48 org.codehaus.jackson.map.deser.StdDeserializer$ByteDeserializer672: 1 40 sun.util.resources.CalendarData673: 1 40 sun.text.resources.FormatData_en_US674: 1 40 org.apache.hadoop.util.Progress675: 1 40 org.codehaus.jackson.map.ser.MapSerializer676: 1 40 java.util.ResourceBundle$1677: 1 40 org.apache.hadoop.security.UserGroupInformation$AuthenticationMethod[]678: 1 40 org.apache.log4j.helpers.PatternParser$CategoryPatternConverter679: 1 40 sun.text.resources.FormatData_en680: 1 40 org.apache.hadoop.mapred.FileSplit681: 1 40 sun.util.resources.CurrencyNames682: 1 40 org.apache.avro.mapred.HadoopMapper$MapCollector683: 1 40 org.apache.commons.logging.impl.LogFactoryImpl684: 1 40 org.apache.avro.mapred.AvroRecordReader685: 1 40 java.util.logging.LogManager686: 1 40 sun.nio.cs.StandardCharsets$Classes687: 1 40 org.apache.hadoop.mapred.IndexRecord688: 1 40 sun.nio.cs.StandardCharsets$Cache689: 1 40 org.apache.log4j.helpers.PatternParser$DatePatternConverter690: 1 40 sun.util.resources.CalendarData_en691: 1 40 org.apache.hadoop.mapred.Task$TaskReporter692: 1 40 org.apache.hadoop.security.KerberosName$Rule693: 1 40 org.codehaus.jackson.map.util.StdDateFormat694: 1 40 org.codehaus.jackson.JsonGenerator$Feature[]695: 1 40 sun.util.resources.CurrencyNames_en_US696: 1 40 org.apache.hadoop.mapred.TaskAttemptContext697: 1 40 
org.apache.hadoop.metrics.jvm.EventCounter698: 1 40 org.apache.log4j.spi.RootLogger699: 1 40 com.sun.security.auth.module.UnixSystem700: 1 40 java.util.IdentityHashMap$ValueIterator701: 1 40 org.apache.hadoop.hdfs.protocol.Block702: 1 40 com.sun.org.apache.xerces.internal.dom.CoreDOMImplementationImpl703: 1 40 org.apache.hadoop.mapred.JobContext704: 1 40 org.apache.hadoop.fs.DF[]705: 1 40 java.util.concurrent.ThreadPoolExecutor$Worker706: 1 40 org.apache.hadoop.ipc.Client707: 1 40 com.sun.org.apache.xml.internal.serializer.CharInfo708: 1 40 sun.text.resources.FormatData709: 1 40 org.apache.hadoop.mapred.Task$OldCombinerRunner710: 2 40 java.io.File[]711: 1 40 sun.security.util.AuthResources712: 1 40 sun.nio.cs.StandardCharsets$Aliases713: 1 40 org.apache.hadoop.hdfs.protocol.LocatedBlock714: 2 32 $Proxy1715: 1 32 org.apache.hadoop.mapred.Child$4716: 1 32 java.io.UnixFileSystem717: 1 32 java.lang.InterruptedException718: 1 32 java.lang.ArithmeticException719: 1 32 com.ngmoco.ngpipes.sourcing.NgBucketingEventCountingCombiner$JobType[]720: 1 32 java.util.concurrent.SynchronousQueue721: 1 32 org.apache.hadoop.hdfs.protocol.FSConstants$SafeModeAction[]722: 2 32 java.util.regex.Pattern$Dot723: 2 32 sun.nio.ch.SocketChannelImpl$1724: 1 32 java.util.TreeMap$KeyIterator725: 2 32 org.apache.hadoop.net.SocketOutputStream726: 1 32 org.apache.hadoop.security.SaslRpcServer$AuthMethod[]727: 1 32 org.apache.hadoop.hdfs.protocol.LocatedBlocks728: 2 32 org.apache.hadoop.net.StandardSocketFactory729: 2 32 sun.nio.ch.SocketOptsImpl$IP$TCP730: 1 32 org.apache.hadoop.ipc.Status[]731: 1 32 org.codehaus.jackson.map.ser.ContainerSerializers$IndexedListSerializer732: 2 32 org.apache.avro.mapred.AvroWrapper733: 1 32 org.apache.hadoop.io.retry.RetryPolicies$RetryUpToMaximumCountWithFixedSleep734: 1 32 byte[][]735: 1 32 sun.misc.HexDumpEncoder736: 2 32 com.sun.org.apache.xerces.internal.impl.dv.dtd.ENTITYDatatypeValidator737: 1 32 org.apache.hadoop.hdfs.protocol.FSConstants$DatanodeReportType[]738: 1 32 java.lang.ClassCastException739: 1 32 java.lang.ref.Reference740: 1 32 org.apache.hadoop.hdfs.DFSClient$DFSDataInputStream741: 1 32 org.apache.hadoop.mapred.TaskLogsTruncater742: 2 32 java.util.logging.Handler[]743: 1 32 org.apache.log4j.helpers.QuietWriter744: 1 32 org.codehaus.jackson.map.ser.ArraySerializers$ObjectArraySerializer745: 1 32 sun.management.VMManagementImpl746: 1 32 java.lang.NullPointerException747: 1 32 org.apache.avro.io.BinaryData$Decoders748: 1 32 sun.reflect.MethodAccessorGenerator$1749: 1 32 org.codehaus.jackson.util.BufferRecycler$CharBufferType[]750: 2 32 sun.nio.ch.OptionAdaptor751: 1 32 java.lang.RuntimeException752: 1 32 org.apache.hadoop.io.NullWritable$Comparator753: 1 32 org.codehaus.jackson.JsonFactory754: 1 32 org.apache.hadoop.security.UserGroupInformation$UgiMetrics755: 1 32 org.apache.log4j.PatternLayout756: 1 32 sun.misc.SoftCache757: 1 32 org.apache.hadoop.security.Groups758: 1 32 java.lang.VirtualMachineError759: 1 32 java.text.DontCareFieldPosition760: 2 32 java.lang.Boolean761: 1 32 org.apache.hadoop.hdfs.protocol.DatanodeInfo[]762: 1 32 org.codehaus.jackson.map.introspect.VisibilityChecker$Std763: 1 32 org.apache.hadoop.io.Text$Comparator764: 1 32 org.apache.hadoop.mapred.SortedRanges$SkipRangeIterator765: 2 32 javax.security.auth.Subject$1766: 1 32 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$BlockingBuffer767: 1 32 java.beans.PropertyChangeSupport768: 1 32 org.apache.hadoop.io.UTF8$Comparator769: 2 32 java.lang.Shutdown$Lock770: 1 32 char[][]771: 1 32 
sun.nio.ch.AllocatedNativeObject
[tail of the histogram, entries 772-1081: each remaining class has only one or two live instances of 16-32 bytes apiece, mostly Jackson serializer/deserializer singletons plus Hadoop, Avro, and JDK internals. The org.apache.avro.** rows in this range are:]
787: 1 24 org.apache.avro.specific.SpecificData
788: 1 24 org.apache.avro.mapred.FsInput
792: 1 24 org.apache.avro.io.DecoderFactory$DefaultDecoderFactory
795: 1 24 org.apache.avro.io.DecoderFactory
797: 1 24 org.apache.avro.file.DataFileReader$SeekableInputStream
838: 1 24 org.apache.avro.mapred.HadoopMapper
842: 1 24 org.apache.avro.io.BinaryDecoder$InputStreamByteSource
891: 1 16 org.apache.avro.file.NullCodec
901: 1 16 org.apache.avro.generic.GenericDatumReader$1
902: 1 16 org.apache.avro.io.BinaryData$1
960: 1 16 org.apache.avro.file.NullCodec$Option
993: 1 16 org.apache.avro.generic.GenericData
1002: 1 16 org.apache.avro.Schema$2
1016: 1 16 org.apache.avro.Schema$1
1018: 1 16 org.apache.avro.file.DeflateCodec$Option
1068: 1 16 org.apache.avro.io.BinaryData$2
1081: 1 16 java.lang.Float
Total : 3125077 482600080
Heap traversal took 13.835 seconds.
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Wed, 1 Jun 2011 13:45:53 -0700
Subject: Re: avro object reuse
Lower down this list of object counts, what are the top org.apache.avro.** object counts?
How many AvroSerialization objects? How many AvroMapper, AvroWrapper, etc?
What about org.apache.hadoop.** objects?
On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com> wrote:
We ran jmap on one of our mappers and found the top usage as follows:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator
It looks like Jackson eats up a lot of memory. Our mapper reads files in the Avro format. Does Avro use Jackson heavily when reading Avro files? Is there any way to improve this? Thanks.
Ey-Chih Chow
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse
All of those instances are short-lived. If you are running out of memory, it's not likely due to object reuse. This tends to cause more CPU time in the garbage collector, but not out-of-memory conditions. Grabbing 'jmap -histo' output from a JVM with larger-than-expected heap usage can be hard to do on a cluster, but it can often quickly identify the cause of memory consumption issues.
I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.
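For what the pattern would look like, here is a minimal sketch of reusing one mutable Utf8 across calls. This is illustrative only, not the actual AvroUtf8InputFormat code; the ReusingReader class and its next() signature are invented for the example, and a caller would have to copy the value before the following call overwrites it.

import org.apache.avro.util.Utf8;

// Illustrative sketch: one Utf8 instance is reused for every record instead
// of allocating a new Utf8 per call. Class and method names are hypothetical.
class ReusingReader {
  private final Utf8 reused = new Utf8(); // lives as long as the reader

  Utf8 next(byte[] line, int length) {
    reused.setLength(length);                                // grows the backing array if needed
    System.arraycopy(line, 0, reused.getBytes(), 0, length); // overwrite in place
    return reused; // valid only until the next call
  }
}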
On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
I actually looked into the Avro code to find out how Avro does object reuse. I looked at AvroUtf8InputFormat and had the following question: why does a new Utf8 object have to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called? Will this eat up too much memory when we call next(key, value) many times? Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)? Will this save memory? Thanks.
Ey-Chih Chow
From: eychih@hotmail.com
To: user@avro.apache.org
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700
Hi,
We have several mapreduce jobs using avro. They take too much memory when running on production. Can anybody suggest some object reuse techniques to cut down memory usage? Thanks.
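One reuse technique that should apply on the file-reading side is passing the previous datum back into the reader so its storage is recycled. A minimal sketch, assuming a generic-record Avro data file (the file name here is a placeholder):

import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;

public class ReuseReadExample {
  public static void main(String[] args) throws Exception {
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>();
    DataFileReader<GenericRecord> reader =
        new DataFileReader<GenericRecord>(new File("part-00000.avro"), datumReader);
    GenericRecord record = null;
    while (reader.hasNext()) {
      record = reader.next(record); // reuse: the old record's storage is recycled
      // ... process record here ...
    }
    reader.close();
  }
}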
Ey-Chih Chow
Re: avro object reuse
Posted by Scott Carey <sc...@richrelevance.com>.
It is just a hunch that an OOME can happen if a corrupted array size is read (since I have seen this before). Without the OOME stack trace, I can't say either way. Sometimes the OOME stack trace is useless, because other things leaked leading to it, and other times it can show the source of the problem because it happens during an attempt to allocate a very large object or object graph.
Because OOME is not of type Exception, but rather (Throwable/Error), it usually gets printed out somewhere (check the std err logs of the map job) even when logging is turned down.
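If the logs swallow it, one way to make sure the trace appears is to catch fatal throwables at the map() boundary, print them, and rethrow. A sketch using the old mapred API; MyMapper and processRecord are placeholders for the real job code:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MyMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    try {
      processRecord(key, value, out);            // the job's real per-record work
    } catch (Throwable t) {                      // catches Error, including OOME
      t.printStackTrace();                       // lands in the task's stderr log
      if (t instanceof Error) throw (Error) t;   // rethrow so the task still fails
      if (t instanceof IOException) throw (IOException) t;
      throw new IOException(t);
    }
  }

  private void processRecord(LongWritable key, Text value,
                             OutputCollector<Text, Text> out) throws IOException {
    // placeholder for the real logic
  }
}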
On 6/10/11 12:07 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
We have many MR jobs running in production, but only one of them shows this kind of behavior. Is there any specific condition under which corruption will occur?
________________________________
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Fri, 10 Jun 2011 11:11:55 -0700
Subject: Re: avro object reuse
Corruption can occur in I/O busses and RAM. Does this tend to fail on the same nodes, or any node randomly? Since it does not fail consistently, this makes me suspect some sort of corruption even more.
I suggest turning on stack traces for fatal throwables. This shouldn't hurt production performance since they don't happen regularly and break the task anyway.
Of the heap dumps seen so far, the primary consumption is byte[] and no more than 300MB. How large are your java heaps?
On 6/10/11 10:53 AM, "ey-chih chow" <ey...@hotmail.com> wrote:
Since this was in production, we did not turn on stack traces. Also, it was highly unlikely that any data was corrupted because, if one mapper failed due to out of memory, the system started another one and went through all the data.
________________________________
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Thu, 9 Jun 2011 17:43:02 -0700
Subject: Re: avro object reuse
If the exception is happening while decoding, it could be due to corrupt data. Avro allocates a List preallocated to the size encoded, and I've seen corrupted data cause attempted allocations of arrays too large for the heap.
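To illustrate the failure mode: the binary format length-prefixes arrays, so a few corrupted bytes can turn a small count into a huge preallocation. A hedged sketch of a defensive read, not Avro's actual code; MAX_SANE_COUNT is an invented bound:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.io.Decoder;

public class DefensiveArrayRead {
  private static final long MAX_SANE_COUNT = 1000000; // invented bound for the example

  static List<Object> readArrayHeader(Decoder in) throws IOException {
    long count = in.readArrayStart(); // the count comes straight from the stream
    if (count < 0 || count > MAX_SANE_COUNT) {
      throw new IOException("Implausible array count, data may be corrupt: " + count);
    }
    // this is the preallocation that blows up when the count is garbage
    return new ArrayList<Object>((int) count);
  }
}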
On 6/9/11 4:58 PM, "Scott Carey" <sc...@richrelevance.com> wrote:
What is the stack trace on the out of memory exception?
On 6/9/11 4:45 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
We configure more than 100MB for MapReduce to do sorting. The memory we allocate for doing other things in the mapper is actually larger, but, for this job, we always get out-of-memory exceptions and the job cannot complete. We are trying to find out whether there is a way to avoid this problem.
Ey-Chih Chow
________________________________
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Thu, 9 Jun 2011 15:42:10 -0700
Subject: Re: avro object reuse
The most likely candidate for creating many instances of BufferAccessor and ByteArrayByteSource is BinaryData.compare() and BinaryData.hashCode(). Each call will create one of each (hash) or two of each (compare). These are only 32 bytes per instance and quickly become garbage that is easily cleaned up by the GC.
The below have only 32 bytes each and 8MB total.
On the other hand, the byte[]'s appear to be about 24K each on average and are using 100MB. Is this the size of your configured MapReduce sort MB?
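For reference, a sketch of where that number is typically set (the values are illustrative and the key names are the Hadoop 0.20-era ones); the map-side sort buffer has to fit inside the task heap with room left over for the mapper's own allocations:

import org.apache.hadoop.mapred.JobConf;

public class SortBufferConfig {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    conf.setInt("io.sort.mb", 100);                 // map-side sort buffer, in MB
    conf.set("mapred.child.java.opts", "-Xmx512m"); // task heap; must exceed io.sort.mb
    // ... set input/output formats and submit the job ...
  }
}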
On 6/9/11 3:08 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
We did more monitoring. At one point, we got the following histogram via jmap. The question is why there are so many instances of BinaryDecoder$BufferAccessor and BinaryDecoder$ByteArrayByteSource. How can we avoid this? Thanks.
Object Histogram:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 4199 100241168 byte[]
2: 272948 8734336 org.apache.avro.io.BinaryDecoder$BufferAccessor
3: 272945 8734240 org.apache.avro.io.BinaryDecoder$ByteArrayByteSource
4: 2093 5387976 int[]
5: 23762 2822864 * ConstMethodKlass
6: 23762 1904760 * MethodKlass
7: 39295 1688992 * SymbolKlass
8: 2127 1216976 * ConstantPoolKlass
9: 2127 882760 * InstanceKlassKlass
10: 1847 742936 * ConstantPoolCacheKlass
11: 9602 715608 char[]
12: 1072 299584 * MethodDataKlass
13: 9698 232752 java.lang.String
14: 2317 222432 java.lang.Class
15: 3288 204440 short[]
16: 3167 156664 * System ObjArray
17: 2401 57624 java.util.HashMap$Entry
18: 666 53280 java.lang.reflect.Method
19: 161 52808 * ObjArrayKlassKlass
20: 1808 43392 java.util.Hashtable$Entry
________________________________
From: eychih@hotmail.com
To: user@avro.apache.org
Subject: RE: avro object reuse
Date: Wed, 1 Jun 2011 15:14:03 -0700
We make a lot of toString() calls on the Avro Utf8 object. Will this cause Jackson calls? Thanks.
Ey-Chih
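As far as I can tell, Utf8.toString() itself only decodes the UTF-8 bytes into a new String on each call and does not involve Jackson; the cost is the repeated String allocation. If the same value is stringified more than once, converting it once and reusing the String avoids that. A small sketch:

import org.apache.avro.util.Utf8;

public class Utf8ToStringOnce {
  public static void main(String[] args) {
    Utf8 name = new Utf8("event_name");
    String s = name.toString(); // decode the bytes once
    // reuse s from here on instead of calling name.toString() repeatedly
    System.out.println(s + " has length " + s.length());
  }
}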
________________________________
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Wed, 1 Jun 2011 13:38:39 -0700
Subject: Re: avro object reuse
This is great info.
Jackson should only be used once when the file is opened, so this is confusing from that point of view.
Is something else using Jackson or initializing an Avro JsonDecoder frequently? There are over 100000 Jackson DeserializationConfig objects.
Another place that parses the schema is in AvroSerialization.java. Does the Hadoop getDeserializer() API method get called once per job, or per record? If this is called more than once per map job, it might explain this.
In principle, Jackson is only used by a mapper during initialization. The below indicates that this may not be the case or that something outside of Avro is causing a lot of Jackson JSON parsing.
Are you using something that is converting the Avro data to Json form? toString() on most Avro datum objects will do a lot of work with Jackson, for example — but the below are deserializer objects not serializer objects so that is not likely the issue.
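A sketch of the suspected anti-pattern next to the fix; SCHEMA_JSON stands in for whatever schema text the job uses. Every Schema.parse() call runs a full Jackson parse, so it belongs in setup, not in the per-record path:

import org.apache.avro.Schema;

public class SchemaCaching {
  private static final String SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"Event\",\"fields\":"
      + "[{\"name\":\"id\",\"type\":\"long\"}]}"; // placeholder schema

  // GOOD: parse once, reuse everywhere; Jackson runs a single time.
  private static final Schema SCHEMA = Schema.parse(SCHEMA_JSON);

  // BAD: a fresh Jackson parse of the same text on every call.
  static Schema perRecord() {
    return Schema.parse(SCHEMA_JSON);
  }

  public static void main(String[] args) {
    System.out.println(SCHEMA.getName());
  }
}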
On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com> wrote:
We ran jmap on one of our mappers and found the top usage as follows:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator
It looks like Jackson eats up a lot of memory. Our mapper reads files in the Avro format. Does Avro use Jackson heavily when reading Avro files? Is there any way to improve this? Thanks.
Ey-Chih Chow
________________________________
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse
All of those instances are short-lived. If you are running out of memory, it's not likely due to object reuse. This tends to cause more CPU time in the garbage collector, but not out-of-memory conditions. Grabbing 'jmap -histo' output from a JVM with larger-than-expected heap usage can be hard to do on a cluster, but it can often quickly identify the cause of memory consumption issues.
I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.
On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
I actually looked into the Avro code to find out how Avro does object reuse. I looked at AvroUtf8InputFormat and had the following question: why does a new Utf8 object have to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called? Will this eat up too much memory when we call next(key, value) many times? Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)? Will this save memory? Thanks.
Ey-Chih Chow
________________________________
From: eychih@hotmail.com
To: user@avro.apache.org
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700
Hi,
We have several mapreduce jobs using avro. They take too much memory when running on production. Can anybody suggest some object reuse techniques to cut down memory usage? Thanks.
Ey-Chih Chow
RE: avro object reuse
Posted by ey-chih chow <ey...@hotmail.com>.
We have many MR jobs running in production, but only one of them shows this kind of behavior. Is there any specific condition under which corruption will occur?
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Fri, 10 Jun 2011 11:11:55 -0700
Subject: Re: avro object reuse
Corruption can occur in I/O busses and RAM. Does this tend to fail on the same nodes, or any node randomly? Since it does not fail consistently, this makes me suspect some sort of corruption even more.
I suggest turning on stack traces for fatal throwables. This shouldn't hurt production performance since they don't happen regularly and break the task anyway.
Of the heap dumps seen so far, the primary consumption is byte[] and no more than 300MB. How large are your java heaps?
On 6/10/11 10:53 AM, "ey-chih chow" <ey...@hotmail.com> wrote:
Since this was in production, we did not turn on stack traces. Also, it was highly unlikely that any data was corrupted because, if one mapper failed due to out of memory, the system started another one and went through all the data.
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Thu, 9 Jun 2011 17:43:02 -0700
Subject: Re: avro object reuse
If the exception is happening while decoding, it could be due to corrupt data. Avro allocates a List preallocated to the size encoded, and I've seen corrupted data cause attempted allocations of arrays too large for the heap.
On 6/9/11 4:58 PM, "Scott Carey" <sc...@richrelevance.com> wrote:
What is the stack trace on the out of memory exception?
On 6/9/11 4:45 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
We configure more than 100MB for MapReduce to do sorting. The memory we allocate for doing other things in the mapper is actually larger, but, for this job, we always get out-of-memory exceptions and the job cannot complete. We are trying to find out whether there is a way to avoid this problem.
Ey-Chih Chow
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Thu, 9 Jun 2011 15:42:10 -0700
Subject: Re: avro object reuse
The most likely candidate for creating many instances of BufferAccessor and ByteArrayByteSource is BinaryData.compare() and BinaryData.hashCode(). Each call will create one of each (hash) or two of each (compare). These are only 32 bytes per instance and quickly become garbage that is easily cleaned up by the GC.
The below have only 32 bytes each and 8MB total.
On the other hand, the byte[]'s appear to be about 24K each on average and are using 100MB. Is this the size of your configured MapReduce sort MB?
On 6/9/11 3:08 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
We did more monitoring. At one point, we got the following histogram via jmap. The question is why there are so many instances of BinaryDecoder$BufferAccessor and BinaryDecoder$ByteArrayByteSource. How can we avoid this? Thanks.
Object Histogram:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 4199 100241168 byte[]
2: 272948 8734336 org.apache.avro.io.BinaryDecoder$BufferAccessor
3: 272945 8734240 org.apache.avro.io.BinaryDecoder$ByteArrayByteSource
4: 2093 5387976 int[]
5: 23762 2822864 * ConstMethodKlass
6: 23762 1904760 * MethodKlass
7: 39295 1688992 * SymbolKlass
8: 2127 1216976 * ConstantPoolKlass
9: 2127 882760 * InstanceKlassKlass
10: 1847 742936 * ConstantPoolCacheKlass
11: 9602 715608 char[]
12: 1072 299584 * MethodDataKlass
13: 9698 232752 java.lang.String
14: 2317 222432 java.lang.Class
15: 3288 204440 short[]
16: 3167 156664 * System ObjArray
17: 2401 57624 java.util.HashMap$Entry
18: 666 53280 java.lang.reflect.Method
19: 161 52808 * ObjArrayKlassKlass
20: 1808 43392 java.util.Hashtable$Entry
From: eychih@hotmail.com
To: user@avro.apache.org
Subject: RE: avro object reuse
Date: Wed, 1 Jun 2011 15:14:03 -0700
We make a lot of toString() calls on the Avro Utf8 object. Will this cause Jackson calls? Thanks.
Ey-Chih
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Wed, 1 Jun 2011 13:38:39 -0700
Subject: Re: avro object reuse
This is great info.
Jackson should only be used once when the file is opened, so this is confusing from that point of view. Is something else using Jackson or initializing an Avro JsonDecoder frequently? There are over 100000 Jackson DeserializationConfig objects.
Another place that parses the schema is in AvroSerialization.java. Does the Hadoop getDeserializer() API method get called once per job, or per record? If this is called more than once per map job, it might explain this.
In principle, Jackson is only used by a mapper during initialization. The below indicates that this may not be the case or that something outside of Avro is causing a lot of Jackson JSON parsing.
Are you using something that is converting the Avro data to Json form? toString() on most Avro datum objects will do a lot of work with Jackson, for example — but the below are deserializer objects not serializer objects so that is not likely the issue.
On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com> wrote:
We ran jmap on one of our mappers and found the top usage as follows:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator
It looks like Jackson eats up a lot of memory. Our mapper reads files in the Avro format. Does Avro use Jackson heavily when reading Avro files? Is there any way to improve this? Thanks.
Ey-Chih Chow
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse
All of those instances are short-lived. If you are running out of memory, it's not likely due to object reuse. This tends to cause more CPU time in the garbage collector, but not out-of-memory conditions. Grabbing 'jmap -histo' output from a JVM with larger-than-expected heap usage can be hard to do on a cluster, but it can often quickly identify the cause of memory consumption issues.
I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.
On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
I actually looked into the Avro code to find out how Avro does object reuse. I looked at AvroUtf8InputFormat and had the following question: why does a new Utf8 object have to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called? Will this eat up too much memory when we call next(key, value) many times? Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)? Will this save memory? Thanks.
Ey-Chih Chow
From: eychih@hotmail.com
To: user@avro.apache.org
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700
Hi,
We have several mapreduce jobs using avro. They take too much memory when running on production. Can anybody suggest some object reuse techniques to cut down memory usage? Thanks.
Ey-Chih Chow
Re: avro object reuse
Posted by Scott Carey <sc...@richrelevance.com>.
Corruption can occur in I/O busses and RAM. Does this tend to fail on the same nodes, or any node randomly? Since it does not fail consistently, this makes me suspect some sort of corruption even more.
I suggest turning on stack traces for fatal throwables. This shouldn't hurt production performance since they don't happen regularly and break the task anyway.
Of the heap dumps seen so far, the primary consumption is byte[] and no more than 300MB. How large are your java heaps?
On 6/10/11 10:53 AM, "ey-chih chow" <ey...@hotmail.com> wrote:
Since this was in production, we did not turn on stack traces. Also, it was highly unlikely that any data was corrupted because, if one mapper failed due to out of memory, the system started another one and went through all the data.
________________________________
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Thu, 9 Jun 2011 17:43:02 -0700
Subject: Re: avro object reuse
If the exception is happening while decoding, it could be due to corrupt data. Avro allocates a List preallocated to the size encoded, and I've seen corrupted data cause attempted allocations of arrays too large for the heap.
On 6/9/11 4:58 PM, "Scott Carey" <sc...@richrelevance.com> wrote:
What is the stack trace on the out of memory exception?
On 6/9/11 4:45 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
We configure more than 100MB for MapReduce to do sorting. The memory we allocate for doing other things in the mapper is actually larger, but, for this job, we always get out-of-memory exceptions and the job cannot complete. We are trying to find out whether there is a way to avoid this problem.
Ey-Chih Chow
________________________________
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Thu, 9 Jun 2011 15:42:10 -0700
Subject: Re: avro object reuse
The most likely candidate for creating many instances of BufferAccessor and ByteArrayByteSource is BinaryData.compare() and BinaryData.hashCode(). Each call will create one of each (hash) or two of each (compare). These are only 32 bytes per instance and quickly become garbage that is easily cleaned up by the GC.
The below have only 32 bytes each and 8MB total.
On the other hand, the byte[]'s appear to be about 24K each on average and are using 100MB. Is this the size of your configured MapReduce sort MB?
On 6/9/11 3:08 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
We did more monitoring. At one point, we got the following histogram via jmap. The question is why there are so many instances of BinaryDecoder$BufferAccessor and BinaryDecoder$ByteArrayByteSource. How can we avoid this? Thanks.
Object Histogram:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 4199 100241168 byte[]
2: 272948 8734336 org.apache.avro.io.BinaryDecoder$BufferAccessor
3: 272945 8734240 org.apache.avro.io.BinaryDecoder$ByteArrayByteSource
4: 2093 5387976 int[]
5: 23762 2822864 * ConstMethodKlass
6: 23762 1904760 * MethodKlass
7: 39295 1688992 * SymbolKlass
8: 2127 1216976 * ConstantPoolKlass
9: 2127 882760 * InstanceKlassKlass
10: 1847 742936 * ConstantPoolCacheKlass
11: 9602 715608 char[]
12: 1072 299584 * MethodDataKlass
13: 9698 232752 java.lang.String
14: 2317 222432 java.lang.Class
15: 3288 204440 short[]
16: 3167 156664 * System ObjArray
17: 2401 57624 java.util.HashMap$Entry
18: 666 53280 java.lang.reflect.Method
19: 161 52808 * ObjArrayKlassKlass
20: 1808 43392 java.util.Hashtable$Entry
________________________________
From: eychih@hotmail.com
To: user@avro.apache.org
Subject: RE: avro object reuse
Date: Wed, 1 Jun 2011 15:14:03 -0700
We make a lot of toString() calls on the Avro Utf8 object. Will this cause Jackson calls? Thanks.
Ey-Chih
________________________________
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Wed, 1 Jun 2011 13:38:39 -0700
Subject: Re: avro object reuse
This is great info.
Jackson should only be used once when the file is opened, so this is confusing from that point of view.
Is something else using Jackson or initializing an Avro JsonDecoder frequently? There are over 100000 Jackson DeserializationConfig objects.
Another place that parses the schema is in AvroSerialization.java. Does the Hadoop getDeserializer() API method get called once per job, or per record? If this is called more than once per map job, it might explain this.
In principle, Jackson is only used by a mapper during initialization. The below indicates that this may not be the case or that something outside of Avro is causing a lot of Jackson JSON parsing.
Are you using something that is converting the Avro data to Json form? toString() on most Avro datum objects will do a lot of work with Jackson, for example — but the below are deserializer objects not serializer objects so that is not likely the issue.
On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com> wrote:
We ran jmap on one of our mappers and found the top usage as follows:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator
It looks like Jackson eats up a lot of memory. Our mapper reads files in the Avro format. Does Avro use Jackson heavily when reading Avro files? Is there any way to improve this? Thanks.
Ey-Chih Chow
________________________________
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse
All of those instances are short-lived. If you are running out of memory, it's not likely due to object reuse. This tends to cause more CPU time in the garbage collector, but not out-of-memory conditions. Grabbing 'jmap -histo' output from a JVM with larger-than-expected heap usage can be hard to do on a cluster, but it can often quickly identify the cause of memory consumption issues.
I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.
On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
I actually looked into the Avro code to find out how Avro does object reuse. I looked at AvroUtf8InputFormat and had the following question: why does a new Utf8 object have to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called? Will this eat up too much memory when we call next(key, value) many times? Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)? Will this save memory? Thanks.
Ey-Chih Chow
________________________________
From: eychih@hotmail.com
To: user@avro.apache.org
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700
Hi,
We have several mapreduce jobs using avro. They take too much memory when running on production. Can anybody suggest some object reuse techniques to cut down memory usage? Thanks.
Ey-Chih Chow
RE: avro object reuse
Posted by ey-chih chow <ey...@hotmail.com>.
Since this was in production, we did not turn on stack traces. Also, it was highly unlikely that any data was corrupted because, if one mapper failed due to out of memory, the system started another one and went through all the data.
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Thu, 9 Jun 2011 17:43:02 -0700
Subject: Re: avro object reuse
If the exception is happening while decoding, it could be due to corrupt data. Avro allocates a List preallocated to the size encoded, and I've seen corrupted data cause attempted allocations of arrays too large for the heap.
On 6/9/11 4:58 PM, "Scott Carey" <sc...@richrelevance.com> wrote:
What is the stack trace on the out of memory exception?
On 6/9/11 4:45 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
We configure more than 100MB for MapReduce to do sorting. The memory we allocate for doing other things in the mapper is actually larger, but, for this job, we always get out-of-memory exceptions and the job cannot complete. We are trying to find out whether there is a way to avoid this problem.
Ey-Chih Chow
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Thu, 9 Jun 2011 15:42:10 -0700
Subject: Re: avro object reuse
The most likely candidate for creating many instances of BufferAccessor and ByteArrayByteSource is BinaryData.compare() and BinaryData.hashCode(). Each call will create one of each (hash) or two of each (compare). These are only 32 bytes per instance and quickly become garbage that is easily cleaned up by the GC.
The below have only 32 bytes each and 8MB total.
On the other hand, the byte[]'s appear to be about 24K each on average and are using 100MB. Is this the size of your configured MapReduce sort MB?
On 6/9/11 3:08 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
We did more monitoring. At one point, we got the following histogram via jmap. The question is why there are so many instances of BinaryDecoder$BufferAccessor and BinaryDecoder$ByteArrayByteSource. How can we avoid this? Thanks.
Object Histogram:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 4199 100241168 byte[]
2: 272948 8734336 org.apache.avro.io.BinaryDecoder$BufferAccessor
3: 272945 8734240 org.apache.avro.io.BinaryDecoder$ByteArrayByteSource
4: 2093 5387976 int[]
5: 23762 2822864 * ConstMethodKlass
6: 23762 1904760 * MethodKlass
7: 39295 1688992 * SymbolKlass
8: 2127 1216976 * ConstantPoolKlass
9: 2127 882760 * InstanceKlassKlass
10: 1847 742936 * ConstantPoolCacheKlass
11: 9602 715608 char[]
12: 1072 299584 * MethodDataKlass
13: 9698 232752 java.lang.String
14: 2317 222432 java.lang.Class
15: 3288 204440 short[]
16: 3167 156664 * System ObjArray
17: 2401 57624 java.util.HashMap$Entry
18: 666 53280 java.lang.reflect.Method
19: 161 52808 * ObjArrayKlassKlass
20: 1808 43392 java.util.Hashtable$Entry
From: eychih@hotmail.com
To: user@avro.apache.org
Subject: RE: avro object reuse
Date: Wed, 1 Jun 2011 15:14:03 -0700
We make a lot of toString() calls on the Avro Utf8 object. Will this cause Jackson calls? Thanks.
Ey-Chih
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Wed, 1 Jun 2011 13:38:39 -0700
Subject: Re: avro object reuse
This is great info.
Jackson should only be used once when the file is opened, so this is confusing from that point of view. Is something else using Jackson or initializing an Avro JsonDecoder frequently? There are over 100000 Jackson DeserializationConfig objects.
Another place that parses the schema is in AvroSerialization.java. Does the Hadoop getDeserializer() API method get called once per job, or per record? If this is called more than once per map job, it might explain this.
In principle, Jackson is only used by a mapper during initialization. The below indicates that this may not be the case or that something outside of Avro is causing a lot of Jackson JSON parsing.
Are you using something that is converting the Avro data to Json form? toString() on most Avro datum objects will do a lot of work with Jackson, for example — but the below are deserializer objects not serializer objects so that is not likely the issue.
On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com> wrote:
We ran jmap on one of our mappers and found the top usage as follows:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator
It looks like Jackson eats up a lot of memory. Our mapper reads files in the Avro format. Does Avro use Jackson heavily when reading Avro files? Is there any way to improve this? Thanks.
Ey-Chih Chow
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse
All of those instances are short-lived. If you are running out of memory, it's not likely due to object reuse. This tends to cause more CPU time in the garbage collector, but not out-of-memory conditions. Grabbing 'jmap -histo' output from a JVM with larger-than-expected heap usage can be hard to do on a cluster, but it can often quickly identify the cause of memory consumption issues.
I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.
On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
I actually looked into the Avro code to find out how Avro does object reuse. I looked at AvroUtf8InputFormat and had the following question: why does a new Utf8 object have to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called? Will this eat up too much memory when we call next(key, value) many times? Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)? Will this save memory? Thanks.
Ey-Chih Chow
From: eychih@hotmail.com
To: user@avro.apache.org
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700
Hi,
We have several mapreduce jobs using avro. They take too much memory when running on production. Can anybody suggest some object reuse techniques to cut down memory usage? Thanks.
Ey-Chih Chow
Re: avro object reuse
Posted by Scott Carey <sc...@richrelevance.com>.
If the exception is happening while decoding, it could be due to corrupt data. Avro allocates a List preallocated to the size encoded, and I've seen corrupted data cause attempted allocations of arrays too large for the heap.
On 6/9/11 4:58 PM, "Scott Carey" <sc...@richrelevance.com> wrote:
What is the stack trace on the out of memory exception?
On 6/9/11 4:45 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
We configure more than 100MB for MapReduce to do sorting. The memory we allocate for doing other things in the mapper is actually larger, but, for this job, we always get out-of-memory exceptions and the job cannot complete. We are trying to find out whether there is a way to avoid this problem.
Ey-Chih Chow
________________________________
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Thu, 9 Jun 2011 15:42:10 -0700
Subject: Re: avro object reuse
The most likely candidate for creating many instances of BufferAccessor and ByteArrayByteSource is BinaryData.compare() and BinaryData.hashCode(). Each call will create one of each (hash) or two of each (compare). These are only 32 bytes per instance and quickly become garbage that is easily cleaned up by the GC.
The below have only 32 bytes each and 8MB total.
On the other hand, the byte[]'s appear to be about 24K each on average and are using 100MB. Is this the size of your configured MapReduce sort MB?
On 6/9/11 3:08 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
We did more monitoring. At one point, we got the following histogram via jmap. The question is why there are so many instances of BinaryDecoder$BufferAccessor and BinaryDecoder$ByteArrayByteSource. How can we avoid this? Thanks.
Object Histogram:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 4199 100241168 byte[]
2: 272948 8734336 org.apache.avro.io.BinaryDecoder$BufferAccessor
3: 272945 8734240 org.apache.avro.io.BinaryDecoder$ByteArrayByteSource
4: 2093 5387976 int[]
5: 23762 2822864 * ConstMethodKlass
6: 23762 1904760 * MethodKlass
7: 39295 1688992 * SymbolKlass
8: 2127 1216976 * ConstantPoolKlass
9: 2127 882760 * InstanceKlassKlass
10: 1847 742936 * ConstantPoolCacheKlass
11: 9602 715608 char[]
12: 1072 299584 * MethodDataKlass
13: 9698 232752 java.lang.String
14: 2317 222432 java.lang.Class
15: 3288 204440 short[]
16: 3167 156664 * System ObjArray
17: 2401 57624 java.util.HashMap$Entry
18: 666 53280 java.lang.reflect.Method
19: 161 52808 * ObjArrayKlassKlass
20: 1808 43392 java.util.Hashtable$Entry
________________________________
From: eychih@hotmail.com
To: user@avro.apache.org
Subject: RE: avro object reuse
Date: Wed, 1 Jun 2011 15:14:03 -0700
We make a lot of toString() calls on the Avro Utf8 object. Will this cause Jackson calls? Thanks.
Ey-Chih
________________________________
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Wed, 1 Jun 2011 13:38:39 -0700
Subject: Re: avro object reuse
This is great info.
Jackson should only be used once when the file is opened, so this is confusing from that point of view.
Is something else using Jackson or initializing an Avro JsonDecoder frequently? There are over 100000 Jackson DeserializationConfig objects.
Another place that parses the schema is in AvroSerialization.java. Does the Hadoop getDeserializer() API method get called once per job, or per record? If this is called more than once per map job, it might explain this.
In principle, Jackson is only used by a mapper during initialization. The below indicates that this may not be the case or that something outside of Avro is causing a lot of Jackson JSON parsing.
Are you using something that is converting the Avro data to Json form? toString() on most Avro datum objects will do a lot of work with Jackson, for example — but the below are deserializer objects not serializer objects so that is not likely the issue.
On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com> wrote:
We ran jmap on one of our mappers and found the top usage as follows:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator
It looks like Jackson eats up a lot of memory. Our mapper reads files in the Avro format. Does Avro use Jackson heavily when reading Avro files? Is there any way to improve this? Thanks.
Ey-Chih Chow
________________________________
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse
All of those instances are short-lived. If you are running out of memory, it's not likely due to object reuse. This tends to cause more CPU time in the garbage collector, but not out-of-memory conditions. Grabbing 'jmap -histo' output from a JVM with larger-than-expected heap usage can be hard to do on a cluster, but it can often quickly identify the cause of memory consumption issues.
I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.
On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
I actually looked into the Avro code to find out how Avro does object reuse. I looked at AvroUtf8InputFormat and had the following question: why does a new Utf8 object have to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called? Will this eat up too much memory when we call next(key, value) many times? Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)? Will this save memory? Thanks.
Ey-Chih Chow
________________________________
From: eychih@hotmail.com
To: user@avro.apache.org
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700
Hi,
We have several mapreduce jobs using avro. They take too much memory when running on production. Can anybody suggest some object reuse techniques to cut down memory usage? Thanks.
Ey-Chih Chow
Re: avro object reuse
Posted by Scott Carey <sc...@richrelevance.com>.
What is the stack trace on the out of memory exception?
On 6/9/11 4:45 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
We configure more than 100MB for MapReduce to do sorting. The memory we allocate for doing other things in the mapper is actually larger, but, for this job, we always get out-of-memory exceptions and the job cannot complete. We are trying to find out whether there is a way to avoid this problem.
Ey-Chih Chow
________________________________
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Thu, 9 Jun 2011 15:42:10 -0700
Subject: Re: avro object reuse
The most likely candidate for creating many instances of BufferAccessor and ByteArrayByteSource is BinaryData.compare() and BinaryData.hashCode(). Each call will create one of each (hash) or two of each (compare). These are only 32 bytes per instance and quickly become garbage that is easily cleaned up by the GC.
The below have only 32 bytes each and 8MB total.
On the other hand, the byte[]'s appear to be about 24K each on average and are using 100MB. Is this the size of your configured MapReduce sort MB?
On 6/9/11 3:08 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
We did more monitoring. At one point, we got the following histogram via jmap. The question is why there are so many instances of BinaryDecoder$BufferAccessor and BinaryDecoder$ByteArrayByteSource. How can we avoid this? Thanks.
Object Histogram:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 4199 100241168 byte[]
2: 272948 8734336 org.apache.avro.io.BinaryDecoder$BufferAccessor
3: 272945 8734240 org.apache.avro.io.BinaryDecoder$ByteArrayByteSource
4: 2093 5387976 int[]
5: 23762 2822864 * ConstMethodKlass
6: 23762 1904760 * MethodKlass
7: 39295 1688992 * SymbolKlass
8: 2127 1216976 * ConstantPoolKlass
9: 2127 882760 * InstanceKlassKlass
10: 1847 742936 * ConstantPoolCacheKlass
11: 9602 715608 char[]
12: 1072 299584 * MethodDataKlass
13: 9698 232752 java.lang.String
14: 2317 222432 java.lang.Class
15: 3288 204440 short[]
16: 3167 156664 * System ObjArray
17: 2401 57624 java.util.HashMap$Entry
18: 666 53280 java.lang.reflect.Method
19: 161 52808 * ObjArrayKlassKlass
20: 1808 43392 java.util.Hashtable$Entry
________________________________
From: eychih@hotmail.com
To: user@avro.apache.org
Subject: RE: avro object reuse
Date: Wed, 1 Jun 2011 15:14:03 -0700
We make a lot of toString() calls on the Avro Utf8 object. Will this cause Jackson calls? Thanks.
Ey-Chih
________________________________
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Wed, 1 Jun 2011 13:38:39 -0700
Subject: Re: avro object reuse
This is great info.
Jackson should only be used once when the file is opened, so this is confusing from that point of view.
Is something else using Jackson or initializing an Avro JsonDecoder frequently? There are over 100000 Jackson DeserializationConfig objects.
Another place that parses the schema is in AvroSerialization.java. Does the Hadoop getDeserializer() API method get called once per job, or per record? If this is called more than once per map job, it might explain this.
In principle, Jackson is only used by a mapper during initialization. The below indicates that this may not be the case or that something outside of Avro is causing a lot of Jackson JSON parsing.
Are you using something that is converting the Avro data to Json form? toString() on most Avro datum objects will do a lot of work with Jackson, for example — but the below are deserializer objects not serializer objects so that is not likely the issue.
On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com> wrote:
We ran jmap on one of our mappers and found the top usage as follows:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator
It looks like Jackson eats up a lot of memory. Our mapper reads files in the Avro format. Does Avro use Jackson heavily when reading Avro files? Is there any way to improve this? Thanks.
Ey-Chih Chow
________________________________
From: scott@richrelevance.com<ma...@richrelevance.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse
All of those instances are short-lived. If you are running out of memory, its not likely due to object reuse. This tends to cause more CPU time in the garbage collector, but not out of memory conditions. This can be hard to do on a cluster, but grabbing 'jmap –histo' output from a JVM that has a larger-than-expected JVM heap usage can often be used to quickly identify the cause of memory consumption issues.
I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.
On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com>> wrote:
I actually looked into Avro code to find out how Avro does object reuse. I looked at AvroUtf8InputFormat and got the following question. Why a new Utf8 object has to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called ? Will this eat up too much memory when we call next(key, value) many times? Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)? Will this save memory? Thanks.
Ey-Chih Chow
________________________________
From: eychih@hotmail.com<ma...@hotmail.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700
Hi,
We have several mapreduce jobs using avro. They take too much memory when running on production. Can anybody suggest some object reuse techniques to cut down memory usage? Thanks.
Ey-Chih Chow
RE: avro object reuse
Posted by ey-chih chow <ey...@hotmail.com>.
We configure more than 100MB for MapReduce to do sorting. The memory we allocate for doing other things in the mapper is actually larger, but for this job we always get out-of-memory exceptions and the job cannot complete. We are trying to find out if there is a way to avoid this problem.
Ey-Chih Chow
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Thu, 9 Jun 2011 15:42:10 -0700
Subject: Re: avro object reuse
The most likely candidate for creating many instances of BufferAccessor and ByteArrayByteSource is BinaryData.compare() and BinaryData.hashCode(). Each call will create one of each (hash) or two of each (compare). These are only 32 bytes per instance and quickly become garbage that is easily cleaned up by the GC.
The instances below are only 32 bytes each and 8MB in total. On the other hand, the byte[]'s appear to be about 24K each on average and are using 100MB. Is this the size of your configured MapReduce sort MB?
On 6/9/11 3:08 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
We did more monitoring. At one point, we got the following histogram via jmap. The question is why there are so many instances of BinaryDecoder$BufferAccessor and BinaryDecoder$ByteArrayByteSource. How can we avoid this? Thanks.
Object Histogram:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 4199 100241168 byte[]
2: 272948 8734336 org.apache.avro.io.BinaryDecoder$BufferAccessor
3: 272945 8734240 org.apache.avro.io.BinaryDecoder$ByteArrayByteSource
4: 2093 5387976 int[]
5: 23762 2822864 * ConstMethodKlass
6: 23762 1904760 * MethodKlass
7: 39295 1688992 * SymbolKlass
8: 2127 1216976 * ConstantPoolKlass
9: 2127 882760 * InstanceKlassKlass
10: 1847 742936 * ConstantPoolCacheKlass
11: 9602 715608 char[]
12: 1072 299584 * MethodDataKlass
13: 9698 232752 java.lang.String
14: 2317 222432 java.lang.Class
15: 3288 204440 short[]
16: 3167 156664 * System ObjArray
17: 2401 57624 java.util.HashMap$Entry
18: 666 53280 java.lang.reflect.Method
19: 161 52808 * ObjArrayKlassKlass
20: 1808 43392 java.util.Hashtable$Entry
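For context on the sort-buffer question above, here is a minimal sketch of how the two memory settings are configured separately in a Hadoop 0.20-era job; the property names are from that era and MyJob is a placeholder:

    import org.apache.hadoop.mapred.JobConf;

    JobConf job = new JobConf(MyJob.class);
    // The ~100MB byte[] in the histogram is consistent with the map-side sort buffer.
    job.setInt("io.sort.mb", 100);
    // The task JVM heap must leave headroom beyond the sort buffer, or the task
    // can hit OutOfMemoryError even though the buffer itself is within budget.
    job.set("mapred.child.java.opts", "-Xmx512m");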
Re: avro object reuse
Posted by Scott Carey <sc...@richrelevance.com>.
If you do just 'jmap -histo' it shows you all of the objects on the heap. Many of these objects may be garbage and unreferenced. This is quick, and does not block the app or force a GC.
If you do 'jmap -histo:live' it will GC and only show the objects that are 'live' (currently referenced).
These are different because a GC ran and removed all the BinaryData inner class temporary objects.
On 6/9/11 3:26 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
In addition, we ran the same MR job once again and got the following histogram. Why is this different from the previous one? Thanks.
Ey-Chih Chow
Object Histogram:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 4327 100242096 byte[]
2: 2050 5381496 int[]
3: 23762 2822864 * ConstMethodKlass
4: 23762 1904760 * MethodKlass
5: 39295 1688992 * SymbolKlass
6: 2127 1216976 * ConstantPoolKlass
7: 2127 882760 * InstanceKlassKlass
8: 11298 773008 char[]
9: 1847 742936 * ConstantPoolCacheKlass
10: 1064 297448 * MethodDataKlass
11: 11387 273288 java.lang.String
12: 2317 222432 java.lang.Class
13: 3288 204440 short[]
14: 3167 156664 * System ObjArray
15: 1360 86720 java.util.HashMap$Entry[]
16: 535 85600 org.codehaus.jackson.impl.ReaderBasedParser
17: 3498 83952 java.util.HashMap$Entry
18: 666 53280 java.lang.reflect.Method
19: 161 52808 * ObjArrayKlassKlass
20: 1267 44704 java.lang.Object[]
21: 1808 43392 java.util.Hashtable$Entry
22: 1070 42800 org.codehaus.jackson.impl.JsonReadContext
23: 777 31080 java.util.HashMap
24: 535 29960 org.codehaus.jackson.util.TextBuffer
25: 567 27216 java.nio.HeapByteBuffer
26: 553 26544 org.apache.avro.Schema$Props
27: 549 26352 java.nio.HeapCharBuffer
28: 538 25824 org.codehaus.jackson.map.DeserializationConfig
29: 535 25680 org.codehaus.jackson.io.IOContext
30: 1554 24864 org.codehaus.jackson.sym.CharsToNameCanonicalizer$Bucket
31: 539 21560 org.codehaus.jackson.sym.CharsToNameCanonicalizer
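For reference, the two variants being compared here are invoked as follows, where <pid> is the task JVM's process id:

    jmap -histo <pid>         (every object on the heap, including unreferenced garbage; no GC forced)
    jmap -histo:live <pid>    (forces a full GC first, then counts only live, reachable objects)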
RE: avro object reuse
Posted by ey-chih chow <ey...@hotmail.com>.
In addition, we ran the same MR job once again and got the following histogram. Why is this different from the previous one? Thanks.
Ey-Chih Chow
Object Histogram:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 4327 100242096 byte[]
2: 2050 5381496 int[]
3: 23762 2822864 * ConstMethodKlass
4: 23762 1904760 * MethodKlass
5: 39295 1688992 * SymbolKlass
6: 2127 1216976 * ConstantPoolKlass
7: 2127 882760 * InstanceKlassKlass
8: 11298 773008 char[]
9: 1847 742936 * ConstantPoolCacheKlass
10: 1064 297448 * MethodDataKlass
11: 11387 273288 java.lang.String
12: 2317 222432 java.lang.Class
13: 3288 204440 short[]
14: 3167 156664 * System ObjArray
15: 1360 86720 java.util.HashMap$Entry[]
16: 535 85600 org.codehaus.jackson.impl.ReaderBasedParser
17: 3498 83952 java.util.HashMap$Entry
18: 666 53280 java.lang.reflect.Method
19: 161 52808 * ObjArrayKlassKlass
20: 1267 44704 java.lang.Object[]
21: 1808 43392 java.util.Hashtable$Entry
22: 1070 42800 org.codehaus.jackson.impl.JsonReadContext
23: 777 31080 java.util.HashMap
24: 535 29960 org.codehaus.jackson.util.TextBuffer
25: 567 27216 java.nio.HeapByteBuffer
26: 553 26544 org.apache.avro.Schema$Props
27: 549 26352 java.nio.HeapCharBuffer
28: 538 25824 org.codehaus.jackson.map.DeserializationConfig
29: 535 25680 org.codehaus.jackson.io.IOContext
30: 1554 24864 org.codehaus.jackson.sym.CharsToNameCanonicalizer$Bucket
31: 539 21560 org.codehaus.jackson.sym.CharsToNameCanonicalizer
RE: avro object reuse
Posted by ey-chih chow <ey...@hotmail.com>.
I forgot to mention that the histogram in my previous message was extracted from a mapper of one of our MR jobs.
Ey-Chih Chow
Re: avro object reuse
Posted by Scott Carey <sc...@richrelevance.com>.
The most likely candidate for creating many instances of BufferAccessor and ByteArrayByteSource is BinaryData.compare() and BinaryData.hashCode(). Each call will create one of each (hash) or two of each (compare). These are only 32 bytes per instance and quickly become garbage that is easily cleaned up by the GC.
The instances below are only 32 bytes each and 8MB in total.
On the other hand, the byte[]'s appear to be about 24K each on average and are using 100MB. Is this the size of your configured MapReduce sort MB?
On 6/9/11 3:08 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
We did more monitoring. At one point, we got the following histogram via jmap. The question is why there are so many instances of BinaryDecoder$BufferAccessor and BinaryDecoder$ByteArrayByteSource. How can we avoid this? Thanks.
Object Histogram:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 4199 100241168 byte[]
2: 272948 8734336 org.apache.avro.io.BinaryDecoder$BufferAccessor
3: 272945 8734240 org.apache.avro.io.BinaryDecoder$ByteArrayByteSource
4: 2093 5387976 int[]
5: 23762 2822864 * ConstMethodKlass
6: 23762 1904760 * MethodKlass
7: 39295 1688992 * SymbolKlass
8: 2127 1216976 * ConstantPoolKlass
9: 2127 882760 * InstanceKlassKlass
10: 1847 742936 * ConstantPoolCacheKlass
11: 9602 715608 char[]
12: 1072 299584 * MethodDataKlass
13: 9698 232752 java.lang.String
14: 2317 222432 java.lang.Class
15: 3288 204440 short[]
16: 3167 156664 * System ObjArray
17: 2401 57624 java.util.HashMap$Entry
18: 666 53280 java.lang.reflect.Method
19: 161 52808 * ObjArrayKlassKlass
20: 1808 43392 java.util.Hashtable$Entry
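To make the allocation pattern concrete, here is a small sketch of the kind of call site described above, using the Avro BinaryData API; the byte arrays and schema are stand-ins:

    import org.apache.avro.Schema;
    import org.apache.avro.io.BinaryData;

    // Comparing two binary-encoded datums, as a MapReduce sort comparator would.
    // Each compare() internally allocates a pair of small decoder helpers
    // (BufferAccessor/ByteArrayByteSource) that immediately become garbage.
    static int compareDatums(byte[] left, byte[] right, Schema schema) {
      return BinaryData.compare(left, 0, right, 0, schema);
    }

    // hashCode() allocates one of each per call.
    static int hashDatum(byte[] bytes, Schema schema) {
      return BinaryData.hashCode(bytes, 0, bytes.length, schema);
    }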
RE: avro object reuse
Posted by ey-chih chow <ey...@hotmail.com>.
We did more monitoring. At one point, we got the following histogram via jmap. The question is why there are so many instances of BinaryDecoder$BufferAccessor and BinaryDecoder$ByteArrayByteSource. How can we avoid this? Thanks.
Object Histogram:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 4199 100241168 byte[]
2: 272948 8734336 org.apache.avro.io.BinaryDecoder$BufferAccessor
3: 272945 8734240 org.apache.avro.io.BinaryDecoder$ByteArrayByteSource
4: 2093 5387976 int[]
5: 23762 2822864 * ConstMethodKlass
6: 23762 1904760 * MethodKlass
7: 39295 1688992 * SymbolKlass
8: 2127 1216976 * ConstantPoolKlass
9: 2127 882760 * InstanceKlassKlass
10: 1847 742936 * ConstantPoolCacheKlass
11: 9602 715608 char[]
12: 1072 299584 * MethodDataKlass
13: 9698 232752 java.lang.String
14: 2317 222432 java.lang.Class
15: 3288 204440 short[]
16: 3167 156664 * System ObjArray
17: 2401 57624 java.util.HashMap$Entry
18: 666 53280 java.lang.reflect.Method
19: 161 52808 * ObjArrayKlassKlass
20: 1808 43392 java.util.Hashtable$Entry
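One way to cut down on these short-lived decoder helpers, sketched below under the assumption of the Avro 1.5 DecoderFactory API, is to hand the previous decoder and datum back in so their buffers are reused rather than reallocated; schema and payloads are stand-ins:

    import java.io.IOException;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryDecoder;
    import org.apache.avro.io.DecoderFactory;

    static void readAll(Schema schema, Iterable<byte[]> payloads) throws IOException {
      GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
      BinaryDecoder decoder = null;  // reused across records
      GenericRecord record = null;   // reused across records
      for (byte[] payload : payloads) {
        decoder = DecoderFactory.get().binaryDecoder(payload, decoder);
        record = reader.read(record, decoder);
        // ... use record before the next iteration overwrites it ...
      }
    }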
Re: avro object reuse
Posted by Scott Carey <sc...@richrelevance.com>.
No, and even GenericData.Record's toString() simply writes using a StringBuilder; I doubt this is the culprit.
On 6/1/11 3:14 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
We use a lot of toString() calls on the Avro Utf8 object. Will this cause Jackson calls? Thanks.
Ey-Chih
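For what it's worth, one way to sidestep repeated toString() conversions entirely is to compare against a Utf8 built once; a small sketch, with the field name and target value made up:

    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.util.Utf8;

    static final Utf8 TARGET = new Utf8("some-value");  // built once, reused

    static boolean matches(GenericRecord record) {
      // The generic reader hands back Utf8 instances; comparing Utf8 to Utf8
      // avoids materializing a new String per record.
      return TARGET.equals(record.get("field"));
    }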
RE: avro object reuse
Posted by ey-chih chow <ey...@hotmail.com>.
We use a lot of toString() calls on the Avro Utf8 object. Will this cause Jackson calls? Thanks.
Ey-Chih
Re: avro object reuse
Posted by Scott Carey <sc...@richrelevance.com>.
This is great info.
Jackson should only be used once when the file is opened, so this is confusing from that point of view.
Is something else using Jackson or initializing an Avro JsonDecoder frequently? There are over 100000 Jackson DeserializationConfig objects.
Another place that parses the schema is in AvroSerialization.java. Does the Hadoop getDeserializer() API method get called once per job, or per record? If this is called more than once per map job, it might explain this.
In principle, Jackson is only used by a mapper during initialization. The below indicates that this may not be the case or that something outside of Avro is causing a lot of Jackson JSON parsing.
Are you using something that is converting the Avro data to JSON form? toString() on most Avro datum objects will do a lot of work with Jackson, for example; but the objects below are deserializers, not serializers, so that is not likely the issue.
On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com> wrote:
We ran jmap on one of our mappers and found the top usage as follows:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator
It looks like Jackson eats up a lot of memory. Our mapper reads in files in the Avro format. Does Avro use Jackson a lot when reading Avro files? Is there any way to improve this? Thanks.
Ey-Chih Chow
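If per-record schema parsing is indeed the culprit, the usual fix is to parse once per task and cache the result; a sketch using the old mapred API, with a hypothetical config key:

    import org.apache.avro.Schema;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    public class SchemaCachingMapperBase extends MapReduceBase {
      protected Schema schema;  // parsed once per task, reused for every record

      @Override
      public void configure(JobConf job) {
        // Schema.parse() runs Jackson once here, not once per record.
        schema = Schema.parse(job.get("my.schema.json"));
      }
    }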
Re: avro object reuse
Posted by Scott Carey <sc...@richrelevance.com>.
Lower down this list of object counts, what are the top org.apache.avro.** object counts?
How many AvroSerialization objects? How many AvroMapper, AvroWrapper, etc?
What about org.apache.hadoop.** objects?
On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com> wrote:
We ran jmap on one of our mappers and found the top usage as follows:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator
It looks like Jackson eats up a lot of memory. Our mapper reads in files in the Avro format. Does Avro use Jackson a lot when reading Avro files? Is there any way to improve this? Thanks.
Ey-Chih Chow
RE: avro object reuse
Posted by ey-chih chow <ey...@hotmail.com>.
We ran jmap on one of our mappers and found the top usage as follows:
num #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator
It looks like Jackson eats up a lot of memory. Our mapper reads in files in the Avro format. Does Avro use Jackson a lot when reading Avro files? Is there any way to improve this? Thanks.
Ey-Chih Chow
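When reading Avro data files, the schema is parsed from the file header once at open time; after that, the generic reader lets you pass the previous datum back in for reuse. A sketch, with the file name made up:

    import java.io.File;
    import java.io.IOException;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    static void readFile() throws IOException {
      DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
          new File("part-00000.avro"), new GenericDatumReader<GenericRecord>());
      GenericRecord record = null;
      while (reader.hasNext()) {
        record = reader.next(record);  // reuses the previous record's structure
        // ... process record ...
      }
      reader.close();
    }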
Re: avro object reuse
Posted by Scott Carey <sc...@richrelevance.com>.
All of those instances are short-lived. If you are running out of memory, it's not likely due to object reuse. This tends to cause more CPU time in the garbage collector, but not out-of-memory conditions. It can be hard to do on a cluster, but grabbing 'jmap -histo' output from a JVM with larger-than-expected heap usage can often quickly identify the cause of memory consumption issues.
I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.
On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com> wrote:
I actually looked into the Avro code to find out how Avro does object reuse. I looked at AvroUtf8InputFormat and got the following question: why does a new Utf8 object have to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called? Will this eat up too much memory when we call next(key, value) many times? Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)? Will this save memory? Thanks.
Ey-Chih Chow
RE: avro object reuse
Posted by ey-chih chow <ey...@hotmail.com>.
I actually looked into the Avro code to find out how Avro does object reuse. I looked at AvroUtf8InputFormat and got the following question: why does a new Utf8 object have to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called? Will this eat up too much memory when we call next(key, value) many times? Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)? Will this save memory? Thanks.
Ey-Chih Chow
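On the reuse question itself, the decoder API already supports this pattern for strings: Decoder.readString(Utf8 old) refills the passed-in Utf8 when its buffer is large enough. A sketch, assuming a BinaryDecoder positioned over a run of string values:

    import java.io.IOException;
    import org.apache.avro.io.BinaryDecoder;
    import org.apache.avro.util.Utf8;

    static void readStrings(BinaryDecoder in) throws IOException {
      Utf8 utf8 = null;
      while (!in.isEnd()) {
        utf8 = in.readString(utf8);  // the same Utf8 instance is refilled when possible
        // ... consume utf8 before the next call overwrites its contents ...
      }
    }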