Posted to user@avro.apache.org by ey-chih chow <ey...@hotmail.com> on 2011/05/31 19:38:39 UTC

avro object reuse

Hi, 
We have several MapReduce jobs using Avro.  They take too much memory when running in production.  Can anybody suggest some object reuse techniques to cut down memory usage?  Thanks.
Ey-Chih Chow
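
A minimal sketch of the reuse hooks Avro already exposes (the class name, file name, and processing loop are placeholders, not taken from these jobs): DataFileReader.next(reuse) and DatumReader.read(reuse, decoder) refill a previously allocated record instead of allocating a new one per row.

import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class ReuseDemo {
  public static void main(String[] args) throws IOException {
    DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
        new File("part-00000.avro"),                 // placeholder input file
        new GenericDatumReader<GenericRecord>());
    GenericRecord record = null;                     // one instance, recycled for every row
    try {
      while (reader.hasNext()) {
        record = reader.next(record);                // refills 'record' instead of allocating
        // ... process record ...
      }
    } finally {
      reader.close();
    }
  }
}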

Re: avro object reuse

Posted by Scott Carey <sc...@richrelevance.com>.
No, that should not trigger Jackson parsing.   Schema.parse() and Protocol.parse() do.
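
In code form (a minimal sketch; the class name and schema string are placeholders), the pattern is to pay the Schema.parse() cost once and hand the same Schema to every record:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;

public class SchemaCache {
  // Placeholder schema string; parsed once, so Jackson only runs here.
  private static final String SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"Example\",\"fields\":" +
      "[{\"name\":\"id\",\"type\":\"long\"}]}";
  private static final Schema SCHEMA = Schema.parse(SCHEMA_JSON);

  public static GenericData.Record newRecord() {
    // No Jackson on this path -- only the record allocation itself.
    return new GenericData.Record(SCHEMA);
  }
}

Everything downstream of the cached Schema (record construction, binary encode/decode) stays off the Jackson path.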



On 6/2/11 10:23 AM, "ey-chih chow" <ey...@hotmail.com> wrote:

We create GenericData.Record a lot in our code via new GenericData.Record(schema).  Will this generate Jackson calls?  Thanks.

Ey-Chih Chow

> From: scott@richrelevance.com
> To: user@avro.apache.org
> Date: Wed, 1 Jun 2011 18:48:15 -0700
> Subject: Re: avro object reuse
>
> One thing we do right now that might be related is the following:
>
> We keep Avro default Schema values as JsonNode objects. While traversing
> the JSON Avro schema representation using ObjectMapper.readTree() we
> remember JsonNodes that are "default" properties on fields and keep them
> on the Schema object.
> If these keep references to the parent (and the whole JSON tree, or worse,
> the ObjectMapper and input stream) it would be poor use of Jackson by us;
> although we'd need a way to keep a detached JsonNode or equivalent.
>
> However, even if that is the case (which it does not seem to be -- the
> jmap output has no JsonNode instances), it doesn't explain why we would be
> calling ObjectMapper frequently. We only call
> ObjectMapper.readTree(JsonParser) when creating a Schema from JSON. We
> call JsonNode methods from extracted fragments for everything else.
>
>
> This brings me to the following suspicion based on the data:
> Somewhere, Schema objects are being created frequently via one of the
> Schema.parse() or Protocol.parse() static methods.
>
> On 6/1/11 5:48 PM, "Tatu Saloranta" <ts...@gmail.com> wrote:
>
> >On Wed, Jun 1, 2011 at 5:45 PM, Scott Carey <sc...@richrelevance.com>
> >wrote:
> >> It would be useful to get a 'jmap -histo:live' report as well, which
> >>will
> >> only have items that remain after a full GC.
> >>
> >> However, a high churn of short lived Jackson objects is not expected
> >>here
> >> unless the user is reading Json serialized files and not Avro binary.
> >> Avro Data Files only contain binary encoded Avro content.
> >>
> >> It would be surprising to see many Jackson objects here if reading Avro
> >> Data Files, because we expect to use Jackson to parse an Avro schema
> >>from
> >> json only once or twice per file. After the schema is parsed, Jackson
> >> shouldn't be used. A hundred thousand DeserializationConfig instances
> >> means that isn't the case.
> >
> >Right -- it indicates that something (else) is using Jackson; and
> >there will typically be one instance of DeserializationConfig for each
> >data-binding call (ObjectMapper.readValue()), as a read-only copy is
> >made for operation.
> >... or if something is reading schema that many times, that sounds
> >like a problem in itself.
> >
> >-+ Tatu +-
>

RE: avro object reuse

Posted by ey-chih chow <ey...@hotmail.com>.
We create GenericData.Record a lot in our code via new GenericData.Record(schema).  Will this generate Jackson calls?  Thanks.
Ey-Chih Chow

> From: scott@richrelevance.com
> To: user@avro.apache.org
> Date: Wed, 1 Jun 2011 18:48:15 -0700
> Subject: Re: avro object reuse
> 
> One thing we do right now that might be related is the following:
> 
> We keep Avro default Schema values as JsonNode objects. While traversing
> the JSON Avro schema representation using ObjectMapper.readTree() we
> remember JsonNodes that are "default" properties on fields and keep them
> on the Schema object.
> If these keep references to the parent (and the whole JSON tree, or worse,
> the ObjectMapper and input stream) it would be poor use of Jackson by us;
> although we'd need a way to keep a detached JsonNode or equivalent.
> 
> However, even if that is the case (which it does not seem to be -- the
> jmap output has no JsonNode instances), it doesn't explain why we would be
> calling ObjectMapper frequently.  We only call
> ObjectMapper.readTree(JsonParser) when creating a Schema from JSON.  We
> call JsonNode methods from extracted fragments for everything else.
> 
> 
> This brings me to the following suspicion based on the data:
> Somewhere, Schema objects are being created frequently via one of the
> Schema.parse() or Protocol.parse() static methods.
> 
> On 6/1/11 5:48 PM, "Tatu Saloranta" <ts...@gmail.com> wrote:
> 
> >On Wed, Jun 1, 2011 at 5:45 PM, Scott Carey <sc...@richrelevance.com>
> >wrote:
> >> It would be useful to get a 'jmap -histo:live' report as well, which
> >>will
> >> only have items that remain after a full GC.
> >>
> >> However, a high churn of short lived Jackson objects is not expected
> >>here
> >> unless the user is reading Json serialized files and not Avro binary.
> >> Avro Data Files only contain binary encoded Avro content.
> >>
> >> It would be surprising to see many Jackson objects here if reading Avro
> >> Data Files, because we expect to use Jackson to parse an Avro schema
> >>from
> >> json only once or twice per file.  After the schema is parsed, Jackson
> >> shouldn't be used.   A hundred thousand DeserializationConfig instances
> >> means that isn't the case.
> >
> >Right -- it indicates that something (else) is using Jackson; and
> >there will typically be one instance of DeserializationConfig for each
> >data-binding call (ObjectMapper.readValue()), as a read-only copy is
> >made for operation.
> >... or if something is reading schema that many times, that sounds
> >like a problem in itself.
> >
> >-+ Tatu +-
> 

Re: avro object reuse

Posted by Scott Carey <sc...@richrelevance.com>.
One thing we do right now that might be related is the following:

We keep Avro default Schema values as JsonNode objects. While traversing
the JSON Avro schema representation using ObjectMapper.readTree() we
remember JsonNodes that are "default" properties on fields and keep them
on the Schema object.
If these keep references to the parent (and the whole JSON tree, or worse,
the ObjectMapper and input stream) it would be poor use of Jackson by us;
although we'd need a way to keep a detached JsonNode or equivalent.

However, even if that is the case (which it does not seem to be -- the
jmap output has no JsonNode instances), it doesn't explain why we would be
calling ObjectMapper frequently.  We only call
ObjectMapper.readTree(JsonParser) when creating a Schema from JSON.  We
call JsonNode methods from extracted fragments for everything else.


This brings me to the following suspicion based on the data:
Somewhere, Schema objects are being created frequently via one of the
Schema.parse() or Protocol.parse() static methods.
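
A common way that happens in a MapReduce job (a hypothetical sketch, not a claim about this particular code; the class name and configuration key are made up) is the schema string being parsed inside map() once per record, when parsing once per task in configure() would do:

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ExampleMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {

  private Schema schema;

  public void configure(JobConf job) {
    // Parse once per task here, not once per record inside map().
    schema = Schema.parse(job.get("example.schema.json"));  // made-up property key
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, LongWritable> out, Reporter reporter)
      throws IOException {
    GenericData.Record rec = new GenericData.Record(schema);  // no Jackson involved
    // ... fill rec from 'value' and collect output ...
  }
}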

On 6/1/11 5:48 PM, "Tatu Saloranta" <ts...@gmail.com> wrote:

>On Wed, Jun 1, 2011 at 5:45 PM, Scott Carey <sc...@richrelevance.com>
>wrote:
>> It would be useful to get a 'jmap -histo:live' report as well, which
>>will
>> only have items that remain after a full GC.
>>
>> However, a high churn of short lived Jackson objects is not expected
>>here
>> unless the user is reading Json serialized files and not Avro binary.
>> Avro Data Files only contain binary encoded Avro content.
>>
>> It would be surprising to see many Jackson objects here if reading Avro
>> Data Files, because we expect to use Jackson to parse an Avro schema
>>from
>> json only once or twice per file.  After the schema is parsed, Jackson
>> shouldn't be used.   A hundred thousand DeserializationConfig instances
>> means that isn't the case.
>
>Right -- it indicates that something (else) is using Jackson; and
>there will typically be one instance of DeserializationConfig for each
>data-binding call (ObjectMapper.readValue()), as a read-only copy is
>made for operation.
>... or if something is reading schema that many times, that sounds
>like a problem in itself.
>
>-+ Tatu +-


Re: avro object reuse

Posted by Tatu Saloranta <ts...@gmail.com>.
On Wed, Jun 1, 2011 at 5:45 PM, Scott Carey <sc...@richrelevance.com> wrote:
> It would be useful to get a 'jmap -histo:live' report as well, which will
> only have items that remain after a full GC.
>
> However, a high churn of short lived Jackson objects is not expected here
> unless the user is reading Json serialized files and not Avro binary.
> Avro Data Files only contain binary encoded Avro content.
>
> It would be surprising to see many Jackson objects here if reading Avro
> Data Files, because we expect to use Jackson to parse an Avro schema from
> json only once or twice per file.  After the schema is parsed, Jackson
> shouldn't be used.   A hundred thousand DeserializationConfig instances
> means that isn't the case.

Right -- it indicates that something (else) is using Jackson; and
there will typically be one instance of DeserializationConfig for each
data-binding call (ObjectMapper.readValue()), as a read-only copy is
made for operation.
... or if something is reading schema that many times, that sounds
like a problem in itself.

-+ Tatu +-

Re: avro object reuse

Posted by Scott Carey <sc...@richrelevance.com>.
It would be useful to get a 'jmap -histo:live' report as well, which will
only have items that remain after a full GC.

However, a high churn of short lived Jackson objects is not expected here
unless the user is reading Json serialized files and not Avro binary.
Avro Data Files only contain binary encoded Avro content.

It would be surprising to see many Jackson objects here if reading Avro
Data Files, because we expect to use Jackson to parse an Avro schema from
json only once or twice per file.  After the schema is parsed, Jackson
shouldn't be used.   A hundred thousand DeserializationConfig instances
means that isn't the case.
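
A related habit worth checking (a sketch; the class name and file name are placeholders): when the file's schema is needed elsewhere in the job, take it from the reader, which already parsed it once from the file header, rather than re-parsing the JSON string.

import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class HeaderSchema {
  public static void main(String[] args) throws IOException {
    DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
        new File("part-00000.avro"),                 // placeholder input file
        new GenericDatumReader<GenericRecord>());
    Schema schema = reader.getSchema();              // already parsed; no extra Jackson work
    System.out.println("Schema name: " + schema.getName());
    reader.close();
  }
}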




On 6/1/11 5:13 PM, "Tatu Saloranta" <ts...@gmail.com> wrote:

>On Wed, Jun 1, 2011 at 1:45 PM, Scott Carey <sc...@richrelevance.com>
>wrote:
>> Lower down this list of object counts, what are the top
>>org.apache.avro.**
>> object counts?
>> How many AvroSerialization objects?  How many AvroMapper,  AvroWrapper,
>>etc?
>> What about org.apache.hadoop.** objects?
>
>Also: is this jmap view of live objects, or just dump of ALL objects,
>live and dead?
>It seems like dump of latter, as most Jackson objects are short-term
>things created for per-invocation purposes, and discarded after
>process is complete. High count is not necessarily surprising for
>high-throughput systems; it is only odd if these are actual live
>objects.
>
>-+ Tatu +-


Re: avro object reuse

Posted by Tatu Saloranta <ts...@gmail.com>.
On Wed, Jun 1, 2011 at 1:45 PM, Scott Carey <sc...@richrelevance.com> wrote:
> Lower down this list of object counts, what are the top org.apache.avro.**
> object counts?
> How many AvroSerialization objects?  How many AvroMapper,  AvroWrapper, etc?
> What about org.apache.hadoop.** objects?

Also: is this jmap view of live objects, or just dump of ALL objects,
live and dead?
It seems like dump of latter, as most Jackson objects are short-term
things created for per-invocation purposes, and discarded after
process is complete. High count is not necessarily surprising for
high-throughput systems; it is only odd if these are actual live
objects.

-+ Tatu +-

RE: avro object reuse

Posted by ey-chih chow <ey...@hotmail.com>.
What follows is the whole output of our jmap.  Hope this can help you identify the problem.

num 	  #instances	#bytes	Class description--------------------------------------------------------------------------1:		24405	291733256	byte[]2:		6056	40228984	int[]3:		388799	19966776	char[]4:		101779	16284640	org.codehaus.jackson.impl.ReaderBasedParser5:		369623	11827936	java.lang.String6:		111059	8769424	java.util.HashMap$Entry[]7:		204083	8163320	org.codehaus.jackson.impl.JsonReadContext8:		211374	6763968	java.util.HashMap$Entry9:		102551	5742856	org.codehaus.jackson.util.TextBuffer10:		105854	5080992	java.nio.HeapByteBuffer11:		105821	5079408	java.nio.HeapCharBuffer12:		104578	5019744	java.util.HashMap13:		102551	4922448	org.codehaus.jackson.io.IOContext14:		101782	4885536	org.codehaus.jackson.map.DeserializationConfig15:		101783	4071320	org.codehaus.jackson.sym.CharsToNameCanonicalizer16:		101779	4071160	org.codehaus.jackson.map.deser.StdDeserializationContext17:		101779	4071160	java.io.StringReader18:		101754	4070160	java.util.HashMap$KeyIterator19:		24001	3429704	* ConstMethodKlass20:		139087	3338088	java.lang.Long21:		115338	3215280	java.lang.Object[]22:		24001	2887768	* MethodKlass23:		2147	2414896	* ConstantPoolKlass24:		39532	2017320	* SymbolKlass25:		102735	1643760	java.util.HashMap$KeySet26:		2147	1596304	* InstanceKlassKlass27:		1865	1482184	* ConstantPoolCacheKlass28:		15780	1136160	com.sun.org.apache.xerces.internal.dom.DeferredElementNSImpl29:		27860	1114400	java.util.HashMap$EntryIterator30:		27585	1103400	com.sun.org.apache.xerces.internal.dom.DeferredTextImpl31:		1025	535536	* MethodDataKlass32:		5140	331816	short[]33:		5814	316424	java.lang.String[]34:		13135	315240	java.lang.StringBuilder35:		7723	247136	java.util.AbstractList$ListItr36:		1321	245632	org.apache.avro.io.parsing.Symbol[]37:		2332	242528	java.lang.Class38:		4712	226176	org.apache.avro.Schema$Props39:		6848	219136	java.util.AbstractList$Itr40:		12793	204688	java.lang.Integer41:		6033	193056	com.sun.org.apache.xerces.internal.xni.QName42:		4710	188400	java.util.LinkedHashMap$Entry43:		3190	171896	* System ObjArray44:		5228	167296	java.util.Hashtable$Entry45:		1789	114496	java.net.URL46:		777	100592	java.util.Hashtable$Entry[]47:		156	91104	* ObjArrayKlassKlass48:		3408	81792	java.util.ArrayList49:		450	64800	int[][]50:		90	64080	com.sun.org.apache.xerces.internal.util.SymbolTable$Entry[]51:		2513	60312	org.apache.avro.util.Utf852:		681	59928	java.lang.reflect.Method53:		1060	59360	java.util.LinkedHashMap54:		2160	51840	com.sun.org.apache.xerces.internal.util.XMLStringBuffer55:		1034	49632	org.apache.avro.Schema$Field56:		772	49408	org.codehaus.jackson.impl.WriterBasedGenerator57:		1980	47520	com.sun.org.apache.xerces.internal.xni.XMLString58:		775	43400	org.codehaus.jackson.map.ser.StdSerializerProvider59:		775	43400	org.codehaus.jackson.map.SerializationConfig60:		2596	41536	org.codehaus.jackson.node.TextNode61:		271	39128	java.lang.Object[][]62:		1564	37536	org.apache.avro.generic.GenericData$Record63:		900	36000	com.sun.org.apache.xerces.internal.xni.parser.XMLConfigurationException64:		360	34560	com.sun.org.apache.xerces.internal.xni.QName[]65:		720	34560	com.sun.org.apache.xerces.internal.util.XMLAttributesImpl$Attribute66:		1035	33120	java.util.LinkedHashMap$KeyIterator67:		2064	33024	java.util.HashMap$EntrySet68:		673	32304	java.util.Hashtable69:		90	30960	com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl70:		772	30880	org.codehaus.jackson.impl.ObjectWContext71:		949	30368	org.apache.avro.Schema$LockableArrayList72:		462	29568	java.util.regex.Matcher73:		1217	29208	
java.lang.Double74:		900	28800	com.sun.org.apache.xerces.internal.util.AugmentationsImpl$SmallContainer75:		1077	25848	java.io.File76:		1035	24840	org.codehaus.jackson.node.ObjectNode77:		773	24736	org.codehaus.jackson.map.ser.ReadOnlyClassToSerializerMap78:		772	24704	org.codehaus.jackson.io.SegmentedStringWriter79:		772	24704	org.apache.avro.generic.GenericData$Array80:		772	24704	org.codehaus.jackson.impl.RootWContext81:		916	21984	org.apache.avro.Schema$ArraySchema82:		838	20112	org.apache.avro.Schema$StringSchema83:		620	19840	java.util.Vector84:		619	19808	org.apache.avro.io.parsing.Symbol$UnionAdjustAction85:		615	19680	com.sun.org.apache.xerces.internal.util.SymbolTable$Entry86:		180	18720	sun.net.www.protocol.file.FileURLConnection87:		776	18624	org.codehaus.jackson.map.ser.SerializerCache$UntypedKeyRaw88:		774	18576	org.apache.avro.Schema$UnionSchema89:		774	18576	org.codehaus.jackson.map.ser.SerializerCache$TypedKeyRaw90:		772	18528	org.apache.avro.mapred.Pair91:		772	18528	org.apache.avro.io.parsing.Symbol$Sequence92:		772	18528	org.apache.avro.Schema$SeenPair93:		770	18480	org.apache.avro.Schema$NullSchema94:		90	18000	com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl95:		544	17408	java.util.Stack96:		720	17280	com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl$RefCount97:		90	17280	com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl98:		90	17280	com.sun.org.apache.xerces.internal.parsers.XIncludeAwareParserConfiguration99:		707	16968	org.codehaus.jackson.sym.CharsToNameCanonicalizer$Bucket100:		690	16560	org.codehaus.jackson.node.ArrayNode101:		754	16192	java.lang.Class[]102:		90	15840	com.sun.org.apache.xerces.internal.impl.dtd.XMLNSDTDValidator103:		90	15120	com.sun.org.apache.xerces.internal.xinclude.XIncludeHandler104:		605	14520	java.lang.StringBuffer105:		389	14472	boolean[]106:		450	14400	com.sun.org.apache.xerces.internal.util.XMLResourceIdentifierImpl107:		900	14400	com.sun.org.apache.xerces.internal.util.AugmentationsImpl108:		570	13680	java.net.URLClassLoader$2109:		184	13248	java.lang.reflect.Field110:		92	13248	org.codehaus.jackson.sym.CharsToNameCanonicalizer$Bucket[]111:		90	12960	com.sun.org.apache.xerces.internal.parsers.DOMParser112:		773	12368	org.codehaus.jackson.map.ser.SerializerCache$TypedKeyFull113:		171	12312	java.lang.reflect.Constructor114:		307	12280	java.lang.ref.SoftReference115:		293	11720	java.lang.ref.Finalizer116:		284	11360	java.util.concurrent.ConcurrentHashMap$Segment117:		90	10800	com.sun.org.apache.xerces.internal.impl.XMLEntityManager118:		131	10480	java.util.jar.JarFile$JarFileEntry119:		262	10480	org.apache.avro.util.WeakIdentityHashMap$IdentityWeakReference120:		161	10304	java.util.regex.Pattern121:		299	9568	org.apache.avro.io.parsing.Symbol$Alternative122:		232	9280	sun.misc.FloatingDecimal123:		288	9216	java.util.concurrent.locks.ReentrantLock$NonfairSync124:		90	8640	com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDProcessor125:		90	8640	com.sun.xml.internal.stream.Entity$ScannedEntity126:		152	8512	java.util.regex.Pattern$GroupHead[]127:		288	8472	java.util.concurrent.ConcurrentHashMap$HashEntry[]128:		175	8400	org.apache.avro.Schema$RecordSchema129:		498	7968	java.util.HashSet130:		98	7840	java.net.URI131:		180	7200	com.sun.org.apache.xerces.internal.impl.dtd.XMLSimpleType132:		180	7200	com.sun.org.apache.xerces.internal.impl.dtd.XMLEntityDecl133:		295	7080	org.apache.avro.Schema$Name134:		93	5952	java.util.zip.ZipEntry135:		180	5760	com.sun.org.apache.xerces.internal.util.NamespaceSupport136:		180	
5760	com.sun.org.apache.xerces.internal.impl.XMLEntityManager$CharacterBuffer[]137:		180	5760	com.sun.org.apache.xerces.internal.dom.NodeListCache138:		180	5760	com.sun.org.apache.xerces.internal.util.XMLAttributesImpl$Attribute[]139:		239	5736	org.apache.avro.io.parsing.Symbol$WriterUnionAction140:		80	5408	java.lang.reflect.Method[]141:		168	5376	java.lang.ref.WeakReference142:		90	5040	com.sun.org.apache.xerces.internal.impl.XMLEntityScanner143:		90	5040	org.apache.avro.Schema$Names144:		100	4800	org.apache.avro.io.DirectBinaryDecoder145:		117	4680	org.apache.hadoop.io.DataInputBuffer146:		8	4672	* TypeArrayKlassKlass147:		82	4592	java.lang.Package148:		187	4488	java.util.LinkedList$Entry149:		138	4416	java.lang.ThreadLocal$ThreadLocalMap$Entry150:		181	4344	com.sun.org.apache.xerces.internal.impl.Constants$ArrayEnumeration151:		180	4320	com.sun.org.apache.xerces.internal.impl.dv.SecuritySupport$3152:		180	4320	javax.xml.parsers.SecuritySupport$4153:		90	4320	com.sun.org.apache.xerces.internal.impl.XMLEntityManager$RewindableInputStream154:		90	4320	com.sun.org.apache.xerces.internal.util.URI155:		180	4320	com.sun.org.apache.xerces.internal.parsers.SecuritySupport$3156:		90	4320	com.sun.org.apache.xerces.internal.impl.io.UTF8Reader157:		180	4320	sun.net.www.MessageHeader158:		90	4320	com.sun.org.apache.xerces.internal.util.XMLAttributesIteratorImpl159:		90	4320	com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$ElementStack160:		90	4320	com.sun.org.apache.xerces.internal.xinclude.XIncludeNamespaceSupport161:		90	4320	com.sun.org.apache.xerces.internal.dom.DeferredProcessingInstructionImpl162:		180	4320	com.sun.org.apache.xerces.internal.util.IntStack163:		134	4288	org.apache.hadoop.io.DataInputBuffer$Buffer164:		178	4272	java.io.FileInputStream165:		256	4096	java.lang.Byte166:		256	4096	java.lang.Short167:		45	3960	sun.net.www.protocol.jar.JarURLConnection168:		161	3864	java.util.regex.Pattern$Start169:		68	3808	java.beans.MethodDescriptor170:		155	3720	org.apache.avro.Schema$LongSchema171:		116	3712	java.lang.ref.ReferenceQueue172:		154	3696	java.util.regex.Pattern$Slice173:		91	3640	com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl174:		151	3624	java.util.regex.Pattern$TreeInfo175:		45	3600	sun.net.www.protocol.jar.URLJarFile$URLJarFileEntry176:		90	3600	com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$ElementStack2177:		90	3600	com.sun.org.apache.xerces.internal.util.XMLAttributesImpl178:		90	3600	com.sun.org.apache.xerces.internal.xni.parser.XMLInputSource179:		90	3600	com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDDescription180:		90	3600	com.sun.org.apache.xerces.internal.impl.validation.ValidationState181:		90	3600	com.sun.org.apache.xerces.internal.impl.XMLVersionDetector182:		90	3600	com.sun.org.apache.xerces.internal.impl.XMLErrorReporter183:		90	3600	com.sun.xml.internal.stream.XMLEntityStorage184:		90	3600	short[][]185:		90	3600	com.sun.org.apache.xerces.internal.impl.XMLEntityManager$CharacterBufferPool186:		90	3600	com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl187:		88	3520	java.math.BigInteger188:		107	3424	* CompilerICHolderKlass189:		142	3408	java.util.jar.Attributes$Name190:		85	3400	java.util.WeakHashMap$Entry191:		101	3232	org.apache.avro.io.parsing.SkipParser192:		100	3200	java.util.concurrent.ConcurrentHashMap$HashEntry193:		196	3136	java.io.FileDescriptor194:		90	2880	com.sun.org.apache.xerces.internal.util.ParserConfigurationSettings195:		90	2880	org.xml.sax.InputSource196:		180	2880	
javax.xml.parsers.SecuritySupport$1197:		90	2880	com.sun.org.apache.xerces.internal.impl.dtd.XMLElementDecl198:		90	2880	com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl$NSContentDriver199:		117	2808	org.apache.log4j.CategoryKey200:		48	2688	java.util.zip.ZipFile$1201:		65	2600	org.apache.log4j.Logger202:		102	2448	org.apache.avro.util.WeakIdentityHashMap203:		76	2432	java.net.URI$Parser204:		101	2424	org.apache.avro.io.ResolvingDecoder205:		101	2424	org.apache.avro.io.parsing.Symbol$Root206:		60	2400	org.codehaus.jackson.map.type.SimpleType207:		42	2352	java.util.jar.JarFile208:		98	2352	com.sun.org.apache.xml.internal.serializer.EncodingInfo209:		12	2304	* KlassKlass210:		48	2304	java.util.zip.ZipFile$ZipFileInputStream211:		51	2192	org.apache.avro.Schema$Field[]212:		90	2160	com.sun.org.apache.xerces.internal.parsers.SecuritySupport$4213:		90	2160	com.sun.org.apache.xerces.internal.impl.dtd.XMLAttributeDecl214:		90	2160	javax.xml.parsers.SecuritySupport$2215:		90	2160	com.sun.org.apache.xerces.internal.dom.SecuritySupport$4216:		90	2160	com.sun.org.apache.xerces.internal.impl.msg.XMLMessageFormatter217:		90	2160	com.sun.org.apache.xerces.internal.impl.dtd.DTDGrammarBucket218:		90	2160	com.sun.org.apache.xerces.internal.util.SecurityManager219:		90	2160	com.sun.org.apache.xerces.internal.util.SymbolTable220:		90	2160	com.sun.org.apache.xerces.internal.xinclude.XIncludeMessageFormatter221:		90	2160	com.sun.org.apache.xerces.internal.impl.validation.ValidationManager222:		128	2048	java.lang.Character223:		49	1960	java.io.BufferedInputStream224:		48	1920	sun.misc.URLClassPath$JarLoader225:		118	1888	java.lang.ref.ReferenceQueue$Lock226:		76	1864	java.lang.reflect.Constructor[]227:		16	1792	java.lang.ThreadLocal$ThreadLocalMap$Entry[]228:		55	1760	java.io.FilePermission229:		51	1632	org.apache.avro.io.parsing.Symbol$FieldOrderAction230:		100	1600	org.apache.avro.io.DirectBinaryDecoder$ByteReader231:		11	1584	java.text.DecimalFormat232:		22	1584	java.beans.PropertyDescriptor233:		16	1536	org.apache.hadoop.mapred.IFile$Writer234:		48	1536	sun.misc.URLClassPath$JarLoader$2235:		48	1536	org.apache.log4j.ProvisionNode236:		23	1504	java.util.concurrent.ConcurrentHashMap$Segment[]237:		61	1464	org.apache.commons.logging.impl.Log4JLogger238:		90	1440	com.sun.org.apache.xerces.internal.parsers.SecuritySupport$2239:		90	1440	com.sun.org.apache.xerces.internal.parsers.SecuritySupport$1240:		45	1440	sun.misc.URLClassPath$FileLoader$1241:		90	1440	com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$XMLDeclDriver242:		90	1440	com.sun.org.apache.xerces.internal.impl.dv.dtd.DTDDVFactoryImpl243:		90	1440	com.sun.org.apache.xerces.internal.impl.dv.SecuritySupport$2244:		90	1440	com.sun.org.apache.xerces.internal.impl.dv.SecuritySupport$1245:		90	1440	com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver246:		30	1440	java.util.StringTokenizer247:		90	1440	com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$TrailingMiscDriver248:		25	1400	java.util.ResourceBundle$CacheKey249:		58	1392	java.util.LinkedList250:		29	1392	java.util.Properties251:		56	1344	sun.reflect.NativeConstructorAccessorImpl252:		42	1344	java.util.zip.Inflater253:		78	1248	java.lang.Object254:		25	1200	java.util.ResourceBundle$BundleReference255:		30	1200	java.math.BigDecimal256:		8	1192	long[]257:		4	1112	java.lang.Long[]258:		23	1104	java.util.concurrent.ConcurrentHashMap259:		34	1088	java.util.concurrent.locks.AbstractQueuedSynchronizer$Node260:		34	1088	
org.apache.avro.mapred.AvroSerialization$AvroWrapperSerializer261:		45	1080	sun.net.www.protocol.jar.JarURLConnection$JarURLInputStream262:		67	1072	org.apache.hadoop.fs.Path263:		11	1072	java.util.WeakHashMap$Entry[]264:		2	1064	java.lang.Integer[]265:		33	1056	org.apache.hadoop.io.DataOutputBuffer266:		33	1056	java.util.concurrent.SynchronousQueue$TransferStack$SNode267:		1	1040	java.lang.Byte[]268:		18	1040	java.lang.reflect.Field[]269:		1	1040	java.lang.Short[]270:		43	1032	java.lang.ProcessEnvironment$Variable271:		43	1032	java.lang.ProcessEnvironment$Value272:		43	1032	com.hadoop.compression.lzo.LzoCompressor$CompressionStrategy273:		16	1024	org.apache.hadoop.mapred.Task$CombineValuesIterator274:		32	1024	org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer275:		42	1008	java.util.zip.ZStreamRef276:		63	1008	sun.reflect.DelegatingConstructorAccessorImpl277:		17	952	org.apache.hadoop.mapred.MapTask$MapOutputBuffer$InMemValBytes278:		7	904	java.beans.MethodDescriptor[]279:		22	880	java.io.ObjectStreamField280:		35	840	org.apache.hadoop.io.nativeio.Errno281:		34	816	org.apache.avro.io.BinaryEncoder282:		3	816	org.codehaus.jackson.sym.Name[]283:		34	816	org.apache.avro.specific.SpecificDatumWriter284:		25	800	java.util.LinkedList$ListItr285:		25	800	java.util.ResourceBundle$LoaderReference286:		33	792	org.apache.hadoop.io.DataOutputBuffer$Buffer287:		33	792	org.apache.avro.specific.SpecificDatumReader288:		7	784	java.lang.Thread289:		49	784	org.apache.avro.mapred.AvroKey290:		16	768	java.util.concurrent.FutureTask$Sync291:		48	768	sun.net.www.ParseUtil292:		31	744	org.apache.hadoop.io.serializer.SerializationFactory293:		23	736	java.security.AccessControlContext294:		45	720	java.io.FilePermission$1295:		30	720	sun.reflect.generics.tree.SimpleClassTypeSignature296:		11	704	java.text.DecimalFormatSymbols297:		12	672	sun.reflect.DelegatingClassLoader298:		42	672	java.lang.ThreadLocal299:		16	640	org.apache.hadoop.ipc.Client$Call300:		16	640	org.apache.hadoop.conf.Configuration301:		16	640	org.apache.hadoop.io.compress.BlockCompressorStream302:		20	640	java.util.regex.Pattern$Curly303:		20	640	org.apache.hadoop.mapred.Counters$Counter304:		19	608	java.util.Locale305:		25	600	java.util.regex.Pattern$GroupHead306:		5	600	java.net.SocksSocketImpl307:		15	600	sun.nio.ch.SelectionKeyImpl308:		25	600	java.util.regex.Pattern$GroupTail309:		9	576	java.nio.DirectByteBuffer310:		18	576	org.apache.hadoop.fs.FSDataOutputStream311:		18	576	org.apache.hadoop.fs.FSDataOutputStream$PositionCache312:		5	560	java.util.GregorianCalendar313:		5	560	sun.nio.ch.SocketChannelImpl314:		34	544	org.apache.avro.io.BinaryEncoder$SimpleByteWriter315:		17	544	org.apache.hadoop.util.DataChecksum316:		33	528	org.apache.avro.mapred.AvroSerialization317:		11	528	sun.nio.cs.UTF_8$Encoder318:		22	528	sun.reflect.NativeMethodAccessorImpl319:		1	528	java.lang.Character[]320:		33	528	org.apache.avro.mapred.AvroValue321:		16	512	org.apache.hadoop.ipc.RPC$Invocation322:		16	512	org.apache.hadoop.mapred.IFileOutputStream323:		16	512	org.apache.hadoop.mapred.MapTask$MapOutputBuffer$MRResultIterator324:		16	512	com.ngmoco.ngpipes.sourcing.NgBucketingEventCountingReducer325:		16	512	com.ngmoco.ngpipes.sourcing.NgBucketingEventCountingCombiner326:		16	512	org.apache.avro.mapred.HadoopCombiner$PairCollector327:		30	504	sun.reflect.generics.tree.TypeArgument[]328:		31	496	org.apache.hadoop.io.serializer.WritableSerialization329:		6	480	org.apache.hadoop.fs.DF330:		30	480	org.apache.avro.io.parsing.ResolvingGrammarGenerator331:		
5	480	sun.util.calendar.Gregorian$Date332:		12	480	java.security.ProtectionDomain333:		19	456	com.ngmoco.ngpipes.utils.NgPipesGlobals$EventClassCounter334:		4	440	java.math.BigInteger[]335:		11	440	java.text.DigitList336:		18	432	java.security.ProtectionDomain[]337:		18	432	java.text.DateFormat$Field338:		18	432	org.apache.avro.io.parsing.Symbol$Terminal339:		13	416	java.security.CodeSource340:		13	416	org.codehaus.jackson.JsonToken341:		17	408	java.util.regex.Pattern$Single342:		17	408	java.util.regex.Pattern$BitClass343:		1	408	com.sun.org.apache.xml.internal.serializer.EncodingInfo[]344:		2	400	org.apache.hadoop.ipc.Client$Connection345:		16	384	java.util.concurrent.Executors$RunnableAdapter346:		12	384	java.io.FileNotFoundException347:		8	384	java.util.TreeMap348:		16	384	org.apache.avro.mapred.HadoopReducerBase$ReduceIterable349:		8	384	java.util.WeakHashMap350:		12	384	java.net.Inet4Address351:		12	384	java.util.regex.Pattern$Branch352:		16	384	org.apache.hadoop.ipc.Client$Connection$3353:		16	384	org.apache.avro.mapred.HadoopCombiner354:		16	384	org.apache.hadoop.io.ObjectWritable355:		2	384	com.hadoop.compression.lzo.LzoCompressor$CompressionStrategy[]356:		23	368	sun.reflect.DelegatingMethodAccessorImpl357:		15	360	org.apache.hadoop.mapred.Task$Counter358:		9	360	sun.misc.Cleaner359:		15	360	java.io.Closeable[]360:		15	360	org.apache.avro.io.parsing.ResolvingGrammarGenerator$LitS2361:		15	360	sun.nio.ch.EPollArrayWrapper$Updator362:		15	360	java.lang.ThreadLocal$ThreadLocalMap363:		7	360	java.beans.PropertyDescriptor[]364:		11	352	java.security.Permissions365:		7	336	java.beans.BeanDescriptor366:		7	336	org.apache.hadoop.fs.permission.FsAction[]367:		14	336	org.apache.avro.Schema$Type368:		14	336	org.apache.hadoop.mapred.JvmTask369:		4	320	java.nio.ByteBuffer[]370:		10	320	java.security.BasicPermissionCollection371:		5	320	org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus372:		13	312	java.util.concurrent.atomic.AtomicLong373:		3	312	double[]374:		13	312	java.lang.RuntimePermission375:		13	312	org.apache.log4j.Level376:		12	288	org.codehaus.jackson.map.SerializationConfig$Feature377:		18	288	org.apache.hadoop.util.PureJavaCrc32378:		12	288	java.util.Arrays$ArrayList379:		12	288	java.io.ExpiringCache$Entry380:		12	288	java.util.regex.Pattern$Node[]381:		12	288	java.util.regex.Pattern$CharProperty$1382:		11	288	java.io.ObjectStreamField[]383:		12	288	sun.reflect.annotation.AnnotationInvocationHandler384:		4	288	org.apache.log4j.spi.LoggingEvent385:		7	280	java.beans.GenericBeanInfo386:		11	264	org.apache.hadoop.security.UserGroupInformation387:		11	264	java.io.FileOutputStream388:		11	264	sun.misc.MetaIndex389:		11	264	org.apache.avro.Schema$FloatSchema390:		3	264	org.apache.hadoop.hdfs.protocol.DatanodeInfo391:		11	264	org.apache.avro.Schema$IntSchema392:		8	256	java.lang.OutOfMemoryError393:		16	256	java.util.concurrent.FutureTask394:		8	256	sun.misc.ProxyGenerator$PrimitiveTypeInfo395:		8	256	javax.security.auth.Subject$ClassSet396:		14	256	java.security.Principal[]397:		8	256	sun.reflect.UnsafeQualifiedStaticObjectFieldAccessorImpl398:		4	256	java.text.SimpleDateFormat399:		8	248	java.lang.Boolean[]400:		10	240	java.util.jar.Manifest401:		5	240	sun.nio.ch.SocketAdaptor402:		10	240	javax.security.auth.Subject$SecureSet$1403:		10	240	java.net.InetSocketAddress404:		10	240	java.io.FilePermissionCollection405:		6	240	sun.nio.cs.UTF_8$Decoder406:		10	240	java.util.Collections$SynchronizedSet407:		6	240	java.util.IdentityHashMap408:		10	240	
org.codehaus.jackson.map.DeserializationConfig$Feature409:		7	224	java.util.Collections$UnmodifiableMap410:		4	224	sun.util.calendar.ZoneInfo411:		4	224	java.text.DateFormatSymbols412:		7	224	org.apache.avro.io.BinaryDecoder$BufferAccessor413:		7	224	java.lang.ClassLoader$NativeLibrary414:		9	216	java.util.logging.Level415:		9	216	org.apache.avro.Schema$BytesSchema416:		9	216	org.apache.hadoop.io.Text417:		3	216	sun.net.www.protocol.jar.URLJarFile418:		6	216	org.apache.hadoop.mapred.TaskLog$LogName[]419:		9	216	javax.security.auth.Subject$SecureSet420:		9	216	java.nio.DirectByteBuffer$Deallocator421:		13	208	java.util.jar.Attributes422:		5	200	java.util.HashMap$ValueIterator423:		5	200	java.util.TreeMap$Entry424:		6	192	java.util.Random425:		6	192	org.apache.hadoop.fs.permission.FsPermission$2426:		8	192	org.apache.hadoop.mapred.TaskStatus$State427:		4	192	org.apache.hadoop.mapred.JobConf428:		8	192	java.lang.annotation.ElementType429:		8	192	org.apache.hadoop.fs.permission.FsAction430:		8	192	java.util.regex.Pattern$8431:		8	192	com.ngmoco.ngpipes.sourcing.NgBucketingEventCountingReducer$TYPE_COUNTERS432:		8	192	java.math.RoundingMode433:		6	192	java.lang.annotation.ElementType[]434:		12	192	java.security.ProtectionDomain$Key435:		12	192	java.util.regex.Pattern$BranchConn436:		12	192	java.util.Formatter$Flags437:		2	184	java.text.DateFormat$Field[]438:		1	184	org.apache.hadoop.mapred.MapTask$MapOutputBuffer439:		11	176	java.text.NumberFormat$Field440:		7	168	org.codehaus.jackson.JsonParser$Feature441:		3	168	org.codehaus.jackson.sym.BytesToNameCanonicalizer442:		7	168	org.apache.avro.io.parsing.Symbol$Kind443:		7	168	java.io.BufferedOutputStream444:		7	168	org.codehaus.jackson.annotate.JsonMethod445:		3	168	org.codehaus.jackson.map.ObjectMapper446:		7	168	org.apache.avro.Schema$DoubleSchema447:		3	168	sun.nio.cs.StreamEncoder448:		5	160	org.apache.hadoop.mapred.TaskLog$LogFileDetail449:		5	160	org.codehaus.jackson.JsonGenerator$Feature450:		10	160	sun.reflect.BootstrapConstructorAccessorImpl451:		5	160	org.apache.hadoop.fs.FileSystem$Cache$Key452:		10	160	sun.reflect.generics.tree.ClassTypeSignature453:		1	160	org.apache.hadoop.io.nativeio.Errno[]454:		10	160	java.util.concurrent.atomic.AtomicInteger455:		5	160	java.nio.channels.SelectionKey[]456:		5	160	org.apache.hadoop.mapred.Counters$Group457:		5	160	sun.reflect.annotation.AnnotationType458:		5	160	org.apache.hadoop.fs.permission.FsPermission459:		2	144	java.math.BigDecimal[]460:		6	144	java.util.regex.Pattern$Ctype461:		1	144	sun.reflect.MethodAccessorGenerator462:		6	144	org.apache.avro.Schema$BooleanSchema463:		6	144	javax.security.auth.login.AppConfigurationEntry464:		6	144	org.codehaus.jackson.annotate.JsonAutoDetect$Visibility465:		6	144	java.net.URLClassLoader$1466:		6	144	org.apache.avro.Schema$MapSchema467:		6	144	org.apache.hadoop.security.UserGroupInformation$AuthenticationMethod468:		6	144	java.lang.StringCoding$StringEncoder469:		6	144	org.apache.hadoop.mapred.TaskStatus$Phase470:		4	128	sun.util.LocaleServiceProviderPool471:		2	128	java.util.logging.Logger472:		4	128	org.apache.avro.io.BinaryDecoder$ByteArrayByteSource473:		4	128	org.apache.log4j.helpers.PatternParser$LiteralPatternConverter474:		4	128	org.apache.avro.io.BinaryDecoder475:		4	128	sun.reflect.generics.reflectiveObjects.TypeVariableImpl476:		1	128	org.apache.hadoop.hdfs.DFSClient$BlockReader477:		1	128	org.apache.hadoop.mapred.MapTask478:		4	128	java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl479:		4	128	
sun.reflect.ClassFileAssembler480:		5	120	org.apache.hadoop.mapred.TaskLog$LogName481:		1	120	org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer482:		1	120	org.apache.hadoop.mapred.Child$3483:		1	120	org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread484:		5	120	java.util.logging.LogManager$LogNode485:		3	120	org.apache.hadoop.security.User486:		5	120	java.util.Date487:		1	120	org.apache.hadoop.mapred.Child$2488:		3	120	org.codehaus.jackson.annotate.JsonMethod[]489:		1	120	java.util.logging.LogManager$Cleaner490:		2	112	java.io.ExpiringCache$1491:		1	112	java.lang.ref.Finalizer$FinalizerThread492:		1	112	java.lang.ref.Reference$ReferenceHandler493:		4	96	org.codehaus.jackson.util.BufferRecycler$CharBufferType494:		2	96	org.apache.hadoop.fs.LocalFileSystem495:		4	96	org.apache.avro.io.parsing.Symbol$ImplicitAction496:		4	96	org.apache.hadoop.metrics.util.MetricsTimeVaryingRate$Metrics497:		2	96	org.apache.hadoop.mapred.Task$FileSystemStatisticUpdater498:		4	96	sun.reflect.generics.tree.FormalTypeParameter499:		3	96	org.apache.hadoop.security.SaslRpcServer$AuthMethod500:		1	96	com.ngmoco.ngpipes.utils.NgPipesGlobals$EventClassCounter[]501:		4	96	java.util.regex.Pattern$2502:		3	96	sun.misc.URLClassPath503:		4	96	sun.reflect.generics.tree.FieldTypeSignature[]504:		2	96	javax.security.auth.SubjectDomainCombiner$WeakKeyValueMap505:		4	96	com.ngmoco.ngpipes.sourcing.NgBucketingEventCountingCombiner$JobType506:		2	96	java.lang.ThreadGroup507:		3	96	java.util.RandomAccessSubList508:		2	96	org.apache.hadoop.ipc.Client$ConnectionId509:		6	96	java.util.TreeSet510:		3	96	java.security.PrivilegedActionException511:		6	96	sun.reflect.generics.tree.TypeVariableSignature512:		4	96	java.util.Formatter$FixedString513:		4	96	java.util.Formatter$FormatString[]514:		2	96	java.util.Formatter$FormatSpecifier515:		4	96	sun.reflect.ByteVectorImpl516:		6	96	java.util.concurrent.atomic.AtomicBoolean517:		4	96	sun.nio.ch.Util$BufferCache518:		3	96	java.io.OutputStreamWriter519:		2	96	org.apache.hadoop.metrics.spi.AbstractMetricsContext$TagMap520:		2	96	org.apache.hadoop.mapred.TaskStatus$State[]521:		1	96	org.apache.avro.file.DataFileReader522:		4	96	javax.security.auth.Subject$ClassSet$1523:		3	96	java.io.DataInputStream524:		3	96	java.lang.ClassNotFoundException525:		3	96	javax.security.auth.Subject526:		2	96	org.apache.hadoop.metrics.spi.AbstractMetricsContext$RecordMap527:		2	96	java.io.BufferedWriter528:		3	96	org.apache.hadoop.net.SocketInputStream$Reader529:		3	96	java.util.Collections$SynchronizedMap530:		3	96	java.io.DataOutputStream531:		1	88	org.apache.hadoop.hdfs.DFSClient$DFSInputStream532:		5	80	java.nio.channels.spi.AbstractInterruptibleChannel$1533:		2	80	org.apache.hadoop.fs.FileSystem$Statistics534:		1	80	org.apache.hadoop.hdfs.DFSClient535:		2	80	java.util.Formatter$Flags[]536:		2	80	java.util.PropertyResourceBundle537:		2	80	org.apache.hadoop.mapred.TaskStatus$Phase[]538:		1	80	sun.misc.Launcher$ExtClassLoader539:		1	80	com.hadoop.compression.lzo.LzoCompressor540:		2	80	java.io.ExpiringCache541:		2	80	org.codehaus.jackson.map.type.MapType542:		1	80	org.apache.hadoop.mapred.Task$Counter[]543:		2	80	org.apache.hadoop.metrics.spi.NullContext544:		1	80	java.util.concurrent.ThreadPoolExecutor545:		2	80	org.codehaus.jackson.annotate.JsonAutoDetect$Visibility[]546:		2	80	com.sun.xml.internal.stream.util.BufferAllocator547:		5	80	java.util.HashMap$Values548:		1	80	org.apache.hadoop.mapred.TaskLogAppender549:		1	80	org.apache.hadoop.mapred.MapTaskStatus550:		3	80	
javax.security.auth.login.AppConfigurationEntry[]551:		2	80	java.lang.Thread[]552:		1	72	sun.misc.Launcher$AppClassLoader553:		3	72	java.util.Collections$UnmodifiableRandomAccessList554:		3	72	org.apache.hadoop.hdfs.protocol.FSConstants$SafeModeAction555:		1	72	java.util.logging.LogManager$RootLogger556:		3	72	org.codehaus.jackson.map.ser.BasicSerializerFactory$SerializerMapping557:		3	72	org.codehaus.jackson.map.ser.SerializerCache558:		3	72	org.apache.avro.io.parsing.Symbol$Repeater559:		3	72	org.codehaus.jackson.util.BufferRecycler$ByteBufferType560:		3	72	org.codehaus.jackson.map.deser.StdDeserializerProvider561:		3	72	org.apache.hadoop.mapred.JobID562:		1	72	org.apache.avro.Schema$Type[]563:		3	72	java.net.InetAddress[]564:		1	72	sun.nio.ch.EPollSelectorImpl565:		3	72	org.apache.hadoop.mapred.TaskID566:		3	72	org.apache.hadoop.ipc.Status567:		3	72	sun.misc.Signal568:		3	72	org.apache.hadoop.mapred.TaskAttemptID569:		3	72	org.apache.hadoop.hdfs.protocol.FSConstants$DatanodeReportType570:		3	72	java.net.InetAddress$CacheEntry571:		3	72	java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject572:		3	72	org.apache.hadoop.hdfs.protocol.FSConstants$UpgradeAction573:		3	72	org.apache.avro.Schema$Field$Order574:		3	72	java.lang.annotation.RetentionPolicy575:		3	72	java.util.SubList$1576:		3	72	org.apache.hadoop.hdfs.protocol.DatanodeInfo$AdminStates577:		3	72	sun.misc.URLClassPath$FileLoader578:		1	72	org.codehaus.jackson.JsonToken[]579:		2	64	org.apache.hadoop.net.SocketOutputStream$Writer580:		2	64	java.util.Formatter581:		1	64	org.apache.hadoop.metrics.jvm.JvmMetrics582:		4	64	sun.net.www.protocol.jar.Handler583:		1	64	float[]584:		2	64	org.apache.hadoop.security.token.Token585:		2	64	sun.reflect.generics.repository.ClassRepository586:		2	64	java.lang.ref.ReferenceQueue$Null587:		2	64	java.io.PrintStream588:		2	64	org.apache.avro.file.DataFileStream$DataBlock589:		2	64	java.lang.annotation.RetentionPolicy[]590:		2	64	org.apache.avro.Schema$Field$Order[]591:		2	64	org.apache.hadoop.fs.RawLocalFileSystem592:		2	64	javax.security.auth.SubjectDomainCombiner593:		2	64	org.apache.hadoop.metrics.util.MetricsTimeVaryingRate$MinMax594:		2	64	org.apache.hadoop.mapred.SortedRanges$Range595:		4	64	java.util.LinkedHashSet596:		1	64	com.ngmoco.ngpipes.utils.bucketingeventcounting.BucketingEventHandler[]597:		2	64	org.apache.hadoop.hdfs.protocol.DatanodeInfo$AdminStates[]598:		2	64	org.apache.log4j.helpers.PatternParser$BasicPatternConverter599:		4	64	$Proxy4600:		2	64	org.codehaus.jackson.map.MappingJsonFactory601:		4	64	java.util.concurrent.locks.ReentrantLock602:		4	64	com.sun.org.apache.xml.internal.serializer.CharInfo$CharKey603:		1	64	org.codehaus.jackson.map.SerializationConfig$Feature[]604:		2	64	org.apache.hadoop.metrics.spi.MetricsRecordImpl605:		2	64	org.apache.hadoop.metrics.util.MetricsTimeVaryingRate606:		4	64	javax.security.auth.login.AppConfigurationEntry$LoginModuleControlFlag607:		4	64	$Proxy3608:		1	56	java.lang.Runnable[]609:		1	56	com.sun.security.auth.module.UnixLoginModule610:		2	56	sun.reflect.generics.tree.ClassTypeSignature[]611:		1	56	java.nio.ByteBufferAsLongBufferB612:		1	56	sun.nio.ch.EPollArrayWrapper613:		1	56	sun.awt.AppContext614:		1	56	org.codehaus.jackson.util.InternCache615:		1	56	java.util.ResourceBundle$RBClassLoader616:		1	56	javax.security.auth.login.LoginContext617:		1	56	org.codehaus.jackson.map.DeserializationConfig$Feature[]618:		1	48	java.util.concurrent.TimeUnit[]619:		1	48	org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext620:		2	48	
org.apache.log4j.helpers.OnlyOnceErrorHandler621:		2	48	org.codehaus.jackson.map.ser.StdSerializers$BooleanSerializer622:		3	48	org.apache.hadoop.fs.LocalDirAllocator623:		2	48	java.net.InetAddress$Cache$Type624:		2	48	org.codehaus.jackson.map.deser.StdDeserializer$CharacterDeserializer625:		2	48	sun.misc.NativeSignalHandler626:		2	48	javax.security.auth.login.LoginContext$ModuleInfo627:		1	48	org.codehaus.jackson.JsonParser$Feature[]628:		2	48	sun.reflect.generics.tree.ClassSignature629:		2	48	org.apache.avro.mapred.AvroKeyComparator630:		1	48	org.apache.hadoop.hdfs.DistributedFileSystem631:		2	48	org.codehaus.jackson.map.deser.StdDeserializer$LongDeserializer632:		1	48	sun.nio.cs.StreamDecoder633:		1	48	org.apache.log4j.Hierarchy634:		2	48	org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream635:		3	48	org.apache.hadoop.net.SocketInputStream636:		3	48	com.sun.org.apache.xerces.internal.impl.dv.dtd.ListDatatypeValidator637:		2	48	sun.reflect.generics.scope.ClassScope638:		2	48	sun.reflect.generics.tree.FormalTypeParameter[]639:		2	48	sun.misc.JarIndex640:		2	48	org.codehaus.jackson.map.deser.StdDeserializer$CalendarDeserializer641:		2	48	org.apache.hadoop.mapred.Counters642:		3	48	java.text.AttributedCharacterIterator$Attribute643:		2	48	org.apache.hadoop.ipc.ConnectionHeader644:		1	48	org.apache.log4j.helpers.PatternParser645:		2	48	org.codehaus.jackson.map.deser.StdDeserializer$FloatDeserializer646:		2	48	sun.awt.MostRecentKeyValue647:		2	48	java.lang.management.ManagementPermission648:		1	48	com.ngmoco.ngpipes.sourcing.NgBucketingEventCountingReducer$TYPE_COUNTERS[]649:		3	48	java.nio.charset.CodingErrorAction650:		2	48	java.nio.charset.CoderResult651:		2	48	java.lang.reflect.TypeVariable[]652:		2	48	com.sun.org.apache.xerces.internal.impl.RevalidationHandler[]653:		2	48	org.codehaus.jackson.map.deser.StdDeserializer$DoubleDeserializer654:		2	48	java.net.InetAddress$Cache655:		2	48	sun.reflect.Label$PatchInfo656:		2	48	java.util.Currency657:		2	48	org.codehaus.jackson.map.deser.StdDeserializer$ShortDeserializer658:		1	48	org.apache.avro.io.parsing.Symbol$Kind[]659:		2	48	sun.reflect.generics.factory.CoreReflectionFactory660:		1	48	java.io.BufferedReader661:		1	48	org.apache.hadoop.mapred.MapTask$TrackedRecordReader662:		2	48	org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream663:		1	48	java.util.Hashtable$Enumerator664:		2	48	org.codehaus.jackson.map.deser.StdDeserializer$IntegerDeserializer665:		2	48	org.apache.hadoop.ipc.RPC$Invoker666:		1	48	java.math.RoundingMode[]667:		2	48	org.codehaus.jackson.map.deser.StdDeserializer$BooleanDeserializer668:		2	48	org.apache.hadoop.ipc.Client$Connection$PingInputStream669:		2	48	java.util.regex.Pattern$6670:		1	48	org.codehaus.jackson.map.deser.MapDeserializer671:		2	48	org.codehaus.jackson.map.deser.StdDeserializer$ByteDeserializer672:		1	40	sun.util.resources.CalendarData673:		1	40	sun.text.resources.FormatData_en_US674:		1	40	org.apache.hadoop.util.Progress675:		1	40	org.codehaus.jackson.map.ser.MapSerializer676:		1	40	java.util.ResourceBundle$1677:		1	40	org.apache.hadoop.security.UserGroupInformation$AuthenticationMethod[]678:		1	40	org.apache.log4j.helpers.PatternParser$CategoryPatternConverter679:		1	40	sun.text.resources.FormatData_en680:		1	40	org.apache.hadoop.mapred.FileSplit681:		1	40	sun.util.resources.CurrencyNames682:		1	40	org.apache.avro.mapred.HadoopMapper$MapCollector683:		1	40	org.apache.commons.logging.impl.LogFactoryImpl684:		1	40	org.apache.avro.mapred.AvroRecordReader685:		1	40	
java.util.logging.LogManager686:		1	40	sun.nio.cs.StandardCharsets$Classes687:		1	40	org.apache.hadoop.mapred.IndexRecord688:		1	40	sun.nio.cs.StandardCharsets$Cache689:		1	40	org.apache.log4j.helpers.PatternParser$DatePatternConverter690:		1	40	sun.util.resources.CalendarData_en691:		1	40	org.apache.hadoop.mapred.Task$TaskReporter692:		1	40	org.apache.hadoop.security.KerberosName$Rule693:		1	40	org.codehaus.jackson.map.util.StdDateFormat694:		1	40	org.codehaus.jackson.JsonGenerator$Feature[]695:		1	40	sun.util.resources.CurrencyNames_en_US696:		1	40	org.apache.hadoop.mapred.TaskAttemptContext697:		1	40	org.apache.hadoop.metrics.jvm.EventCounter698:		1	40	org.apache.log4j.spi.RootLogger699:		1	40	com.sun.security.auth.module.UnixSystem700:		1	40	java.util.IdentityHashMap$ValueIterator701:		1	40	org.apache.hadoop.hdfs.protocol.Block702:		1	40	com.sun.org.apache.xerces.internal.dom.CoreDOMImplementationImpl703:		1	40	org.apache.hadoop.mapred.JobContext704:		1	40	org.apache.hadoop.fs.DF[]705:		1	40	java.util.concurrent.ThreadPoolExecutor$Worker706:		1	40	org.apache.hadoop.ipc.Client707:		1	40	com.sun.org.apache.xml.internal.serializer.CharInfo708:		1	40	sun.text.resources.FormatData709:		1	40	org.apache.hadoop.mapred.Task$OldCombinerRunner710:		2	40	java.io.File[]711:		1	40	sun.security.util.AuthResources712:		1	40	sun.nio.cs.StandardCharsets$Aliases713:		1	40	org.apache.hadoop.hdfs.protocol.LocatedBlock714:		2	32	$Proxy1715:		1	32	org.apache.hadoop.mapred.Child$4716:		1	32	java.io.UnixFileSystem717:		1	32	java.lang.InterruptedException718:		1	32	java.lang.ArithmeticException719:		1	32	com.ngmoco.ngpipes.sourcing.NgBucketingEventCountingCombiner$JobType[]720:		1	32	java.util.concurrent.SynchronousQueue721:		1	32	org.apache.hadoop.hdfs.protocol.FSConstants$SafeModeAction[]722:		2	32	java.util.regex.Pattern$Dot723:		2	32	sun.nio.ch.SocketChannelImpl$1724:		1	32	java.util.TreeMap$KeyIterator725:		2	32	org.apache.hadoop.net.SocketOutputStream726:		1	32	org.apache.hadoop.security.SaslRpcServer$AuthMethod[]727:		1	32	org.apache.hadoop.hdfs.protocol.LocatedBlocks728:		2	32	org.apache.hadoop.net.StandardSocketFactory729:		2	32	sun.nio.ch.SocketOptsImpl$IP$TCP730:		1	32	org.apache.hadoop.ipc.Status[]731:		1	32	org.codehaus.jackson.map.ser.ContainerSerializers$IndexedListSerializer732:		2	32	org.apache.avro.mapred.AvroWrapper733:		1	32	org.apache.hadoop.io.retry.RetryPolicies$RetryUpToMaximumCountWithFixedSleep734:		1	32	byte[][]735:		1	32	sun.misc.HexDumpEncoder736:		2	32	com.sun.org.apache.xerces.internal.impl.dv.dtd.ENTITYDatatypeValidator737:		1	32	org.apache.hadoop.hdfs.protocol.FSConstants$DatanodeReportType[]738:		1	32	java.lang.ClassCastException739:		1	32	java.lang.ref.Reference740:		1	32	org.apache.hadoop.hdfs.DFSClient$DFSDataInputStream741:		1	32	org.apache.hadoop.mapred.TaskLogsTruncater742:		2	32	java.util.logging.Handler[]743:		1	32	org.apache.log4j.helpers.QuietWriter744:		1	32	org.codehaus.jackson.map.ser.ArraySerializers$ObjectArraySerializer745:		1	32	sun.management.VMManagementImpl746:		1	32	java.lang.NullPointerException747:		1	32	org.apache.avro.io.BinaryData$Decoders748:		1	32	sun.reflect.MethodAccessorGenerator$1749:		1	32	org.codehaus.jackson.util.BufferRecycler$CharBufferType[]750:		2	32	sun.nio.ch.OptionAdaptor751:		1	32	java.lang.RuntimeException752:		1	32	org.apache.hadoop.io.NullWritable$Comparator753:		1	32	org.codehaus.jackson.JsonFactory754:		1	32	org.apache.hadoop.security.UserGroupInformation$UgiMetrics755:		1	32	org.apache.log4j.PatternLayout756:		1	32	
[... entries 757 through 1081 trimmed to the org.apache.avro.* classes; every class in this range has only one or two live instances of 16-32 bytes (JDK, Hadoop, Jackson, log4j, and job handler singletons) ...]
787:		1	24	org.apache.avro.specific.SpecificData
788:		1	24	org.apache.avro.mapred.FsInput
792:		1	24	org.apache.avro.io.DecoderFactory$DefaultDecoderFactory
795:		1	24	org.apache.avro.io.DecoderFactory
797:		1	24	org.apache.avro.file.DataFileReader$SeekableInputStream
838:		1	24	org.apache.avro.mapred.HadoopMapper
842:		1	24	org.apache.avro.io.BinaryDecoder$InputStreamByteSource
891:		1	16	org.apache.avro.file.NullCodec
901:		1	16	org.apache.avro.generic.GenericDatumReader$1
902:		1	16	org.apache.avro.io.BinaryData$1
960:		1	16	org.apache.avro.file.NullCodec$Option
993:		1	16	org.apache.avro.generic.GenericData
1002:		1	16	org.apache.avro.Schema$2
1016:		1	16	org.apache.avro.Schema$1
1018:		1	16	org.apache.avro.file.DeflateCodec$Option
1068:		1	16	org.apache.avro.io.BinaryData$2

Total :		3125077	482600080
Heap traversal took 13.835 seconds.




From: scott@richrelevance.com
To: user@avro.apache.org
Date: Wed, 1 Jun 2011 13:45:53 -0700
Subject: Re: avro object reuse




Lower down this list of object counts, what are the top org.apache.avro.** object counts?
How many AvroSerialization objects?  How many AvroMapper,  AvroWrapper, etc?
What about org.apache.hadoop.** objects?

Re: avro object reuse

Posted by Scott Carey <sc...@richrelevance.com>.
It is just a hunch that an OOME can happen if a corrupted array size is read (since I have seen this before).  Without the OOME stack trace, I can't say either way.  Sometimes the OOME stack trace is useless, because other things leaked leading to it, and other times it can show the source of the problem because it happens during an attempt to allocate a very large object or object graph.

Because OOME is not of type Exception, but rather (Throwable/Error), it usually gets printed out somewhere (check the std err logs of the map job) even when logging is turned down.



On 6/10/11 12:07 PM, "ey-chih chow" <ey...@hotmail.com>> wrote:

We have many MR jobs running in production, but only one of them shows this kind of behavior.  Is there any specific condition under which corruption will occur?

________________________________
From: scott@richrelevance.com<ma...@richrelevance.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Date: Fri, 10 Jun 2011 11:11:55 -0700
Subject: Re: avro object reuse

Corruption can occur in I/O busses and RAM.  Does this tend to fail on the same nodes, or any node randomly?  Since it does not fail consistently, this makes me suspect some sort of corruption even more.

I suggest turning on stack traces for fatal throwables.  This shouldn't hurt production performance since they don't happen regularly and break the task anyway.

Of the heap dumps seen so far, the primary consumption is byte[] and no more than 300MB.  How large are your java heaps?

On 6/10/11 10:53 AM, "ey-chih chow" <ey...@hotmail.com>> wrote:

Since this was in production, we did not turn on stack traces.  Also, it was highly unlikely that any data was corrupted because, if one mapper failed due to out of memory, the system started another one and went through all the data.

________________________________
From: scott@richrelevance.com<ma...@richrelevance.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Date: Thu, 9 Jun 2011 17:43:02 -0700
Subject: Re: avro object reuse

If the exception is happening while decoding, it could be due to corrupt data. Avro allocates a List preallocated to the size encoded, and I've seen corrupted data cause attempted allocations of arrays too large for the heap.

On 6/9/11 4:58 PM, "Scott Carey" <sc...@richrelevance.com>> wrote:

What is the stack trace on the out of memory exception?


On 6/9/11 4:45 PM, "ey-chih chow" <ey...@hotmail.com>> wrote:

We configure more than 100MB for MapReduce to do sorting.  The memory we allocate for doing other things in the mapper is actually larger, but, for this job, we always get out-of-memory exceptions and the job cannot complete.  We are trying to find out if there is a way to avoid this problem.

Ey-Chih Chow

________________________________
From: scott@richrelevance.com<ma...@richrelevance.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Date: Thu, 9 Jun 2011 15:42:10 -0700
Subject: Re: avro object reuse

The most likely candidate for creating many instances of BufferAccessor and ByteArrayByteSource is BinaryData.compare() and BinaryData.hashCode().  Each call will create one of each (hash) or two of each (compare).  These are only 32 bytes per instance and quickly become garbage that is easily cleaned up by the GC.

The below have only 32 bytes each and 8MB total.
On the other hand,  the byte[]'s appear to be about 24K each on average and are using 100MB.  Is this the size of your configured MapReduce sort MB?
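
To make the call pattern above concrete, here is a minimal sketch (not the actual org.apache.avro.mapred.AvroKeyComparator source) of a Hadoop RawComparator that delegates to BinaryData.compare(); the sort and merge phases invoke it once per pair of keys, which is where the short-lived BufferAccessor/ByteArrayByteSource garbage comes from:

import org.apache.avro.Schema;
import org.apache.avro.io.BinaryData;
import org.apache.hadoop.io.RawComparator;

// Sketch: compare two binary-encoded Avro keys without deserializing them.
// Every call allocates a couple of tiny BinaryDecoder helper objects that
// become garbage immediately.
public class SketchAvroBytesComparator implements RawComparator<byte[]> {
  private final Schema schema;

  public SketchAvroBytesComparator(Schema schema) {
    this.schema = schema;
  }

  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    return BinaryData.compare(b1, s1, b2, s2, schema);
  }

  public int compare(byte[] o1, byte[] o2) {
    return compare(o1, 0, o1.length, o2, 0, o2.length);
  }
}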

On 6/9/11 3:08 PM, "ey-chih chow" <ey...@hotmail.com>> wrote:

We did more monitoring.  At one point, we got the following histogram via jmap.  The question is why there are so many instances of BinaryDecoder$BufferAccessor and BinaryDecoder$ByteArrayByteSource.  How can we avoid this?  Thanks.

Object Histogram:

num       #instances    #bytes  Class description
--------------------------------------------------------------------------
1:              4199    100241168       byte[]
2:              272948  8734336 org.apache.avro.io.BinaryDecoder$BufferAccessor
3:              272945  8734240 org.apache.avro.io.BinaryDecoder$ByteArrayByteSource
4:              2093    5387976 int[]
5:              23762   2822864 * ConstMethodKlass
6:              23762   1904760 * MethodKlass
7:              39295   1688992 * SymbolKlass
8:              2127    1216976 * ConstantPoolKlass
9:              2127    882760  * InstanceKlassKlass
10:             1847    742936  * ConstantPoolCacheKlass
11:             9602    715608  char[]
12:             1072    299584  * MethodDataKlass
13:             9698    232752  java.lang.String
14:             2317    222432  java.lang.Class
15:             3288    204440  short[]
16:             3167    156664  * System ObjArray
17:             2401    57624   java.util.HashMap$Entry
18:             666     53280   java.lang.reflect.Method
19:             161     52808   * ObjArrayKlassKlass
20:             1808    43392   java.util.Hashtable$Entry


________________________________
From: eychih@hotmail.com<ma...@hotmail.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Subject: RE: avro object reuse
Date: Wed, 1 Jun 2011 15:14:03 -0700

We use a lot of toString() calls on the avro Utf8 object.  Will this cause Jackson calls?  Thanks.

Ey-Chih
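
On the toString() question: each call typically builds a new String from the Utf8's bytes, so if the same value is inspected many times per record it helps to compare against a pre-built Utf8 constant, or to convert once and reuse the result. A minimal sketch (the "eventType" field name is made up for illustration):

import org.apache.avro.generic.GenericRecord;
import org.apache.avro.util.Utf8;

// Sketch: avoid calling toString() on the same Utf8 value repeatedly.
public class Utf8ReuseExample {
  // Built once; a Utf8-to-Utf8 comparison never converts to String at all.
  private static final Utf8 LOGIN = new Utf8("login");

  static boolean isLogin(GenericRecord record) {
    return LOGIN.equals(record.get("eventType"));  // hypothetical field name
  }

  static String toStringOnce(Object value) {
    // Convert once and hang on to the String instead of re-converting
    // every time the value is examined.
    return value == null ? null : value.toString();
  }
}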

________________________________
From: scott@richrelevance.com<ma...@richrelevance.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Date: Wed, 1 Jun 2011 13:38:39 -0700
Subject: Re: avro object reuse

This is great info.

Jackson should only be used once when the file is opened, so this is confusing from that point of view.
Is something else using Jackson or initializing an Avro JsonDecoder frequently?  There are over 100000 Jackson DeserializationConfig objects.

Another place that parses the schema is in AvroSerialization.java.  Does the Hadoop getDeserializer() API method get called once per job, or per record?  If this is called more than once per map job, it might explain this.

In principle, Jackson is only used by a mapper during initialization.  The below indicates that this may not be the case or that something outside of Avro is causing a lot of Jackson JSON parsing.

Are you using something that is converting the Avro data to Json form?  toString() on most Avro datum objects will do a lot of work with Jackson, for example — but the below are deserializer objects not serializer objects so that is not likely the issue.
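
If the schema is being parsed per record rather than once per task, holding on to the parsed Schema (and the datum reader built from it) is the fix. A minimal sketch of the parse-once pattern against the old mapred API, where "my.input.schema" is a hypothetical configuration key:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

// Sketch: Schema.parse() is what drives Jackson, so do it once in configure()
// and reuse the result for every record the task processes.
public abstract class ParseOnceMapperBase extends MapReduceBase {
  protected Schema schema;
  protected GenericDatumReader<GenericRecord> datumReader;

  @Override
  public void configure(JobConf conf) {
    schema = Schema.parse(conf.get("my.input.schema"));    // once per task
    datumReader = new GenericDatumReader<GenericRecord>(schema);
  }
}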

On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com>> wrote:

We ran jmap on one of our mappers and found the top usage as follows:

num  #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator

It looks like Jackson eats up a lot of memory.  Our mapper reads in files of the avro format.  Does avro use Jackson a lot in reading the avro files?  Is there any way to improve this?  Thanks.

Ey-Chih Chow

________________________________
From: scott@richrelevance.com<ma...@richrelevance.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse

All of those instances are short-lived.   If you are running out of memory, it's not likely due to object reuse.  That tends to cause more CPU time in the garbage collector, but not out-of-memory conditions.  It can be hard to do on a cluster, but grabbing 'jmap -histo' output from a JVM with larger-than-expected heap usage can often quickly identify the cause of memory consumption issues.

I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.


On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com>> wrote:

I actually looked into the Avro code to find out how Avro does object reuse.  I looked at AvroUtf8InputFormat and have the following question: why does a new Utf8 object have to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called?  Will this eat up too much memory when we call next(key, value) many times?  Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)?  Will this save memory?  Thanks.

Ey-Chih Chow
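
On the reuse question more generally, the DatumReader and data file APIs accept a reuse instance, so one record object can be refilled on every iteration instead of allocating a fresh one per datum. A minimal sketch, assuming a local file named events.avro:

import java.io.File;
import java.io.IOException;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

// Sketch: pass the previous record back in as the reuse argument; the reader
// fills in the existing object (and its nested Utf8s/arrays) where it can.
public class ReuseRecordExample {
  public static void main(String[] args) throws IOException {
    DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
        new File("events.avro"), new GenericDatumReader<GenericRecord>());
    try {
      GenericRecord record = null;
      while (reader.hasNext()) {
        record = reader.next(record);   // reuse instead of allocating
        // ... inspect 'record' here; don't keep references across iterations
      }
    } finally {
      reader.close();
    }
  }
}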

________________________________
From: eychih@hotmail.com<ma...@hotmail.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700

Hi,

We have several mapreduce jobs using avro.  They take too much memory when running on production.  Can anybody suggest some object reuse techniques to cut down memory usage?  Thanks.

Ey-Chih Chow

RE: avro object reuse

Posted by ey-chih chow <ey...@hotmail.com>.
We have many MR jobs running in production, but only one of them shows this kind of behavior.  Is there any specific condition under which corruption will occur?


Re: avro object reuse

Posted by Scott Carey <sc...@richrelevance.com>.
Corruption can occur in I/O busses and RAM.  Does this tend to fail on the same nodes, or any node randomly?  Since it does not fail consistently, this makes me suspect some sort of corruption even more.

I suggest turning on stack traces for fatal throwables.  This shouldn't hurt production performance since they don't happen regularly and break the task anyway.

Of the heap dumps seen so far, the primary consumption is byte[] and no more than 300MB.  How large are your java heaps?
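
One way to get those stack traces even with task logging turned down is to wrap the map body, log any Throwable before rethrowing it (an OutOfMemoryError is an Error, so a plain catch of Exception never sees it), and let the task fail as before. A minimal sketch against the old mapred API; the class name and key/value types are illustrative:

import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch: log fatal throwables with their stack traces, then rethrow.
public class LoggingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, NullWritable> {
  private static final Log LOG = LogFactory.getLog(LoggingMapper.class);

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, NullWritable> out, Reporter reporter)
      throws IOException {
    try {
      // ... real map logic goes here ...
      out.collect(value, NullWritable.get());
    } catch (Throwable t) {
      LOG.fatal("map() died at input offset " + key, t);  // stack trace kept
      if (t instanceof Error) throw (Error) t;
      if (t instanceof RuntimeException) throw (RuntimeException) t;
      if (t instanceof IOException) throw (IOException) t;
      throw new IOException(t);
    }
  }
}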


RE: avro object reuse

Posted by ey-chih chow <ey...@hotmail.com>.
Since this was in production, we did not turn on stack traces.  Also, it was highly unlikely that any data was corrupted because, if one mapper failed due to out of memory, the system started another one and went through all the data.


Re: avro object reuse

Posted by Scott Carey <sc...@richrelevance.com>.
If the exception is happening while decoding, it could be due to corrupt data. Avro allocates a List preallocated to the size encoded, and I've seen corrupted data cause attempted allocations of arrays too large for the heap.
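
To make the failure mode concrete: the array or map block count is read from the stream before any elements, and the reader sizes a container from that count, so one corrupted length varint can turn into a single enormous allocation. A simplified sketch of the pattern (not the actual GenericDatumReader code):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.io.Decoder;

// Sketch: the container is pre-sized from the on-disk count before any
// element is validated, so a flipped bit in the count can demand gigabytes.
public class PreallocationSketch {
  static List<Long> readArrayOfLongs(Decoder in) throws IOException {
    long count = in.readArrayStart();                      // corrupt data => absurd count
    List<Long> result = new ArrayList<Long>((int) count);  // huge up-front allocation
    while (count > 0) {
      for (long i = 0; i < count; i++) {
        result.add(in.readLong());
      }
      count = in.arrayNext();                              // next block; 0 when done
    }
    return result;
  }
}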


Re: avro object reuse

Posted by Scott Carey <sc...@richrelevance.com>.
What is the stack trace on the out of memory exception?


On 6/9/11 4:45 PM, "ey-chih chow" <ey...@hotmail.com>> wrote:

We configure more than 100MB for MapReduce to do sorting.  Memory we allocate for doing other things in the mapper actually is larger, but, for this job, we always get out-of-meory exceptions and the job can not complete.  We try to find out if there is a way to avoid this problem.

Ey-Chih Chow

________________________________
From: scott@richrelevance.com<ma...@richrelevance.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Date: Thu, 9 Jun 2011 15:42:10 -0700
Subject: Re: avro object reuse

The most likely candidate for creating many instances of BufferAccessor and ByteArrayByteSource is BinaryData.compare() and BinaryData.hashCode().  Each call will create one of each (hash) or two of each (compare).  These are only 32 bytes per instance and quickly become garbage that is easily cleaned up by the GC.

The below have only 32 bytes each and 8MB total.
On the other hand,  the byte[]'s appear to be about 24K each on average and are using 100MB.  Is this the size of your configured MapReduce sort MB?

On 6/9/11 3:08 PM, "ey-chih chow" <ey...@hotmail.com>> wrote:

We did more monitoring.  At one instance, we got the following histogram via Jmap.  The question is why there are so many instances of BinaryDecoder$BufferAccessor and BinaryDecoder$ByteArrayByteSource.  How to avoid this?  Thanks.

Object Histogram:

num       #instances    #bytes  Class description
--------------------------------------------------------------------------
1:              4199    100241168       byte[]
2:              272948  8734336 org.apache.avro.io.BinaryDecoder$BufferAccessor
3:              272945  8734240 org.apache.avro.io.BinaryDecoder$ByteArrayByteSource
4:              2093    5387976 int[]
5:              23762   2822864 * ConstMethodKlass
6:              23762   1904760 * MethodKlass
7:              39295   1688992 * SymbolKlass
8:              2127    1216976 * ConstantPoolKlass
9:              2127    882760  * InstanceKlassKlass
10:             1847    742936  * ConstantPoolCacheKlass
11:             9602    715608  char[]
12:             1072    299584  * MethodDataKlass
13:             9698    232752  java.lang.String
14:             2317    222432  java.lang.Class
15:             3288    204440  short[]
16:             3167    156664  * System ObjArray
17:             2401    57624   java.util.HashMap$Entry
18:             666     53280   java.lang.reflect.Method
19:             161     52808   * ObjArrayKlassKlass
20:             1808    43392   java.util.Hashtable$Entry


________________________________
From: eychih@hotmail.com<ma...@hotmail.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Subject: RE: avro object reuse
Date: Wed, 1 Jun 2011 15:14:03 -0700

We make a lot of toString() calls on the avro Utf8 object.  Will this cause Jackson calls?  Thanks.

Ey-Chih

________________________________
From: scott@richrelevance.com<ma...@richrelevance.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Date: Wed, 1 Jun 2011 13:38:39 -0700
Subject: Re: avro object reuse

This is great info.

Jackson should only be used once when the file is opened, so this is confusing from that point of view.
Is something else using Jackson or initializing an Avro JsonDecoder frequently?  There are over 100000 Jackson DeserializationConfig objects.

Another place that parses the schema is in AvroSerialization.java.  Does the Hadoop getDeserializer() API method get called once per job, or per record?  If this is called more than once per map job, it might explain this.

In principle, Jackson is only used by a mapper during initialization.  The below indicates that this may not be the case or that something outside of Avro is causing a lot of Jackson JSON parsing.

Are you using something that is converting the Avro data to Json form?  toString() on most Avro datum objects will do a lot of work with Jackson, for example — but the below are deserializer objects not serializer objects so that is not likely the issue.

On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com>> wrote:

We ran jmap on one of our mappers and found the top usage as follows:

num  #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator

It looks like Jackson eats up a lot of memory.  Our mapper reads in files of the avro format.  Does avro use Jackson a lot in reading the avro files?  Is there any way to improve this?  Thanks.

Ey-Chih Chow

________________________________
From: scott@richrelevance.com<ma...@richrelevance.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse

All of those instances are short-lived.   If you are running out of memory, its not likely due to object reuse.  This tends to cause more CPU time in the garbage collector, but not out of memory conditions.  This can be hard to do on a cluster, but grabbing 'jmap –histo' output from a JVM that has a larger-than-expected JVM heap usage can often be used to quickly identify the cause of memory consumption issues.

I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.


On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com>> wrote:

I actually looked into Avro code to find out how Avro does object reuse.  I looked at AvroUtf8InputFormat and got the following question.  Why a new Utf8 object has to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called ?  Will this eat up too much memory when we call next(key, value) many times?  Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)?  Will this save memory?  Thanks.

Ey-Chih Chow

________________________________
From: eychih@hotmail.com<ma...@hotmail.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700

Hi,

We have several mapreduce jobs using avro.  They take too much memory when running on production.  Can anybody suggest some object reuse techniques to cut down memory usage?  Thanks.

Ey-Chih Chow

RE: avro object reuse

Posted by ey-chih chow <ey...@hotmail.com>.
We configure more than 100MB for MapReduce to do sorting.  The memory we allocate for other work in the mapper is actually larger, but, for this job, we always get out-of-memory exceptions and the job cannot complete.  We are trying to find out if there is a way to avoid this problem.
Ey-Chih Chow 

From: scott@richrelevance.com
To: user@avro.apache.org
Date: Thu, 9 Jun 2011 15:42:10 -0700
Subject: Re: avro object reuse




The most likely candidate for creating many instances of BufferAccessor and ByteArrayByteSource is BinaryData.compare() and BinaryData.hashCode().  Each call will create one of each (hash) or two of each (compare).  These are only 32 bytes per instance and quickly become garbage that is easily cleaned up by the GC.  
The below have only 32 bytes each and 8MB total.  On the other hand, the byte[]'s appear to be about 24K each on average and are using 100MB.  Is this the size of your configured MapReduce sort MB?
On 6/9/11 3:08 PM, "ey-chih chow" <ey...@hotmail.com> wrote:

We did more monitoring.  At one instance, we got the following histogram via Jmap.  The question is why there are so many instances of BinaryDecoder$BufferAccessor and BinaryDecoder$ByteArrayByteSource.  How to avoid this?  Thanks. 

Object Histogram:

num       #instances    #bytes  Class description
--------------------------------------------------------------------------
1:              4199    100241168       byte[]
2:              272948  8734336 org.apache.avro.io.BinaryDecoder$BufferAccessor
3:              272945  8734240 org.apache.avro.io.BinaryDecoder$ByteArrayByteSource
4:              2093    5387976 int[]
5:              23762   2822864 * ConstMethodKlass
6:              23762   1904760 * MethodKlass
7:              39295   1688992 * SymbolKlass
8:              2127    1216976 * ConstantPoolKlass
9:              2127    882760  * InstanceKlassKlass
10:             1847    742936  * ConstantPoolCacheKlass
11:             9602    715608  char[]
12:             1072    299584  * MethodDataKlass
13:             9698    232752  java.lang.String
14:             2317    222432  java.lang.Class
15:             3288    204440  short[]
16:             3167    156664  * System ObjArray
17:             2401    57624   java.util.HashMap$Entry
18:             666     53280   java.lang.reflect.Method
19:             161     52808   * ObjArrayKlassKlass
20:             1808    43392   java.util.Hashtable$Entry


From: eychih@hotmail.com
To: user@avro.apache.org
Subject: RE: avro object reuse
Date: Wed, 1 Jun 2011 15:14:03 -0700




We use a lot of toString() call on the avro Utf8 object.  Will this cause Jackson call?  Thanks.
Ey-Chih 

From: scott@richrelevance.com
To: user@avro.apache.org
Date: Wed, 1 Jun 2011 13:38:39 -0700
Subject: Re: avro object reuse

This is great info.
Jackson should only be used once when the file is opened, so this is confusing from that point of view.  Is something else using Jackson or initializing an Avro JsonDecoder frequently?  There are over 100000 Jackson DeserializationConfig objects.
Another place that parses the schema is in AvroSerialization.java.  Does the Hadoop getDeserializer() API method get called once per job, or per record?  If this is called more than once per map job, it might explain this.
In principle, Jackson is only used by a mapper during initialization.  The below indicates that this may not be the case or that something outside of Avro is causing a lot of Jackson JSON parsing. 
Are you using something that is converting the Avro data to Json form?  toString() on most Avro datum objects will do a lot of work with Jackson, for example — but the below are deserializer objects not serializer objects so that is not likely the issue.
On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com> wrote:

We ran jmap on one of our mapper and found the top usage as follows:
num  #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator
It looks like Jackson eats up a lot of memory.  Our mapper reads in files of the avro format.  Does avro use Jackson a lot in reading the avro files?  Is there any way to improve this?  Thanks.
Ey-Chih Chow
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse

All of those instances are short-lived.   If you are running out of memory, its not likely due to object reuse.  This tends to cause more CPU time in the garbage collector, but not out of memory conditions.  This can be hard to do on a cluster, but grabbing 'jmap –histo' output from a JVM that has a larger-than-expected JVM heap usage can often be used to quickly identify the cause of memory consumption issues.
I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.

On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com> wrote:

I actually looked into Avro code to find out how Avro does object reuse.  I looked at AvroUtf8InputFormat and got the following question.  Why a new Utf8 object has to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called ?  Will this eat up too much memory when we call next(key, value) many times?  Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)?  Will this save memory?  Thanks.
Ey-Chih Chow 

From: eychih@hotmail.com
To: user@avro.apache.org
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700




Hi, 
We have several mapreduce jobs using avro.  They take too much memory when running on production.  Can anybody suggest some object reuse techniques to cut down memory usage?  Thanks.
Ey-Chih Chow 		 	   		   		 	   		   		 	   		   		 	   		  

Re: avro object reuse

Posted by Scott Carey <sc...@richrelevance.com>.
If you do just 'jmap -histo' it shows you all of the objects on the heap.  Many of these objects may be garbage and unreferenced.  This is quick, and does not block the app or force a GC.

If you do 'jmap -histo:live' it will GC and only show the objects that are 'live' (currently referenced).

These are different because a GC ran and removed all the BinaryData inner class temporary objects.

On 6/9/11 3:26 PM, "ey-chih chow" <ey...@hotmail.com>> wrote:

In addition, we ran the same MR job once again and got the following histogram.  Why is this different from the previous one?  Thanks.


Ey-Chih Chow

Object Histogram:

num       #instances    #bytes  Class description
--------------------------------------------------------------------------
1:              4327    100242096       byte[]
2:              2050    5381496 int[]
3:              23762   2822864 * ConstMethodKlass
4:              23762   1904760 * MethodKlass
5:              39295   1688992 * SymbolKlass
6:              2127    1216976 * ConstantPoolKlass
7:              2127    882760  * InstanceKlassKlass
8:              11298   773008  char[]
9:              1847    742936  * ConstantPoolCacheKlass
10:             1064    297448  * MethodDataKlass
11:             11387   273288  java.lang.String
12:             2317    222432  java.lang.Class
13:             3288    204440  short[]
14:             3167    156664  * System ObjArray
15:             1360    86720   java.util.HashMap$Entry[]
16:             535     85600   org.codehaus.jackson.impl.ReaderBasedParser
17:             3498    83952   java.util.HashMap$Entry
18:             666     53280   java.lang.reflect.Method
19:             161     52808   * ObjArrayKlassKlass
20:             1267    44704   java.lang.Object[]
21:             1808    43392   java.util.Hashtable$Entry
22:             1070    42800   org.codehaus.jackson.impl.JsonReadContext
23:             777     31080   java.util.HashMap
24:             535     29960   org.codehaus.jackson.util.TextBuffer
25:             567     27216   java.nio.HeapByteBuffer
26:             553     26544   org.apache.avro.Schema$Props
27:             549     26352   java.nio.HeapCharBuffer
28:             538     25824   org.codehaus.jackson.map.DeserializationConfig
29:             535     25680   org.codehaus.jackson.io.IOContext
30:             1554    24864   org.codehaus.jackson.sym.CharsToNameCanonicalizer$Bucket
31:             539     21560   org.codehaus.jackson.sym.CharsToNameCanonicalizer


________________________________
From: eychih@hotmail.com<ma...@hotmail.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Subject: RE: avro object reuse
Date: Thu, 9 Jun 2011 15:16:29 -0700

I forgot to mention that the histogram in my previous message was extracted from a mapper of one of our MR jobs.

Ey-Chih Chow

________________________________
From: eychih@hotmail.com<ma...@hotmail.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Subject: RE: avro object reuse
Date: Thu, 9 Jun 2011 15:08:02 -0700

We did more monitoring.  At one instance, we got the following histogram via Jmap.  The question is why there are so many instances of BinaryDecoder$BufferAccessor and BinaryDecoder$ByteArrayByteSource.  How to avoid this?  Thanks.

Object Histogram:

num       #instances    #bytes  Class description
--------------------------------------------------------------------------
1:              4199    100241168       byte[]
2:              272948  8734336 org.apache.avro.io.BinaryDecoder$BufferAccessor
3:              272945  8734240 org.apache.avro.io.BinaryDecoder$ByteArrayByteSource
4:              2093    5387976 int[]
5:              23762   2822864 * ConstMethodKlass
6:              23762   1904760 * MethodKlass
7:              39295   1688992 * SymbolKlass
8:              2127    1216976 * ConstantPoolKlass
9:              2127    882760  * InstanceKlassKlass
10:             1847    742936  * ConstantPoolCacheKlass
11:             9602    715608  char[]
12:             1072    299584  * MethodDataKlass
13:             9698    232752  java.lang.String
14:             2317    222432  java.lang.Class
15:             3288    204440  short[]
16:             3167    156664  * System ObjArray
17:             2401    57624   java.util.HashMap$Entry
18:             666     53280   java.lang.reflect.Method
19:             161     52808   * ObjArrayKlassKlass
20:             1808    43392   java.util.Hashtable$Entry


________________________________
From: eychih@hotmail.com<ma...@hotmail.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Subject: RE: avro object reuse
Date: Wed, 1 Jun 2011 15:14:03 -0700

We use a lot of toString() call on the avro Utf8 object.  Will this cause Jackson call?  Thanks.

Ey-Chih

________________________________
From: scott@richrelevance.com<ma...@richrelevance.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Date: Wed, 1 Jun 2011 13:38:39 -0700
Subject: Re: avro object reuse

This is great info.

Jackson should only be used once when the file is opened, so this is confusing from that point of view.
Is something else using Jackson or initializing an Avro JsonDecoder frequently?  There are over 100000 Jackson DeserializationConfig objects.

Another place that parses the schema is in AvroSerialization.java.  Does the Hadoop getDeserializer() API method get called once per job, or per record?  If this is called more than once per map job, it might explain this.

In principle, Jackson is only used by a mapper during initialization.  The below indicates that this may not be the case or that something outside of Avro is causing a lot of Jackson JSON parsing.

Are you using something that is converting the Avro data to Json form?  toString() on most Avro datum objects will do a lot of work with Jackson, for example — but the below are deserializer objects not serializer objects so that is not likely the issue.

On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com>> wrote:

We ran jmap on one of our mapper and found the top usage as follows:

num  #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator

It looks like Jackson eats up a lot of memory.  Our mapper reads in files of the avro format.  Does avro use Jackson a lot in reading the avro files?  Is there any way to improve this?  Thanks.

Ey-Chih Chow

________________________________
From: scott@richrelevance.com<ma...@richrelevance.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse

All of those instances are short-lived.   If you are running out of memory, its not likely due to object reuse.  This tends to cause more CPU time in the garbage collector, but not out of memory conditions.  This can be hard to do on a cluster, but grabbing 'jmap –histo' output from a JVM that has a larger-than-expected JVM heap usage can often be used to quickly identify the cause of memory consumption issues.

I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.


On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com>> wrote:

I actually looked into Avro code to find out how Avro does object reuse.  I looked at AvroUtf8InputFormat and got the following question.  Why a new Utf8 object has to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called ?  Will this eat up too much memory when we call next(key, value) many times?  Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)?  Will this save memory?  Thanks.

Ey-Chih Chow

________________________________
From: eychih@hotmail.com<ma...@hotmail.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700

Hi,

We have several mapreduce jobs using avro.  They take too much memory when running on production.  Can anybody suggest some object reuse techniques to cut down memory usage?  Thanks.

Ey-Chih Chow

RE: avro object reuse

Posted by ey-chih chow <ey...@hotmail.com>.
In addition, we ran the same MR job once again and got the following histogram.  Why is this different from the previous one?  Thanks.


Ey-Chih Chow

Object Histogram:

num       #instances    #bytes  Class description
--------------------------------------------------------------------------
1:              4327    100242096       byte[]
2:              2050    5381496 int[]
3:              23762   2822864 * ConstMethodKlass
4:              23762   1904760 * MethodKlass
5:              39295   1688992 * SymbolKlass
6:              2127    1216976 * ConstantPoolKlass
7:              2127    882760  * InstanceKlassKlass
8:              11298   773008  char[]
9:              1847    742936  * ConstantPoolCacheKlass
10:             1064    297448  * MethodDataKlass
11:             11387   273288  java.lang.String
12:             2317    222432  java.lang.Class
13:             3288    204440  short[]
14:             3167    156664  * System ObjArray
15:             1360    86720   java.util.HashMap$Entry[]
16:             535     85600   org.codehaus.jackson.impl.ReaderBasedParser
17:             3498    83952   java.util.HashMap$Entry
18:             666     53280   java.lang.reflect.Method
19:             161     52808   * ObjArrayKlassKlass
20:             1267    44704   java.lang.Object[]
21:             1808    43392   java.util.Hashtable$Entry
22:             1070    42800   org.codehaus.jackson.impl.JsonReadContext
23:             777     31080   java.util.HashMap
24:             535     29960   org.codehaus.jackson.util.TextBuffer
25:             567     27216   java.nio.HeapByteBuffer
26:             553     26544   org.apache.avro.Schema$Props
27:             549     26352   java.nio.HeapCharBuffer
28:             538     25824   org.codehaus.jackson.map.DeserializationConfig
29:             535     25680   org.codehaus.jackson.io.IOContext
30:             1554    24864   org.codehaus.jackson.sym.CharsToNameCanonicalizer$Bucket
31:             539     21560   org.codehaus.jackson.sym.CharsToNameCanonicalizer


From: eychih@hotmail.com
To: user@avro.apache.org
Subject: RE: avro object reuse
Date: Thu, 9 Jun 2011 15:16:29 -0700








I forgot to mention that the histogram in my previous message was extracted from a mapper of one of our MR job.
Ey-Chih Chow

From: eychih@hotmail.com
To: user@avro.apache.org
Subject: RE: avro object reuse
Date: Thu, 9 Jun 2011 15:08:02 -0700








We did more monitoring.  At one instance, we got the following histogram via Jmap.  The question is why there are so many instances of BinaryDecoder$BufferAccessor and BinaryDecoder$ByteArrayByteSource.  How to avoid this?  Thanks. 

Object Histogram:

num       #instances    #bytes  Class description
--------------------------------------------------------------------------
1:              4199    100241168       byte[]
2:              272948  8734336 org.apache.avro.io.BinaryDecoder$BufferAccessor
3:              272945  8734240 org.apache.avro.io.BinaryDecoder$ByteArrayByteSource
4:              2093    5387976 int[]
5:              23762   2822864 * ConstMethodKlass
6:              23762   1904760 * MethodKlass
7:              39295   1688992 * SymbolKlass
8:              2127    1216976 * ConstantPoolKlass
9:              2127    882760  * InstanceKlassKlass
10:             1847    742936  * ConstantPoolCacheKlass
11:             9602    715608  char[]
12:             1072    299584  * MethodDataKlass
13:             9698    232752  java.lang.String
14:             2317    222432  java.lang.Class
15:             3288    204440  short[]
16:             3167    156664  * System ObjArray
17:             2401    57624   java.util.HashMap$Entry
18:             666     53280   java.lang.reflect.Method
19:             161     52808   * ObjArrayKlassKlass
20:             1808    43392   java.util.Hashtable$Entry


From: eychih@hotmail.com
To: user@avro.apache.org
Subject: RE: avro object reuse
Date: Wed, 1 Jun 2011 15:14:03 -0700








We use a lot of toString() call on the avro Utf8 object.  Will this cause Jackson call?  Thanks.
Ey-Chih 

From: scott@richrelevance.com
To: user@avro.apache.org
Date: Wed, 1 Jun 2011 13:38:39 -0700
Subject: Re: avro object reuse




This is great info.
Jackson should only be used once when the file is opened, so this is confusing from that point of view.  Is something else using Jackson or initializing an Avro JsonDecoder frequently?  There are over 100000 Jackson DeserializationConfig objects.
Another place that parses the schema is in AvroSerialization.java.  Does the Hadoop getDeserializer() API method get called once per job, or per record?  If this is called more than once per map job, it might explain this.
In principle, Jackson is only used by a mapper during initialization.  The below indicates that this may not be the case or that something outside of Avro is causing a lot of Jackson JSON parsing. 
Are you using something that is converting the Avro data to Json form?  toString() on most Avro datum objects will do a lot of work with Jackson, for example — but the below are deserializer objects not serializer objects so that is not likely the issue.
On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com> wrote:

We ran jmap on one of our mapper and found the top usage as follows:
num  #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator
It looks like Jackson eats up a lot of memory.  Our mapper reads in files of the avro format.  Does avro use Jackson a lot in reading the avro files?  Is there any way to improve this?  Thanks.
Ey-Chih Chow
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse

All of those instances are short-lived.   If you are running out of memory, its not likely due to object reuse.  This tends to cause more CPU time in the garbage collector, but not out of memory conditions.  This can be hard to do on a cluster, but grabbing 'jmap –histo' output from a JVM that has a larger-than-expected JVM heap usage can often be used to quickly identify the cause of memory consumption issues.
I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.

On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com> wrote:

I actually looked into Avro code to find out how Avro does object reuse.  I looked at AvroUtf8InputFormat and got the following question.  Why a new Utf8 object has to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called ?  Will this eat up too much memory when we call next(key, value) many times?  Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)?  Will this save memory?  Thanks.
Ey-Chih Chow 

From: eychih@hotmail.com
To: user@avro.apache.org
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700




Hi, 
We have several mapreduce jobs using avro.  They take too much memory when running on production.  Can anybody suggest some object reuse techniques to cut down memory usage?  Thanks.
Ey-Chih Chow 		 	   		   		 	   		   		 	   		  

RE: avro object reuse

Posted by ey-chih chow <ey...@hotmail.com>.
I forgot to mention that the histogram in my previous message was extracted from a mapper of one of our MR jobs.
Ey-Chih Chow

From: eychih@hotmail.com
To: user@avro.apache.org
Subject: RE: avro object reuse
Date: Thu, 9 Jun 2011 15:08:02 -0700








We did more monitoring.  At one instance, we got the following histogram via Jmap.  The question is why there are so many instances of BinaryDecoder$BufferAccessor and BinaryDecoder$ByteArrayByteSource.  How to avoid this?  Thanks. 

Object Histogram:

num       #instances    #bytes  Class description
--------------------------------------------------------------------------
1:              4199    100241168       byte[]
2:              272948  8734336 org.apache.avro.io.BinaryDecoder$BufferAccessor
3:              272945  8734240 org.apache.avro.io.BinaryDecoder$ByteArrayByteSource
4:              2093    5387976 int[]
5:              23762   2822864 * ConstMethodKlass
6:              23762   1904760 * MethodKlass
7:              39295   1688992 * SymbolKlass
8:              2127    1216976 * ConstantPoolKlass
9:              2127    882760  * InstanceKlassKlass
10:             1847    742936  * ConstantPoolCacheKlass
11:             9602    715608  char[]
12:             1072    299584  * MethodDataKlass
13:             9698    232752  java.lang.String
14:             2317    222432  java.lang.Class
15:             3288    204440  short[]
16:             3167    156664  * System ObjArray
17:             2401    57624   java.util.HashMap$Entry
18:             666     53280   java.lang.reflect.Method
19:             161     52808   * ObjArrayKlassKlass
20:             1808    43392   java.util.Hashtable$Entry


From: eychih@hotmail.com
To: user@avro.apache.org
Subject: RE: avro object reuse
Date: Wed, 1 Jun 2011 15:14:03 -0700








We use a lot of toString() call on the avro Utf8 object.  Will this cause Jackson call?  Thanks.
Ey-Chih 

From: scott@richrelevance.com
To: user@avro.apache.org
Date: Wed, 1 Jun 2011 13:38:39 -0700
Subject: Re: avro object reuse




This is great info.
Jackson should only be used once when the file is opened, so this is confusing from that point of view.  Is something else using Jackson or initializing an Avro JsonDecoder frequently?  There are over 100000 Jackson DeserializationConfig objects.
Another place that parses the schema is in AvroSerialization.java.  Does the Hadoop getDeserializer() API method get called once per job, or per record?  If this is called more than once per map job, it might explain this.
In principle, Jackson is only used by a mapper during initialization.  The below indicates that this may not be the case or that something outside of Avro is causing a lot of Jackson JSON parsing. 
Are you using something that is converting the Avro data to Json form?  toString() on most Avro datum objects will do a lot of work with Jackson, for example — but the below are deserializer objects not serializer objects so that is not likely the issue.
On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com> wrote:

We ran jmap on one of our mapper and found the top usage as follows:
num  #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator
It looks like Jackson eats up a lot of memory.  Our mapper reads in files of the avro format.  Does avro use Jackson a lot in reading the avro files?  Is there any way to improve this?  Thanks.
Ey-Chih Chow
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse

All of those instances are short-lived.   If you are running out of memory, its not likely due to object reuse.  This tends to cause more CPU time in the garbage collector, but not out of memory conditions.  This can be hard to do on a cluster, but grabbing 'jmap –histo' output from a JVM that has a larger-than-expected JVM heap usage can often be used to quickly identify the cause of memory consumption issues.
I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.

On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com> wrote:

I actually looked into Avro code to find out how Avro does object reuse.  I looked at AvroUtf8InputFormat and got the following question.  Why a new Utf8 object has to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called ?  Will this eat up too much memory when we call next(key, value) many times?  Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)?  Will this save memory?  Thanks.
Ey-Chih Chow 

From: eychih@hotmail.com
To: user@avro.apache.org
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700




Hi, 
We have several mapreduce jobs using avro.  They take too much memory when running on production.  Can anybody suggest some object reuse techniques to cut down memory usage?  Thanks.
Ey-Chih Chow 		 	   		   		 	   		   		 	   		  

Re: avro object reuse

Posted by Scott Carey <sc...@richrelevance.com>.
The most likely candidate for creating many instances of BufferAccessor and ByteArrayByteSource is BinaryData.compare() and BinaryData.hashCode().  Each call will create one of each (hash) or two of each (compare).  These are only 32 bytes per instance and quickly become garbage that is easily cleaned up by the GC.
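
For reference, a minimal sketch of the two calls in question; the long schema and the encoded key bytes below are made up for illustration:

import org.apache.avro.Schema;
import org.apache.avro.io.BinaryData;

public class BinaryKeyCompareSketch {
  public static void main(String[] args) {
    // Hypothetical key schema; in a real job this would be the map output key schema.
    Schema keySchema = Schema.create(Schema.Type.LONG);

    byte[] key1 = {0x02};   // zig-zag varint encoding of 1L
    byte[] key2 = {0x04};   // zig-zag varint encoding of 2L

    // Per the note above, each compare() creates two short-lived decoder helpers...
    int cmp = BinaryData.compare(key1, 0, key2, 0, keySchema);

    // ...and each hashCode() creates one.
    int hash = BinaryData.hashCode(key1, 0, key1.length, keySchema);

    System.out.println("compare=" + cmp + " hash=" + hash);
  }
}

During the MapReduce sort these calls typically happen once per key comparison, which is where hundreds of thousands of 32-byte instances can accumulate between collections.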

The below have only 32 bytes each and 8MB total.
On the other hand,  the byte[]'s appear to be about 24K each on average and are using 100MB.  Is this the size of your configured MapReduce sort MB?
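
If so, the byte[] usage points at the map-side sort buffer rather than Avro itself.  A sketch of how that buffer is sized, assuming the classic mapred property names of that era (values shown are only illustrative):

import org.apache.hadoop.mapred.JobConf;

public class SortBufferConfigSketch {
  public static void main(String[] args) {
    JobConf job = new JobConf();
    // io.sort.mb sizes the in-memory buffer MapTask uses to sort map output;
    // it shows up in a heap histogram as a few very large byte[] instances.
    job.setInt("io.sort.mb", 100);                  // 100 MB sort buffer
    job.setFloat("io.sort.spill.percent", 0.80f);   // spill threshold
    System.out.println("sort buffer MB = " + job.getInt("io.sort.mb", -1));
  }
}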

On 6/9/11 3:08 PM, "ey-chih chow" <ey...@hotmail.com>> wrote:

We did more monitoring.  At one instance, we got the following histogram via Jmap.  The question is why there are so many instances of BinaryDecoder$BufferAccessor and BinaryDecoder$ByteArrayByteSource.  How to avoid this?  Thanks.

Object Histogram:

num       #instances    #bytes  Class description
--------------------------------------------------------------------------
1:              4199    100241168       byte[]
2:              272948  8734336 org.apache.avro.io.BinaryDecoder$BufferAccessor
3:              272945  8734240 org.apache.avro.io.BinaryDecoder$ByteArrayByteSource
4:              2093    5387976 int[]
5:              23762   2822864 * ConstMethodKlass
6:              23762   1904760 * MethodKlass
7:              39295   1688992 * SymbolKlass
8:              2127    1216976 * ConstantPoolKlass
9:              2127    882760  * InstanceKlassKlass
10:             1847    742936  * ConstantPoolCacheKlass
11:             9602    715608  char[]
12:             1072    299584  * MethodDataKlass
13:             9698    232752  java.lang.String
14:             2317    222432  java.lang.Class
15:             3288    204440  short[]
16:             3167    156664  * System ObjArray
17:             2401    57624   java.util.HashMap$Entry
18:             666     53280   java.lang.reflect.Method
19:             161     52808   * ObjArrayKlassKlass
20:             1808    43392   java.util.Hashtable$Entry


________________________________
From: eychih@hotmail.com<ma...@hotmail.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Subject: RE: avro object reuse
Date: Wed, 1 Jun 2011 15:14:03 -0700

We use a lot of toString() call on the avro Utf8 object.  Will this cause Jackson call?  Thanks.

Ey-Chih

________________________________
From: scott@richrelevance.com<ma...@richrelevance.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Date: Wed, 1 Jun 2011 13:38:39 -0700
Subject: Re: avro object reuse

This is great info.

Jackson should only be used once when the file is opened, so this is confusing from that point of view.
Is something else using Jackson or initializing an Avro JsonDecoder frequently?  There are over 100000 Jackson DeserializationConfig objects.

Another place that parses the schema is in AvroSerialization.java.  Does the Hadoop getDeserializer() API method get called once per job, or per record?  If this is called more than once per map job, it might explain this.

In principle, Jackson is only used by a mapper during initialization.  The below indicates that this may not be the case or that something outside of Avro is causing a lot of Jackson JSON parsing.

Are you using something that is converting the Avro data to Json form?  toString() on most Avro datum objects will do a lot of work with Jackson, for example — but the below are deserializer objects not serializer objects so that is not likely the issue.

On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com>> wrote:

We ran jmap on one of our mapper and found the top usage as follows:

num  #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator

It looks like Jackson eats up a lot of memory.  Our mapper reads in files of the avro format.  Does avro use Jackson a lot in reading the avro files?  Is there any way to improve this?  Thanks.

Ey-Chih Chow

________________________________
From: scott@richrelevance.com<ma...@richrelevance.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse

All of those instances are short-lived.   If you are running out of memory, its not likely due to object reuse.  This tends to cause more CPU time in the garbage collector, but not out of memory conditions.  This can be hard to do on a cluster, but grabbing 'jmap –histo' output from a JVM that has a larger-than-expected JVM heap usage can often be used to quickly identify the cause of memory consumption issues.

I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.


On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com>> wrote:

I actually looked into Avro code to find out how Avro does object reuse.  I looked at AvroUtf8InputFormat and got the following question.  Why a new Utf8 object has to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called ?  Will this eat up too much memory when we call next(key, value) many times?  Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)?  Will this save memory?  Thanks.

Ey-Chih Chow

________________________________
From: eychih@hotmail.com<ma...@hotmail.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700

Hi,

We have several mapreduce jobs using avro.  They take too much memory when running on production.  Can anybody suggest some object reuse techniques to cut down memory usage?  Thanks.

Ey-Chih Chow

RE: avro object reuse

Posted by ey-chih chow <ey...@hotmail.com>.
We did more monitoring.  At one instance, we got the following histogram via Jmap.  The question is why there are so many instances of BinaryDecoder$BufferAccessor and BinaryDecoder$ByteArrayByteSource.  How to avoid this?  Thanks. 

Object Histogram:

num       #instances    #bytes  Class description
--------------------------------------------------------------------------
1:              4199    100241168       byte[]
2:              272948  8734336 org.apache.avro.io.BinaryDecoder$BufferAccessor
3:              272945  8734240 org.apache.avro.io.BinaryDecoder$ByteArrayByteSource
4:              2093    5387976 int[]
5:              23762   2822864 * ConstMethodKlass
6:              23762   1904760 * MethodKlass
7:              39295   1688992 * SymbolKlass
8:              2127    1216976 * ConstantPoolKlass
9:              2127    882760  * InstanceKlassKlass
10:             1847    742936  * ConstantPoolCacheKlass
11:             9602    715608  char[]
12:             1072    299584  * MethodDataKlass
13:             9698    232752  java.lang.String
14:             2317    222432  java.lang.Class
15:             3288    204440  short[]
16:             3167    156664  * System ObjArray
17:             2401    57624   java.util.HashMap$Entry
18:             666     53280   java.lang.reflect.Method
19:             161     52808   * ObjArrayKlassKlass
20:             1808    43392   java.util.Hashtable$Entry


From: eychih@hotmail.com
To: user@avro.apache.org
Subject: RE: avro object reuse
Date: Wed, 1 Jun 2011 15:14:03 -0700








We use a lot of toString() call on the avro Utf8 object.  Will this cause Jackson call?  Thanks.
Ey-Chih 

From: scott@richrelevance.com
To: user@avro.apache.org
Date: Wed, 1 Jun 2011 13:38:39 -0700
Subject: Re: avro object reuse




This is great info.
Jackson should only be used once when the file is opened, so this is confusing from that point of view.  Is something else using Jackson or initializing an Avro JsonDecoder frequently?  There are over 100000 Jackson DeserializationConfig objects.
Another place that parses the schema is in AvroSerialization.java.  Does the Hadoop getDeserializer() API method get called once per job, or per record?  If this is called more than once per map job, it might explain this.
In principle, Jackson is only used by a mapper during initialization.  The below indicates that this may not be the case or that something outside of Avro is causing a lot of Jackson JSON parsing. 
Are you using something that is converting the Avro data to Json form?  toString() on most Avro datum objects will do a lot of work with Jackson, for example — but the below are deserializer objects not serializer objects so that is not likely the issue.
On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com> wrote:

We ran jmap on one of our mapper and found the top usage as follows:
num  #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator
It looks like Jackson eats up a lot of memory.  Our mapper reads in files of the avro format.  Does avro use Jackson a lot in reading the avro files?  Is there any way to improve this?  Thanks.
Ey-Chih Chow
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse

All of those instances are short-lived.   If you are running out of memory, its not likely due to object reuse.  This tends to cause more CPU time in the garbage collector, but not out of memory conditions.  This can be hard to do on a cluster, but grabbing 'jmap –histo' output from a JVM that has a larger-than-expected JVM heap usage can often be used to quickly identify the cause of memory consumption issues.
I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.

On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com> wrote:

I actually looked into Avro code to find out how Avro does object reuse.  I looked at AvroUtf8InputFormat and got the following question.  Why a new Utf8 object has to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called ?  Will this eat up too much memory when we call next(key, value) many times?  Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)?  Will this save memory?  Thanks.
Ey-Chih Chow 

From: eychih@hotmail.com
To: user@avro.apache.org
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700




Hi, 
We have several mapreduce jobs using avro.  They take too much memory when running on production.  Can anybody suggest some object reuse techniques to cut down memory usage?  Thanks.
Ey-Chih Chow 		 	   		   		 	   		   		 	   		  

Re: avro object reuse

Posted by Scott Carey <sc...@richrelevance.com>.
no, and even GenericData.Record simply writes using a StringBuilder; I doubt this is the culprit.
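
A small sketch of the difference, with a hypothetical field name and constant: Utf8.toString() never touches Jackson, it only decodes the bytes into a fresh String, so its cost is one String allocation per call; comparing against a pre-built Utf8 avoids even that, assuming the field value really is a Utf8 at runtime.

import org.apache.avro.generic.GenericRecord;
import org.apache.avro.util.Utf8;

public class Utf8CompareSketch {
  // Built once; Utf8.equals() compares the underlying UTF-8 bytes.
  private static final Utf8 CLICK = new Utf8("click");   // hypothetical constant

  static boolean isClick(GenericRecord event) {
    // "eventType" is a hypothetical field name; generic data decoded from binary
    // holds strings as Utf8, so no String is allocated per record here.
    return CLICK.equals(event.get("eventType"));
  }

  static boolean isClickViaToString(GenericRecord event) {
    // Also correct, but allocates a new String for every record.
    return "click".equals(String.valueOf(event.get("eventType")));
  }
}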

On 6/1/11 3:14 PM, "ey-chih chow" <ey...@hotmail.com>> wrote:

We use a lot of toString() call on the avro Utf8 object.  Will this cause Jackson call?  Thanks.

Ey-Chih

________________________________
From: scott@richrelevance.com<ma...@richrelevance.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Date: Wed, 1 Jun 2011 13:38:39 -0700
Subject: Re: avro object reuse

This is great info.

Jackson should only be used once when the file is opened, so this is confusing from that point of view.
Is something else using Jackson or initializing an Avro JsonDecoder frequently?  There are over 100000 Jackson DeserializationConfig objects.

Another place that parses the schema is in AvroSerialization.java.  Does the Hadoop getDeserializer() API method get called once per job, or per record?  If this is called more than once per map job, it might explain this.

In principle, Jackson is only used by a mapper during initialization.  The below indicates that this may not be the case or that something outside of Avro is causing a lot of Jackson JSON parsing.

Are you using something that is converting the Avro data to Json form?  toString() on most Avro datum objects will do a lot of work with Jackson, for example — but the below are deserializer objects not serializer objects so that is not likely the issue.

On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com>> wrote:

We ran jmap on one of our mapper and found the top usage as follows:

num  #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator

It looks like Jackson eats up a lot of memory.  Our mapper reads in files of the avro format.  Does avro use Jackson a lot in reading the avro files?  Is there any way to improve this?  Thanks.

Ey-Chih Chow

________________________________
From: scott@richrelevance.com<ma...@richrelevance.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse

All of those instances are short-lived.   If you are running out of memory, its not likely due to object reuse.  This tends to cause more CPU time in the garbage collector, but not out of memory conditions.  This can be hard to do on a cluster, but grabbing 'jmap –histo' output from a JVM that has a larger-than-expected JVM heap usage can often be used to quickly identify the cause of memory consumption issues.

I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.


On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com>> wrote:

I actually looked into Avro code to find out how Avro does object reuse.  I looked at AvroUtf8InputFormat and got the following question.  Why a new Utf8 object has to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called ?  Will this eat up too much memory when we call next(key, value) many times?  Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)?  Will this save memory?  Thanks.

Ey-Chih Chow

________________________________
From: eychih@hotmail.com<ma...@hotmail.com>
To: user@avro.apache.org<ma...@avro.apache.org>
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700

Hi,

We have several mapreduce jobs using avro.  They take too much memory when running on production.  Can anybody suggest some object reuse techniques to cut down memory usage?  Thanks.

Ey-Chih Chow

RE: avro object reuse

Posted by ey-chih chow <ey...@hotmail.com>.
We make a lot of toString() calls on the avro Utf8 object.  Will this cause Jackson calls?  Thanks.
Ey-Chih 

From: scott@richrelevance.com
To: user@avro.apache.org
Date: Wed, 1 Jun 2011 13:38:39 -0700
Subject: Re: avro object reuse




This is great info.
Jackson should only be used once when the file is opened, so this is confusing from that point of view.  Is something else using Jackson or initializing an Avro JsonDecoder frequently?  There are over 100000 Jackson DeserializationConfig objects.
Another place that parses the schema is in AvroSerialization.java.  Does the Hadoop getDeserializer() API method get called once per job, or per record?  If this is called more than once per map job, it might explain this.
In principle, Jackson is only used by a mapper during initialization.  The below indicates that this may not be the case or that something outside of Avro is causing a lot of Jackson JSON parsing. 
Are you using something that is converting the Avro data to Json form?  toString() on most Avro datum objects will do a lot of work with Jackson, for example — but the below are deserializer objects not serializer objects so that is not likely the issue.
On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com> wrote:

We ran jmap on one of our mapper and found the top usage as follows:
num  #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator
It looks like Jackson eats up a lot of memory.  Our mapper reads in files of the avro format.  Does avro use Jackson a lot in reading the avro files?  Is there any way to improve this?  Thanks.
Ey-Chih Chow
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse

All of those instances are short-lived.   If you are running out of memory, its not likely due to object reuse.  This tends to cause more CPU time in the garbage collector, but not out of memory conditions.  This can be hard to do on a cluster, but grabbing 'jmap –histo' output from a JVM that has a larger-than-expected JVM heap usage can often be used to quickly identify the cause of memory consumption issues.
I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.

On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com> wrote:

I actually looked into Avro code to find out how Avro does object reuse.  I looked at AvroUtf8InputFormat and got the following question.  Why a new Utf8 object has to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called ?  Will this eat up too much memory when we call next(key, value) many times?  Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)?  Will this save memory?  Thanks.
Ey-Chih Chow 

From: eychih@hotmail.com
To: user@avro.apache.org
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700




Hi, 
We have several mapreduce jobs using avro.  They take too much memory when running on production.  Can anybody suggest some object reuse techniques to cut down memory usage?  Thanks.
Ey-Chih Chow 		 	   		   		 	   		   		 	   		  

Re: avro object reuse

Posted by Scott Carey <sc...@richrelevance.com>.
This is great info.

Jackson should only be used once when the file is opened, so this is confusing from that point of view.
Is something else using Jackson or initializing an Avro JsonDecoder frequently?  There are over 100000 Jackson DeserializationConfig objects.

Another place that parses the schema is in AvroSerialization.java.  Does the Hadoop getDeserializer() API method get called once per job, or per record?  If this is called more than once per map job, it might explain this.

In principle, Jackson is only used by a mapper during initialization.  The below indicates that this may not be the case or that something outside of Avro is causing a lot of Jackson JSON parsing.

Are you using something that is converting the Avro data to Json form?  toString() on most Avro datum objects will do a lot of work with Jackson, for example — but the below are deserializer objects not serializer objects so that is not likely the issue.
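
One way to rule schema parsing in or out is to make sure it can only happen once per JVM, for example behind a static field.  A minimal sketch of the parse-once pattern, using a made-up schema:

import org.apache.avro.Schema;

public class ParsedSchemaHolder {
  // Hypothetical schema; the point is that Schema.parse() runs once, in the
  // static initializer, rather than once per record or per deserializer lookup.
  private static final String EVENT_SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"long\"},"
      + "{\"name\":\"name\",\"type\":\"string\"}]}";

  public static final Schema EVENT = Schema.parse(EVENT_SCHEMA_JSON);

  private ParsedSchemaHolder() {}
}

If the ReaderBasedParser and DeserializationConfig counts keep climbing even when your own code only touches a schema parsed once like this, the JSON parsing is coming from somewhere else, such as the per-record getDeserializer() possibility raised above.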

On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com>> wrote:

We ran jmap on one of our mapper and found the top usage as follows:

num  #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator

It looks like Jackson eats up a lot of memory.  Our mappers read files in the Avro format.  Does Avro use Jackson heavily when reading Avro files?  Is there any way to improve this?  Thanks.

Ey-Chih Chow

________________________________
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse

All of those instances are short-lived.  If you are running out of memory, it's not likely due to object reuse.  Lack of reuse tends to cost more CPU time in the garbage collector, but it does not cause out-of-memory conditions.  Grabbing 'jmap -histo' output can be hard to do on a cluster, but taking it from a JVM with larger-than-expected heap usage will often identify the cause of memory consumption issues quickly.

I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.


On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com> wrote:

I actually looked into the Avro code to find out how Avro does object reuse.  I looked at AvroUtf8InputFormat and have the following question.  Why does a new Utf8 object have to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called?  Will this eat up too much memory when we call next(key, value) many times?  Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)?  Would that save memory?  Thanks.

Ey-Chih Chow

________________________________
From: eychih@hotmail.com
To: user@avro.apache.org
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700

Hi,

We have several mapreduce jobs using avro.  They take too much memory when running on production.  Can anybody suggest some object reuse techniques to cut down memory usage?  Thanks.

Ey-Chih Chow

Re: avro object reuse

Posted by Scott Carey <sc...@richrelevance.com>.
Lower down this list of object counts, what are the top org.apache.avro.** object counts?

How many AvroSerialization objects?  How many AvroMapper,  AvroWrapper, etc?

What about org.apache.hadoop.** objects?

On 6/1/11 11:34 AM, "ey-chih chow" <ey...@hotmail.com> wrote:

We ran jmap on one of our mappers and found the top usage as follows:

num  #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator

It looks like Jackson eats up a lot of memory.  Our mappers read files in the Avro format.  Does Avro use Jackson heavily when reading Avro files?  Is there any way to improve this?  Thanks.

Ey-Chih Chow

________________________________
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse

All of those instances are short-lived.  If you are running out of memory, it's not likely due to object reuse.  Lack of reuse tends to cost more CPU time in the garbage collector, but it does not cause out-of-memory conditions.  Grabbing 'jmap -histo' output can be hard to do on a cluster, but taking it from a JVM with larger-than-expected heap usage will often identify the cause of memory consumption issues quickly.

I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.


On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com> wrote:

I actually looked into the Avro code to find out how Avro does object reuse.  I looked at AvroUtf8InputFormat and have the following question.  Why does a new Utf8 object have to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called?  Will this eat up too much memory when we call next(key, value) many times?  Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)?  Would that save memory?  Thanks.

Ey-Chih Chow

________________________________
From: eychih@hotmail.com
To: user@avro.apache.org
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700

Hi,

We have several mapreduce jobs using avro.  They take too much memory when running on production.  Can anybody suggest some object reuse techniques to cut down memory usage?  Thanks.

Ey-Chih Chow

RE: avro object reuse

Posted by ey-chih chow <ey...@hotmail.com>.
We ran jmap on one of our mappers and found the top usage as follows:
num  #instances #bytes Class description
--------------------------------------------------------------------------
1: 24405 291733256 byte[]
2: 6056 40228984 int[]
3: 388799 19966776 char[]
4: 101779 16284640 org.codehaus.jackson.impl.ReaderBasedParser
5: 369623 11827936 java.lang.String
6: 111059 8769424 java.util.HashMap$Entry[]
7: 204083 8163320 org.codehaus.jackson.impl.JsonReadContext
8: 211374 6763968 java.util.HashMap$Entry
9: 102551 5742856 org.codehaus.jackson.util.TextBuffer
10: 105854 5080992 java.nio.HeapByteBuffer
11: 105821 5079408 java.nio.HeapCharBuffer
12: 104578 5019744 java.util.HashMap
13: 102551 4922448 org.codehaus.jackson.io.IOContext
14: 101782 4885536 org.codehaus.jackson.map.DeserializationConfig
15: 101783 4071320 org.codehaus.jackson.sym.CharsToNameCanonicalizer
16: 101779 4071160 org.codehaus.jackson.map.deser.StdDeserializationContext
17: 101779 4071160 java.io.StringReader
18: 101754 4070160 java.util.HashMap$KeyIterator
It looks like Jackson eats up a lot of memory.  Our mappers read files in the Avro format.  Does Avro use Jackson heavily when reading Avro files?  Is there any way to improve this?  Thanks.
Ey-Chih Chow
From: scott@richrelevance.com
To: user@avro.apache.org
Date: Tue, 31 May 2011 18:26:23 -0700
Subject: Re: avro object reuse




All of those instances are short-lived.  If you are running out of memory, it's not likely due to object reuse.  Lack of reuse tends to cost more CPU time in the garbage collector, but it does not cause out-of-memory conditions.  Grabbing 'jmap -histo' output can be hard to do on a cluster, but taking it from a JVM with larger-than-expected heap usage will often identify the cause of memory consumption issues quickly.
I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.

On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com> wrote:

I actually looked into the Avro code to find out how Avro does object reuse.  I looked at AvroUtf8InputFormat and have the following question.  Why does a new Utf8 object have to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called?  Will this eat up too much memory when we call next(key, value) many times?  Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)?  Would that save memory?  Thanks.
Ey-Chih Chow 

From: eychih@hotmail.com
To: user@avro.apache.org
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700




Hi, 
We have several mapreduce jobs using avro.  They take too much memory when running on production.  Can anybody suggest some object reuse techniques to cut down memory usage?  Thanks.
Ey-Chih Chow

Re: avro object reuse

Posted by Scott Carey <sc...@richrelevance.com>.
All of those instances are short-lived.  If you are running out of memory, it's not likely due to object reuse.  Lack of reuse tends to cost more CPU time in the garbage collector, but it does not cause out-of-memory conditions.  Grabbing 'jmap -histo' output can be hard to do on a cluster, but taking it from a JVM with larger-than-expected heap usage will often identify the cause of memory consumption issues quickly.

I'm not sure if AvroUtf8InputFormat can safely re-use its instances of Utf8 or not.


On 5/31/11 5:40 PM, "ey-chih chow" <ey...@hotmail.com> wrote:

I actually looked into the Avro code to find out how Avro does object reuse.  I looked at AvroUtf8InputFormat and have the following question.  Why does a new Utf8 object have to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called?  Will this eat up too much memory when we call next(key, value) many times?  Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)?  Would that save memory?  Thanks.

Ey-Chih Chow

________________________________
From: eychih@hotmail.com
To: user@avro.apache.org
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700

Hi,

We have several mapreduce jobs using avro.  They take too much memory when running on production.  Can anybody suggest some object reuse techniques to cut down memory usage?  Thanks.

Ey-Chih Chow

RE: avro object reuse

Posted by ey-chih chow <ey...@hotmail.com>.
I actually looked into the Avro code to find out how Avro does object reuse.  I looked at AvroUtf8InputFormat and have the following question.  Why does a new Utf8 object have to be created each time the method next(AvroWrapper<Utf8> key, NullWritable value) is called?  Will this eat up too much memory when we call next(key, value) many times?  Since Utf8 is mutable, can we just create one Utf8 object for all the calls to next(key, value)?  Would that save memory?  Thanks.
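Concretely, something like the sketch below is what I have in mind.  This is a hypothetical reader, not the actual AvroUtf8InputFormat code, and it assumes downstream code does not hold on to the datum across calls:

import java.io.BufferedReader;
import java.io.IOException;

import org.apache.avro.mapred.AvroWrapper;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.NullWritable;

public class ReusingUtf8Reader {
  private final BufferedReader lines;           // stand-in for the underlying line reader
  private final Utf8 reusedDatum = new Utf8();  // one Utf8 shared by every next() call

  public ReusingUtf8Reader(BufferedReader lines) {
    this.lines = lines;
  }

  public boolean next(AvroWrapper<Utf8> key, NullWritable value) throws IOException {
    String line = lines.readLine();
    if (line == null) {
      return false;
    }
    reusedDatum.set(line);    // mutate in place instead of allocating new Utf8(line)
    key.datum(reusedDatum);   // hand the same instance back through the wrapper
    return true;
  }
}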
Ey-Chih Chow 

From: eychih@hotmail.com
To: user@avro.apache.org
Subject: avro object reuse
Date: Tue, 31 May 2011 10:38:39 -0700








Hi, 
We have several mapreduce jobs using avro.  They take too much memory when running on production.  Can anybody suggest some object reuse techniques to cut down memory usage?  Thanks.
Ey-Chih Chow