Posted to dev@impala.apache.org by Barnabás Maidics <ba...@cloudera.com.INVALID> on 2018/08/01 14:01:02 UTC
Re: hadoop.fs.Path impacts on Impala
On Wed, Aug 1, 2018 at 11:17 AM Barnabás Maidics <
barnabas.maidics@cloudera.com> wrote:
> Hi Everyone!
>
> I'm an intern at Cloudera, analysing where the memory goes in Hive. I
> was looking at a heap dump with many partitions and found memory waste
> that comes from HDFS.
>
> We store paths in hadoop.fs.Path objects. Path uses java.net.URI, which
> stores almost the same strings in 3 different objects (see the image and
> further explanation at the link given below). I think this wastes
> memory, and the waste could be reduced by replacing the URI objects. This
> is why I've created an issue on the HDFS side (HDFS-13752
> <https://issues.apache.org/jira/browse/HDFS-13752>).
>
> I'm curious whether you store these objects (hadoop.fs.Path) and, if you
> do, how much it affects the overall memory usage of Impala. It may be
> beneficial for you as well if it can be replaced.
>
> Thanks,
>
> Barnabas Maidics
>
>
Re: hadoop.fs.Path impacts on Impala
Posted by Philip Zeyliger <ph...@cloudera.com.INVALID>.
Hi Barnabas,
If I may suggest a way to approach this sort of question: I'd take a
heap dump of an impalad and a catalogd (using "jmap") and then use Eclipse
MAT or http://www.jxray.com/ to see whether we're using Path. You'll want to
load some tables and partitions ahead of time. Based on a little quick
sleuthing (I'm not well-versed in this area of the code),
fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java seems
to use FlatBuffers to store these. It's likely we don't use the HDFS Path
object in steady state.
I took a quick look at a cluster we have lying around and found negligible
use, but I'm not totally confident about what's on that cluster at the
moment.
[root@... philip]# sudo -u impala /usr/java/jdk1.8.0_111/bin/jmap -histo 47116 > /tmp/histo
[root@.... philip]# cat /tmp/histo | grep Path
    85:    247  13832  sun.misc.URLClassPath$JarLoader
   567:      3    144  sun.misc.URLClassPath
   568:      6    144  sun.misc.URLClassPath$FileLoader
   724:      4     96  sun.security.provider.certpath.X509CertPath
   762:      2     80  sun.misc.URLClassPath$1
*  893:      4     64  org.apache.hadoop.fs.Path*
   927:      2     64  sun.nio.fs.UnixPath
  1030:      2     48  java.io.File$PathStatus
  1211:      1     40  org.apache.hadoop.hdfs.protocol.proto.EncryptionZonesProtos$GetEZForPathRequestProto
  1226:      1     40  sun.misc.URLClassPath$2
  1411:      1     24  [Ljava.io.File$PathStatus;
  1470:      1     24  com.sun.org.apache.bcel.internal.util.ClassPath
  1645:      1     16  [Lcom.sun.org.apache.bcel.internal.util.ClassPath$PathEntry;
  2035:      1     16  org.apache.hadoop.hdfs.protocol.proto.EncryptionZonesProtos$GetEZForPathRequestProto$1
[root@... philip]# head /tmp/histo
 num  #instances    #bytes  class name
----------------------------------------------
   1:     416952  81879632  [B
   2:    1324260  42376320  com.codahale.metrics.LongAdder
   3:     794556  38138688  com.codahale.metrics.EWMA
   4:    1060844  25460256  java.util.concurrent.atomic.AtomicLong
   5:     264852  14831712  com.codahale.metrics.ExponentiallyDecayingReservoir
   6:     264852  12712896  com.codahale.metrics.Meter
   7:     264852  12712896  java.util.concurrent.ConcurrentSkipListMap
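
The grep on "Path" above also matches unrelated classes such as
sun.misc.URLClassPath. Below is a minimal sketch of a narrower filter that
totals only the classes of interest. It assumes the histogram was saved to a
file as above and that each data row has four whitespace-separated columns
(rank, instances, bytes, class name), as in this JDK 8 output. The function
name histo_sum is hypothetical, not part of jmap or any other tool.

```shell
#!/bin/sh
# histo_sum PATTERN FILE
# Hypothetical helper: totals the #instances and #bytes columns of a saved
# `jmap -histo` file for every class whose name matches the regex PATTERN.
# Assumes rows of the form "<rank>: <instances> <bytes> <class name>".
histo_sum() {
    pattern=$1
    file=$2
    awk -v pat="$pattern" '
        # Only data rows start with "<number>:"; header and rule lines are skipped.
        $1 ~ /^[0-9]+:$/ && $4 ~ pat {
            instances += $2
            bytes += $3
        }
        # %.0f avoids integer-width limits that some awk builds have with %d.
        END { printf "%.0f instances, %.0f bytes\n", instances, bytes }
    ' "$file"
}

# Example (anchored so that URLClassPath and friends do not match).
# "[.]" is used instead of "\." because awk -v interprets backslash escapes.
# histo_sum '^(org[.]apache[.]hadoop[.]fs[.]Path|java[.]net[.]URI)$' /tmp/histo
```

Anchoring the pattern with ^ and $ keeps the totals limited to the exact
class names, which makes the result easier to compare across an impalad and
a catalogd.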
-- Philip