Posted to dev@impala.apache.org by Barnabás Maidics <ba...@cloudera.com.INVALID> on 2018/08/01 14:01:02 UTC

Re: hadoop.fs.Path impacts on Impala

On Wed, Aug 1, 2018 at 11:17 AM Barnabás Maidics <
barnabas.maidics@cloudera.com> wrote:

> Hi Everyone!
>
> I'm an intern at Cloudera, analysing where the memory goes in Hive. I
> was looking at a heap dump with many partitions and found some wasted
> memory that comes from HDFS.
>
> We store paths in hadoop.fs.Path objects. Each Path wraps a java.net.URI,
> which stores three nearly identical strings in separate objects (see the
> image and further explanation at the link below). I think this wastes
> memory, and the waste could be reduced by replacing the URI objects. That's
> why I've created an issue on the HDFS side (HDFS-13752
> <https://issues.apache.org/jira/browse/HDFS-13752>).
>
> I'm curious whether you store these objects (hadoop.fs.Path), and if you do,
> how much they affect Impala's overall memory usage. It may be beneficial
> for you as well if they can be replaced.
>
> Thanks,
>
> Barnabas Maidics
>
>
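
To make the overlap concrete, here is a minimal Java sketch that prints the three string forms java.net.URI exposes for one location and counts their characters. The location is hypothetical, and the sketch only counts characters: whether each form is backed by its own copy depends on the JDK (JDK 8's URI keeps the input string, the scheme-specific part and the path in separate fields), and object headers and array overhead are ignored.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

class UriOverlapDemo {
    /** Characters across the three overlapping string forms a URI exposes. */
    static int overlappingChars(URI uri) {
        return uri.toString().length()                  // whole location
            + uri.getRawSchemeSpecificPart().length()   // everything after "hdfs:"
            + uri.getRawPath().length();                // just the path component
    }

    /** Bytes needed if the location were kept once, as UTF-8. */
    static int flatBytes(URI uri) {
        return uri.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Hypothetical partition location, for illustration only.
        URI uri = URI.create("hdfs://nn:8020/warehouse/t/p=1");
        System.out.println("string form          : " + uri);
        System.out.println("scheme-specific part : " + uri.getRawSchemeSpecificPart());
        System.out.println("path                 : " + uri.getRawPath());
        System.out.println(overlappingChars(uri) + " chars across the three forms vs. "
            + flatBytes(uri) + " bytes stored once");
    }
}
```

For this location the three forms add up to 71 characters, against 30 bytes if the location were kept once as UTF-8.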

Re: hadoop.fs.Path impacts on Impala

Posted by Philip Zeyliger <ph...@cloudera.com.INVALID>.
Hi Barnabas,

If I may suggest a way to approach this sort of question: I'd take a
heap dump of an impalad and a catalogd (using "jmap") and then use Eclipse
MAT or http://www.jxray.com/ to see whether we're holding Path objects.
You'll want to load some tables and partitions ahead of time. Based on a
little quick sleuthing (I'm not well-versed in this area of the code),
fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java seems to use
flat buffers to store these paths, so it's likely we don't keep HDFS Path
objects around in steady state.
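
The suggestion above uses jmap from outside the process. As an alternative for code running inside a HotSpot JVM, the com.sun.management.HotSpotDiagnosticMXBean can write the same kind of .hprof file that Eclipse MAT or JXRay open. This is a sketch under that assumption, not what was run on the cluster; the class name and file naming are arbitrary.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;

class HeapDumpSketch {
    /** Writes an .hprof heap dump of the current JVM into dir and returns the file. */
    static File dumpLiveHeap(File dir) throws IOException {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // dumpHeap refuses to overwrite an existing file, and some newer JDKs
        // also insist on the ".hprof" suffix, so use a fresh, suffixed name.
        File out = new File(dir, "heap-" + System.nanoTime() + ".hprof");
        bean.dumpHeap(out.getAbsolutePath(), true); // true = only reachable objects
        return out;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("heapdump").toFile();
        File dump = dumpLiveHeap(dir);
        System.out.println("wrote " + dump + " (" + dump.length() + " bytes)");
    }
}
```

Passing true restricts the dump to live objects, much like jmap's `-dump:live,...` option; for an impalad or catalogd, running jmap against the process ID as described above remains the simpler route.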

I took a quick look at a cluster we have lying around and found negligible
use (4 org.apache.hadoop.fs.Path instances, 64 bytes in total), but I'm not
totally confident about what was loaded on that cluster at the time.

[root@... philip]# sudo -u impala /usr/java/jdk1.8.0_111/bin/jmap -histo 47116 > /tmp/histo
[root@.... philip]# cat /tmp/histo | grep Path
  85:           247          13832  sun.misc.URLClassPath$JarLoader
 567:             3            144  sun.misc.URLClassPath
 568:             6            144  sun.misc.URLClassPath$FileLoader
 724:             4             96  sun.security.provider.certpath.X509CertPath
 762:             2             80  sun.misc.URLClassPath$1
* 893:             4             64  org.apache.hadoop.fs.Path*
 927:             2             64  sun.nio.fs.UnixPath
1030:             2             48  java.io.File$PathStatus
1211:             1             40  org.apache.hadoop.hdfs.protocol.proto.EncryptionZonesProtos$GetEZForPathRequestProto
1226:             1             40  sun.misc.URLClassPath$2
1411:             1             24  [Ljava.io.File$PathStatus;
1470:             1             24  com.sun.org.apache.bcel.internal.util.ClassPath
1645:             1             16  [Lcom.sun.org.apache.bcel.internal.util.ClassPath$PathEntry;
2035:             1             16  org.apache.hadoop.hdfs.protocol.proto.EncryptionZonesProtos$GetEZForPathRequestProto$1
[root@... philip]# head /tmp/histo

 num     #instances         #bytes  class name
----------------------------------------------
   1:        416952       81879632  [B
   2:       1324260       42376320  com.codahale.metrics.LongAdder
   3:        794556       38138688  com.codahale.metrics.EWMA
   4:       1060844       25460256  java.util.concurrent.atomic.AtomicLong
   5:        264852       14831712  com.codahale.metrics.ExponentiallyDecayingReservoir
   6:        264852       12712896  com.codahale.metrics.Meter
   7:        264852       12712896  java.util.concurrent.ConcurrentSkipListMap
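
The grep above is read by eye; a small Java sketch can do the same filtering on `jmap -histo` rows and total the instance and byte columns. The regular expression assumes the column layout shown above (rank, #instances, #bytes, class name), optionally followed by a module suffix in parentheses, which some newer JDKs print; HistoGrep is a hypothetical helper, not part of any tool.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HistoGrep {
    // "rank:  #instances  #bytes  class name", optionally followed by " (module@version)".
    private static final Pattern ROW = Pattern.compile(
        "^\\s*(\\d+):\\s+(\\d+)\\s+(\\d+)\\s+(\\S+)(?:\\s+\\(.*\\))?\\s*$");

    /** Sums {#instances, #bytes} over rows whose class name contains needle, like grep. */
    static long[] totals(List<String> rows, String needle) {
        long instances = 0;
        long bytes = 0;
        for (String row : rows) {
            Matcher m = ROW.matcher(row);
            if (m.matches() && m.group(4).contains(needle)) {
                instances += Long.parseLong(m.group(2));
                bytes += Long.parseLong(m.group(3));
            }
        }
        return new long[] {instances, bytes};
    }

    public static void main(String[] args) {
        // A few rows copied from the histogram output.
        List<String> rows = Arrays.asList(
            "  85:           247          13832  sun.misc.URLClassPath$JarLoader",
            " 893:             4             64  org.apache.hadoop.fs.Path",
            " 927:             2             64  sun.nio.fs.UnixPath",
            "   1:        416952       81879632  [B");
        long[] path = totals(rows, "org.apache.hadoop.fs.Path");
        System.out.println("org.apache.hadoop.fs.Path: " + path[0] + " instances, "
            + path[1] + " bytes");
    }
}
```

Header lines, the dashed separator and the trailing "Total" line do not match the pattern, so the whole histogram file can be fed in unchanged.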



-- Philip

