Posted to user@impala.apache.org by Hashan Gayasri <ha...@gmail.com> on 2020/06/04 10:19:49 UTC

Issues in running locally compiled Impala

Hi all,

I've been trying to get the Impala (v3.3.0) that I compiled locally up and
running. On startup, Catalogd (the Impalad binary) seems to crash in native code.
In a dynamically linked debug build, the stack trace was as follows.

[1]
...
#4  0x00007ffff1fd0a05 in JVM_handle_linux_signal () from
/home/hashan/jdk1.8.0_191/jre/lib/amd64/server/libjvm.so
#5  0x00007ffff1fc3cd8 in signalHandler(int, siginfo*, void*) () from
/home/hashan/jdk1.8.0_191/jre/lib/amd64/server/libjvm.so
#6  <signal handler called>
#7  initCachedClass (cachedJclass=<optimized out>,
className=<optimized out>, env=0x0)
    at /container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:54
#8  initCachedClasses (env=0x0) at
/container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:117
#9  0x00007ffff26a3f62 in getJNIEnv () at
/container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:555
#10 0x00007ffff26aa3b1 in hdfsBuilderConnect (bld=0x389b2c0)
    at /container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c:697
#11 0x00007ffff77540e3 in impala::JniUtil::InitLibhdfs () at
/home/hashan/BUILD/impala-3.3.0/impala-3.3.0/be/src/util/jni-util.cc:215
#12 0x00007ffff7753660 in impala::JniUtil::Init () at
/home/hashan/BUILD/impala-3.3.0/impala-3.3.0/be/src/util/jni-util.cc:132
#13 0x00007ffff7e84146 in impala::InitCommonRuntime (argc=1,
argv=0x7fffffff65c8, init_jvm=true,
test_mode=impala::TestInfo::NON_TEST) at
/home/hashan/BUILD/impala-3.3.0/impala-3.3.0/be/src/common/init.cc:364
#14 0x00007ffff3c31bdc in CatalogdMain (argc=1, argv=0x7fffffff65c8)
at /home/hashan/BUILD/impala-3.3.0/impala-3.3.0/be/src/catalog/catalogd-main.cc:62
#15 0x00000000008c60ef in main (argc=1, argv=0x7fffffff65c8) at
/home/hashan/BUILD/impala-3.3.0/impala-3.3.0/be/src/service/daemon-main.cc:41
...

The loaded native HDFS library was:
impala-3.3.0/toolchain/cdh_components-1173663/hadoop-3.0.0-cdh6.x-SNAPSHOT/lib/native/libhdfs.so.0.0.0
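
One way to confirm which libhdfs a running daemon has actually mapped is
to check its /proc entry; a minimal sketch (the process name passed to
pgrep is an assumption):

    # Minimal sketch (process name assumed): show which libhdfs.so the
    # running catalogd has mapped into memory.
    pid=$(pgrep -f catalogd | head -n1)
    grep libhdfs "/proc/${pid}/maps"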


After getting the same result with the native HDFS library
(libhdfs.so.0.0.0) shipped in the RPM package
"hadoop-libhdfs-3.0.0+cdh6.3.0-1279813.el7.x86_64",
I tried the libhdfs.so.0.0.0 built from the Hadoop v3.1.3
GitHub sources. That got past the previous stage.

[2]
This time the /tmp/catalogd.ERROR file contained:

E0604 09:50:31.550122 69397 catalog.cc:91] NoClassDefFoundError:
org/apache/hadoop/hive/metastore/api/Database
CAUSED BY: ClassNotFoundException: org.apache.hadoop.hive.metastore.api.Database
. Impalad exiting.
loadFileSystems error:
ClassNotFoundException:
org.apache.hadoop.fs.FileSystemjava.lang.NoClassDefFoundError:
org/apache/hadoop/fs/FileSystem
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FileSystem
...
hdfsBuilderConnect(forceNewInstance=0, nn=default, port=0,
kerbTicketCachePath=(NULL), userName=(NULL)) error:
ClassNotFoundException:
org.apache.hadoop.conf.Configurationjava.lang.NoClassDefFoundError:
org/apache/hadoop/conf/Configuration
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.conf.Configuration
...

[3]
After adding the following jar files to the CLASSPATH,
* hive-3.1.2/lib/hive-metastore-3.1.2.jar
* hive-3.1.2/lib/hive-standalone-metastore-3.1.2.jar
* hadoop-3.1.3/share/hadoop/client/hadoop-client-runtime-3.1.3.jar

the /tmp/catalogd.ERROR file contained:
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FileSystem
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.conf.Configuration
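
A quick way to check whether one of these classes is actually present in
a jar is to list the jar's contents; a minimal sketch (the jar path is an
assumption, though org.apache.hadoop.conf.Configuration normally ships in
hadoop-common):

    # Minimal sketch (jar path assumed): look for the class entry inside
    # the jar that is supposed to provide it.
    unzip -l hadoop-3.1.3/share/hadoop/common/hadoop-common-3.1.3.jar \
        | grep 'org/apache/hadoop/conf/Configuration.class'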

In both of the latter cases (using the newly compiled libhdfs.so.0.0.0
library), although the contents of /tmp/catalogd.ERROR differ, the crash
occurs at catalog-server.cc:252:
(gdb) bt
#0  impala::CatalogServer::Start (this=0x7fffffff6030) at
/home/hashan/BUILD/impala-3.3.0/impala-3.3.0/be/src/catalog/catalog-server.cc:252
#1  0x00007ffff3c3235e in CatalogdMain (argc=1, argv=0x7fffffff65c8)
at /home/hashan/BUILD/impala-3.3.0/impala-3.3.0/be/src/catalog/catalogd-main.cc:87
#2  0x00000000008c60ef in main (argc=1, argv=0x7fffffff65c8) at
/home/hashan/BUILD/impala-3.3.0/impala-3.3.0/be/src/service/daemon-main.cc:41
(gdb) l
252       catalog_.reset(new Catalog());
(gdb) p catalog_
$2 = {px = 0x0}


1) Does anyone have an idea why the first issue arises when using
the native HDFS library built as part of the toolchain?

2) Does anyone know whether the issue in the 2nd and 3rd runs (using the
locally built libhdfs) is actually caused by missing JAR files, and if so,
which JAR files are missing from the classpath?

I'm sorry for the length of this mail. Any help in resolving these
issues would be greatly appreciated.

Thanks in advance.

Regards,
Hashan Gayasri

Re: Issues in running locally compiled Impala

Posted by Tim Armstrong <ta...@cloudera.com>.
On Tue, Jun 9, 2020 at 3:59 AM Hashan Gayasri <ha...@gmail.com>
wrote:

> Thanks for the quick response Tim! Setting the classpath according to
> the file set-classpath.sh resolved both crashes.
>
> Regarding the set-classpath.sh file -
> In the Impala v3.3.0 release, in addition to entries from the Maven user
> home, the script adds the following classpath entries:
> * impala-3.3.0/fe/src/test/resources
> * impala-3.3.0/fe/target/classes
> * impala-3.3.0/fe/target/dependency
> * impala-3.3.0/fe/target/test-classes
> * $HIVE_HOME/lib/datanucleus-api-jdo-3.2.1.jar
> * $HIVE_HOME/lib/datanucleus-core-3.2.2.jar
> * $HIVE_HOME/lib/datanucleus-rdbms-3.2.1.jar
>

> But the "datanucleus" versions don't match those of the actual
> jar files in that path. Do the correct datanucleus-*.jar files
> need to be added to the classpath? From what I noticed, only
> "fe/target/dependency" and "fe/target/classes" were actually needed
> out of the above. Is it okay to keep just those on the classpath?

Yeah, there is some cruft in the classpath - you can safely remove the
datanucleus stuff and the various references to fe/src and fe/target -
those are added for the purposes of various tests.
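
For illustration, a trimmed classpath along those lines might look like
the sketch below ($IMPALA_HOME and the jar wildcard are illustrative; the
real set-classpath.sh generates the full list):

    # Minimal sketch (paths illustrative): keep only the entries reported
    # as actually needed. The JVM expands "dir/*" to every jar in dir.
    export CLASSPATH="$IMPALA_HOME/fe/target/classes:$IMPALA_HOME/fe/target/dependency/*"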


>
> Also, I couldn't figure out which source/release on GitHub would
> correspond to "hadoop-3.0.0-cdh6.x-SNAPSHOT", as the
> "rel/release-3.0.0" tag didn't contain the source file you linked
> (even though the default branch does).
>
This would be a CDH release derived from Hadoop 3 - essentially the Hadoop
3 release with a bunch of patches from later versions on top of it. This is
kinda messy since the sources aren't published to a public place (they used
to be, but things have changed in various ways). There are some source
artifacts included in the tarballs we build against. This is something I'd
like to clean up - our current story here around dependencies isn't great
and is just the way it is for historical reasons. We did do this decoupling
for Kudu recently; it would be nice to do it for more components.

>
> Is there a publicly available page that lists the version
> requirements of the dependencies? Specifically the Apache Kudu,
> Hadoop, Hive, and HBase version requirements, since I'm planning to use
> locally compiled versions of those components for the Impala build.
> I noticed that the "impala-config.sh" file contains the exact versions
> of the dependent components, but is there a version compatibility
> matrix or something similar?
>
Nothing formal - I think the general feeling in the community is that we
don't want to claim to support things unless we're thoroughly testing them
each release. It is genuinely a lot of work to pull together a set of
component versions that work well together, don't have security
vulnerabilities, etc. We were able to build against some fairly divergent
versions of dependencies (Hive 2 vs Hive 3, etc.), but that required a bunch
of shims and can be a bit brittle. I'd expect it's possible to build
against a wide range of source versions of the dependencies, but it might
require tweaks to work around minor issues (different versions of
dependencies, minor changes to APIs, etc). The hardcoded CDH/CDP versions
are definitely the well-beaten path.

As far as wire compatibility goes: as a general rule, the client/server
protocols of the various dependent services are forward compatible, i.e.
older clients can talk to newer servers. In practice they're also often
backward compatible; e.g. the HDFS client protocol is very stable.

> In the same file, I noted that there are sometimes even major version
> differences between the CDH version and the CDP version. Which version
> should I use if I am to use GitHub releases of the above-mentioned
> dependent components?
>
We've been moving towards using the newer CDP dependencies - CDH was the
default for the Impala 3.x release though. Probably the biggest difference
is the Hive version, because we integrate most closely with that - the CDH
set of dependencies is built around Hive 2, and the CDP set around Hive 3.


> In order to use the locally built versions of Apache Kudu, Hadoop,
> Hive, and HBase, would it be sufficient to set the following variables
> or are there more steps involved?
>
> * DOWNLOAD_CDH_COMPONENTS=false
> * KUDU_BUILD_DIR and KUDU_CLIENT_DIR
> * HIVE_SRC_DIR_OVERRIDE
> * HADOOP_INCLUDE_DIR_OVERRIDE and HADOOP_LIB_DIR_OVERRIDE
>
That looks right to me.
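
As a rough sketch, that might look like the following (all paths are
illustrative; the variables are exported before sourcing the config so
the overrides take effect):

    # Minimal sketch (paths illustrative): point the build at locally
    # built components instead of the downloaded CDH tarballs.
    export DOWNLOAD_CDH_COMPONENTS=false
    export KUDU_BUILD_DIR=/opt/src/kudu/build/release
    export KUDU_CLIENT_DIR=/opt/kudu-client
    export HIVE_SRC_DIR_OVERRIDE=/opt/src/hive
    export HADOOP_INCLUDE_DIR_OVERRIDE=/opt/hadoop-3.1.3/include
    export HADOOP_LIB_DIR_OVERRIDE=/opt/hadoop-3.1.3/lib
    . bin/impala-config.sh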


> Thank you.
>
> Regards,
> Hashan

Re: Issues in running locally compiled Impala

Posted by Hashan Gayasri <ha...@gmail.com>.
Thanks for the quick response Tim! Setting the classpath according to
the file set-classpath.sh resolved both crashes.

Regarding the set-classpath.sh file -
In the Impala v3.3.0 release, in addition to entries from the Maven user
home, the script adds the following classpath entries:
* impala-3.3.0/fe/src/test/resources
* impala-3.3.0/fe/target/classes
* impala-3.3.0/fe/target/dependency
* impala-3.3.0/fe/target/test-classes
* $HIVE_HOME/lib/datanucleus-api-jdo-3.2.1.jar
* $HIVE_HOME/lib/datanucleus-core-3.2.2.jar
* $HIVE_HOME/lib/datanucleus-rdbms-3.2.1.jar

But the "datanucleus" versions don't match those of the actual
jar files in that path. Do the correct datanucleus-*.jar files
need to be added to the classpath? From what I noticed, only
"fe/target/dependency" and "fe/target/classes" were actually needed
out of the above. Is it okay to keep just those on the classpath?


Also, I couldn't figure out which source/release on GitHub would
correspond to "hadoop-3.0.0-cdh6.x-SNAPSHOT", as the
"rel/release-3.0.0" tag didn't contain the source file you linked
(even though the default branch does).

Is there a publicly available page that lists the version
requirements of the dependencies? Specifically the Apache Kudu,
Hadoop, Hive, and HBase version requirements, since I'm planning to use
locally compiled versions of those components for the Impala build.
I noticed that the "impala-config.sh" file contains the exact versions
of the dependent components, but is there a version compatibility
matrix or something similar?
In the same file, I noted that there are sometimes even major version
differences between the CDH version and the CDP version. Which version
should I use if I am to use GitHub releases of the above-mentioned
dependent components?

In order to use the locally built versions of Apache Kudu, Hadoop,
Hive, and HBase, would it be sufficient to set the following variables
or are there more steps involved?

* DOWNLOAD_CDH_COMPONENTS=false
* KUDU_BUILD_DIR and KUDU_CLIENT_DIR
* HIVE_SRC_DIR_OVERRIDE
* HADOOP_INCLUDE_DIR_OVERRIDE and HADOOP_LIB_DIR_OVERRIDE



Thank you.

Regards,
Hashan





--
-Hashan Gayasri

Re: Issues in running locally compiled Impala

Posted by Tim Armstrong <ta...@cloudera.com>.
The first crash is a symptom of some classes being missing from the
classpath. If you look at the code where it crashed, it's loading a bunch
of HDFS classes -
https://github.com/apache/hadoop/blob/18c57cf0464f4d1fa95899d75b2f59cae33c7c33/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c#L69

You need a lot of things on the classpath, so you really need something
automated to set it up correctly. In the dev environment we generate a file
that contains the classpath
https://github.com/apache/impala/blob/master/bin/set-classpath.sh#L45
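
In practice that looks something like the following before launching a
daemon (a minimal sketch; the catalogd binary path is an assumption):

    # Minimal sketch (binary path assumed): pick up the generated
    # classpath, then start the catalog daemon.
    cd "$IMPALA_HOME"
    . bin/set-classpath.sh
    be/build/latest/service/catalogd &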
