Posted to dev@impala.apache.org by Jim Apple <jb...@cloudera.com> on 2017/07/29 07:47:24 UTC

Unable to start catalog, but with no error message?

I'm seeing https://issues.apache.org/jira/browse/IMPALA-5700 when trying to
bootstrap a new development environment on an EC2 machine with Ubuntu
14.04, 250GB of free disk space and over 60GB of free memory. I've seen
this with and without the -so flag.

I'm running the below script, which I thought was the canonical way to
bootstrap a development environment. When catalogd doesn't start, I don't
see anything amiss in any of the logs. Could a port that should be open be
closed? I only have port 22 open in my EC2 security group configuration.

Has anyone else fixed a problem like this before?

#!/bin/bash -eux

IMPALA_REPO_URL=https://git-wip-us.apache.org/repos/asf/incubator-impala.git
IMPALA_REPO_BRANCH=master

sudo apt-get install --yes git

sudo apt-get install --yes openjdk-7-jdk

# JAVA_HOME needed by chef scripts
export JAVA_HOME="/usr/lib/jvm/$(ls -tr /usr/lib/jvm/ | tail -1)"
$JAVA_HOME/bin/javac -version

# TODO: check that df . is large enough.
df -h .

IMPALA_LOCATION=Impala

cd "/home/$(whoami)"

git clone "${IMPALA_REPO_URL}" "${IMPALA_LOCATION}"
cd "${IMPALA_LOCATION}"
git checkout "${IMPALA_REPO_BRANCH}"
GIT_LOG_FILE=$(mktemp)
git log --pretty=oneline >"${GIT_LOG_FILE}"
head "${GIT_LOG_FILE}"

./bin/bootstrap_development.sh
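On the port question: EC2 security groups only filter traffic entering the instance from outside, so an all-localhost minicluster should not need anything beyond port 22 open. A more plausible port problem is a local collision. A rough check, sketched here with the usual Impala default port numbers (which may differ in a given build):

```shell
# Sketch only: security groups don't affect localhost traffic, so check for
# a local port collision instead. Port numbers are common Impala defaults
# (statestore 24000, impalad web UI 25010, catalogd web UI 25020,
# catalog service 26000) and may differ in your build.
for port in 24000 25010 25020 26000; do
  if ss -ltn 2>/dev/null | grep -q ":${port}\b"; then
    echo "port ${port} already in use"
  else
    echo "port ${port} looks free"
  fi
done
```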

Re: Unable to start catalog, but with no error message?

Posted by Jim Apple <jb...@cloudera.com>.
I can no longer repro this. ¯\_(ツ)_/¯

On Sun, Jul 30, 2017 at 11:48 AM, Bharath Vissapragada
<bh...@cloudera.com> wrote:

Re: Unable to start catalog, but with no error message?

Posted by Bharath Vissapragada <bh...@cloudera.com>.
How about attaching 'strace' to the catalogd startup to see where it
crashes (if it's reproducible on demand)? Maybe others have better ideas.
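The strace approach might look roughly like this; /bin/true stands in for the real catalogd binary here, since the actual path and flags are whatever start-impala-cluster.py launches:

```shell
# Sketch of the strace suggestion. /bin/true is a stand-in for catalogd;
# substitute the real invocation that start-impala-cluster.py uses.
if command -v strace >/dev/null 2>&1; then
  # -f follows forked children, -tt timestamps each syscall.
  strace -f -tt -o /tmp/catalogd_trace.txt /bin/true
  # The last few syscalls before exit usually point at the failure,
  # e.g. a bind() or open() returning an error code.
  tail -n 5 /tmp/catalogd_trace.txt
else
  echo "strace not installed; try: sudo apt-get install --yes strace"
fi
```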

On Sat, Jul 29, 2017 at 3:14 PM, Jim Apple <jb...@cloudera.com> wrote:


Re: Unable to start catalog, but with no error message?

Posted by Jim Apple <jb...@cloudera.com>.
To be specific about "no error message": the logs written in the logs
directory near the time of the crash are nearly identical to those of a
process that got much further on a machine with a configuration that I do
not know how to reproduce. The one that ended earlier has output like:

Creating /test-warehouse HDFS directory (logging to
/home/ubuntu/Impala/logs/data_loading/create-test-warehouse-dir.log)...
    OK (Took: 0 min 2 sec)
Derived params for create-load-data.sh:
EXPLORATION_STRATEGY=exhaustive
SKIP_METADATA_LOAD=0
SKIP_SNAPSHOT_LOAD=0
SNAPSHOT_FILE=
CM_HOST=
REMOTE_LOAD=
Starting Impala cluster (logging to
/home/ubuntu/Impala/logs/data_loading/start-impala-cluster.log)...
    FAILED (Took: 0 min 11 sec)
    '/home/ubuntu/Impala/bin/start-impala-cluster.py
--log_dir=/home/ubuntu/Impala/logs/data_loading -s 3' failed. Tail of log:
Log for command '/home/ubuntu/Impala/bin/start-impala-cluster.py
--log_dir=/home/ubuntu/Impala/logs/data_loading -s 3'
Starting State Store logging to
/home/ubuntu/Impala/logs/data_loading/statestored.INFO
Starting Catalog Service logging to
/home/ubuntu/Impala/logs/data_loading/catalogd.INFO
Error starting cluster: Unable to start catalogd. Check log or file
permissions for more details.
Error in /home/ubuntu/Impala/testdata/bin/create-load-data.sh at line 48:
LOAD_DATA_ARGS=""
+ cleanup
+ rm -rf /tmp/tmp.HVkbPNl08R


The one that got further in the process (and I think may be dying due to a
spurious out-of-disk failure that I am putting on the back-burner for the
moment) has the following output:

Creating /test-warehouse HDFS directory (logging to
/home/ubuntu/Impala/logs/data_loading/create-test-warehouse-dir.log)...
    OK (Took: 0 min 2 sec)
Derived params for create-load-data.sh:
EXPLORATION_STRATEGY=exhaustive
SKIP_METADATA_LOAD=0
SKIP_SNAPSHOT_LOAD=0
SNAPSHOT_FILE=
CM_HOST=
REMOTE_LOAD=
Starting Impala cluster (logging to
/home/ubuntu/Impala/logs/data_loading/start-impala-cluster.log)...
    OK (Took: 0 min 11 sec)
Setting up HDFS environment (logging to
/home/ubuntu/Impala/logs/data_loading/setup-hdfs-env.log)...
    OK (Took: 0 min 8 sec)
Loading custom schemas (logging to
/home/ubuntu/Impala/logs/data_loading/load-custom-schemas.log)...
    OK (Took: 0 min 35 sec)
Loading functional-query data (logging to
/home/ubuntu/Impala/logs/data_loading/load-functional-query.log)...
    OK (Took: 37 min 14 sec)
Loading TPC-H data (logging to
/home/ubuntu/Impala/logs/data_loading/load-tpch.log)...
    OK (Took: 14 min 11 sec)
Loading nested data (logging to
/home/ubuntu/Impala/logs/data_loading/load-nested.log)...
    OK (Took: 3 min 41 sec)
Loading TPC-DS data (logging to
/home/ubuntu/Impala/logs/data_loading/load-tpcds.log)...
    FAILED (Took: 5 min 50 sec)
    'load-data tpcds core' failed. Tail of log:
ss_net_paid_inc_tax,
ss_net_profit,
ss_sold_date_sk
from store_sales_unpartitioned
WHERE ss_sold_date_sk < 2451272
distribute by ss_sold_date_sk
INFO  : Query ID =
ubuntu_20170729150909_583df9cf-e54b-44bf-a104-ef5e690cfa0d
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Number of reduce tasks not specified. Estimated from input data
size: 2
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:2
INFO  : Submitting tokens for job: job_local1041198115_0826
INFO  : The url to track the job: http://localhost:8080/
INFO  : Job running in-process (local Hadoop)
INFO  : 2017-07-29 15:09:25,495 Stage-1 map = 0%,  reduce = 0%
INFO  : 2017-07-29 15:09:32,498 Stage-1 map = 100%,  reduce = 0%
ERROR : Ended Job = job_local1041198115_0826 with errors
ERROR : FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
INFO  : MapReduce Jobs Launched:
INFO  : Stage-Stage-1:  HDFS Read: 17615502357 HDFS Write: 12907849658 FAIL
INFO  : Total MapReduce CPU Time Spent: 0 msec
INFO  : Completed executing
command(queryId=ubuntu_20170729150909_583df9cf-e54b-44bf-a104-ef5e690cfa0d);
Time taken: 18.314 seconds
Error: Error while processing statement: FAILED: Execution Error, return
code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
(state=08S01,code=2)
java.sql.SQLException: Error while processing statement: FAILED: Execution
Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
        at
org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:292)
        at
org.apache.hive.beeline.Commands.executeInternal(Commands.java:989)
        at org.apache.hive.beeline.Commands.execute(Commands.java:1203)
        at org.apache.hive.beeline.Commands.sql(Commands.java:1117)
        at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1176)
        at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1010)
        at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:987)
        at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:914)
        at
org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:518)
        at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Closing: 0: jdbc:hive2://localhost:11050/default;auth=none
Error executing file from Hive: load-tpcds-core-hive-generated.sql
Error in /home/ubuntu/Impala/testdata/bin/create-load-data.sh at line 48:
LOAD_DATA_ARGS=""
+ cleanup
+ rm -rf /tmp/tmp.Yfeh8QGfi1
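When the tail of the data-loading log is this generic, the underlying error usually lives in a per-daemon or Hive-side log rather than in the output above. A rough way to dig, with log paths mirroring the ones quoted in this thread (they may differ on another checkout):

```shell
# Sketch: where to look for the real error behind a generic failure message.
# Paths mirror the ones in the quoted output and may differ per checkout.
LOG_DIR="$HOME/Impala/logs/data_loading"
# catalogd uses glog, which writes separate ERROR/FATAL files next to
# catalogd.INFO when the daemon hits errors:
ls "$LOG_DIR"/catalogd.* 2>/dev/null || true
# glog error lines start with E or F followed by the date:
grep -h -m 5 -E '^[EF][0-9]{4}' "$LOG_DIR"/catalogd.* 2>/dev/null || true
# A MapRedTask "return code 2" hides the real stack trace in the Hive/Hadoop
# job logs, not in beeline's output; search by the job id quoted above:
grep -R -l 'job_local1041198115_0826' "$HOME/Impala/logs" 2>/dev/null || true
# A kernel OOM kill (a common cause of a daemon dying silently) would show:
dmesg 2>/dev/null | grep -i -m 3 'killed process' || true
```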



