Posted to dev@impala.apache.org by Jim Apple <jb...@cloudera.com> on 2017/07/29 07:47:24 UTC
Unable to start catalog, but with no error message?
I'm seeing https://issues.apache.org/jira/browse/IMPALA-5700 when trying to
bootstrap a new development environment on an EC2 machine with Ubuntu
14.04, 250GB of free disk space and over 60GB of free memory. I've seen
this with and without the -so flag.
I'm running the script below, which I thought was the canonical way to
bootstrap a development environment. When the catalog doesn't start, I don't
see anything amiss in any of the logs. I was thinking that maybe a port that
should be open is closed? I only have port 22 open in my EC2 configuration.
Has anyone else fixed a problem like this before?
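(On the port question: an EC2 security group gates only traffic entering the
instance, so the minicluster's daemon-to-daemon localhost connections should
not be affected by having only port 22 open. A quick local probe, assuming
the default statestored/catalogd ports 24000 and 26000, might look like
the sketch below; the port numbers are assumptions and may differ in a
given config.)

```shell
# Hedged local probe of the minicluster ports; 24000 (statestored) and
# 26000 (catalogd) are assumed defaults and may differ in your config.
# EC2 security groups do not apply to loopback traffic, so these do not
# need to be opened in the instance's security group.
for port in 24000 26000; do
    if command -v nc >/dev/null 2>&1 && nc -z -w 1 localhost "$port" 2>/dev/null; then
        echo "port $port: listening"
    else
        echo "port $port: not reachable"
    fi
done
```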
#!/bin/bash -eux
IMPALA_REPO_URL=https://git-wip-us.apache.org/repos/asf/incubator-impala.git
IMPALA_REPO_BRANCH=master
sudo apt-get install --yes git
sudo apt-get install --yes openjdk-7-jdk
# JAVA_HOME needed by chef scripts
export JAVA_HOME="/usr/lib/jvm/$(ls -tr /usr/lib/jvm/ | tail -1)"
$JAVA_HOME/bin/javac -version
# TODO: check that df . is large enough.
df -h .
IMPALA_LOCATION=Impala
cd "/home/$(whoami)"
git clone "${IMPALA_REPO_URL}" "${IMPALA_LOCATION}"
cd "${IMPALA_LOCATION}"
git checkout "${IMPALA_REPO_BRANCH}"
GIT_LOG_FILE=$(mktemp)
git log --pretty=oneline >"${GIT_LOG_FILE}"
head "${GIT_LOG_FILE}"
./bin/bootstrap_development.sh
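(The script's `# TODO: check that df . is large enough` step could be
sketched like this; the 120GB threshold is an assumption, not a documented
Impala requirement.)

```shell
# Hedged sketch of the disk-space check the TODO above asks for.
check_free_space() {
    min_gb=$1
    # Portable df output: available KB for the filesystem holding ".".
    avail_kb=$(df -Pk . | awk 'NR==2 {print $4}')
    avail_gb=$((avail_kb / 1024 / 1024))
    if [ "${avail_gb}" -lt "${min_gb}" ]; then
        echo "Only ${avail_gb}GB free in $(pwd); want at least ${min_gb}GB" >&2
        return 1
    fi
    echo "Disk check passed: ${avail_gb}GB free (>= ${min_gb}GB)"
}

# In the bootstrap script one might then run, e.g.:
#   check_free_space 120 || exit 1
check_free_space 0
```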
Re: Unable to start catalog, but with no error message?
Posted by Jim Apple <jb...@cloudera.com>.
I can no longer repro this. ¯\_(ツ)_/¯
On Sun, Jul 30, 2017 at 11:48 AM, Bharath Vissapragada
<bh...@cloudera.com> wrote:
> How about attaching 'strace' to the catalogd startup to see where it
> crashes (if it's reproducible on demand)? Maybe others have better ideas.
Re: Unable to start catalog, but with no error message?
Posted by Bharath Vissapragada <bh...@cloudera.com>.
How about attaching 'strace' to the catalogd startup to see where it
crashes (if it's reproducible on demand)? Maybe others have better ideas.
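(This suggestion can be sketched roughly as below. The example traces
/bin/true as a stand-in, since the actual catalogd binary location depends
on the build tree; substitute your own path.)

```shell
# Rough sketch of running a failing daemon under strace and inspecting
# the last syscalls before it exits. /bin/true stands in for catalogd;
# substitute your actual catalogd binary (path depends on the build tree).
command -v strace >/dev/null 2>&1 || { echo "strace not installed"; exit 0; }

strace -f -tt -o /tmp/startup.strace /bin/true 2>/dev/null \
    || { echo "strace could not attach (ptrace restricted?)"; exit 0; }

# The tail of the trace usually points at the failing call, e.g. a
# refused connect() or a permission-denied open():
tail -n 20 /tmp/startup.strace
```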
Re: Unable to start catalog, but with no error message?
Posted by Jim Apple <jb...@cloudera.com>.
To be specific about "no error message": the logs written in the logs
directory near the time of the crash are nearly identical to those of a
process that got much further on a machine with a configuration that I do
not know how to reproduce. The one that ended earlier has output like:
Creating /test-warehouse HDFS directory (logging to
/home/ubuntu/Impala/logs/data_loading/create-test-warehouse-dir.log)...
OK (Took: 0 min 2 sec)
Derived params for create-load-data.sh:
EXPLORATION_STRATEGY=exhaustive
SKIP_METADATA_LOAD=0
SKIP_SNAPSHOT_LOAD=0
SNAPSHOT_FILE=
CM_HOST=
REMOTE_LOAD=
Starting Impala cluster (logging to
/home/ubuntu/Impala/logs/data_loading/start-impala-cluster.log)...
FAILED (Took: 0 min 11 sec)
'/home/ubuntu/Impala/bin/start-impala-cluster.py
--log_dir=/home/ubuntu/Impala/logs/data_loading -s 3' failed. Tail of log:
Log for command '/home/ubuntu/Impala/bin/start-impala-cluster.py
--log_dir=/home/ubuntu/Impala/logs/data_loading -s 3'
Starting State Store logging to
/home/ubuntu/Impala/logs/data_loading/statestored.INFO
Starting Catalog Service logging to
/home/ubuntu/Impala/logs/data_loading/catalogd.INFO
Error starting cluster: Unable to start catalogd. Check log or file
permissions for more details.
Error in /home/ubuntu/Impala/testdata/bin/create-load-data.sh at line 48:
LOAD_DATA_ARGS=""
+ cleanup
+ rm -rf /tmp/tmp.HVkbPNl08R
The one that got further in the process (and I think may be dying due to a
spurious out-of-disk failure that I am putting on the back-burner for the
moment) has the following output:
Creating /test-warehouse HDFS directory (logging to
/home/ubuntu/Impala/logs/data_loading/create-test-warehouse-dir.log)...
OK (Took: 0 min 2 sec)
Derived params for create-load-data.sh:
EXPLORATION_STRATEGY=exhaustive
SKIP_METADATA_LOAD=0
SKIP_SNAPSHOT_LOAD=0
SNAPSHOT_FILE=
CM_HOST=
REMOTE_LOAD=
Starting Impala cluster (logging to
/home/ubuntu/Impala/logs/data_loading/start-impala-cluster.log)...
OK (Took: 0 min 11 sec)
Setting up HDFS environment (logging to
/home/ubuntu/Impala/logs/data_loading/setup-hdfs-env.log)...
OK (Took: 0 min 8 sec)
Loading custom schemas (logging to
/home/ubuntu/Impala/logs/data_loading/load-custom-schemas.log)...
OK (Took: 0 min 35 sec)
Loading functional-query data (logging to
/home/ubuntu/Impala/logs/data_loading/load-functional-query.log)...
OK (Took: 37 min 14 sec)
Loading TPC-H data (logging to
/home/ubuntu/Impala/logs/data_loading/load-tpch.log)...
OK (Took: 14 min 11 sec)
Loading nested data (logging to
/home/ubuntu/Impala/logs/data_loading/load-nested.log)...
OK (Took: 3 min 41 sec)
Loading TPC-DS data (logging to
/home/ubuntu/Impala/logs/data_loading/load-tpcds.log)...
FAILED (Took: 5 min 50 sec)
'load-data tpcds core' failed. Tail of log:
ss_net_paid_inc_tax,
ss_net_profit,
ss_sold_date_sk
from store_sales_unpartitioned
WHERE ss_sold_date_sk < 2451272
distribute by ss_sold_date_sk
INFO : Query ID = ubuntu_20170729150909_583df9cf-e54b-44bf-a104-ef5e690cfa0d
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Number of reduce tasks not specified. Estimated from input data size: 2
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : number of splits:2
INFO : Submitting tokens for job: job_local1041198115_0826
INFO : The url to track the job: http://localhost:8080/
INFO : Job running in-process (local Hadoop)
INFO : 2017-07-29 15:09:25,495 Stage-1 map = 0%, reduce = 0%
INFO : 2017-07-29 15:09:32,498 Stage-1 map = 100%, reduce = 0%
ERROR : Ended Job = job_local1041198115_0826 with errors
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: HDFS Read: 17615502357 HDFS Write: 12907849658 FAIL
INFO : Total MapReduce CPU Time Spent: 0 msec
INFO : Completed executing command(queryId=ubuntu_20170729150909_583df9cf-e54b-44bf-a104-ef5e690cfa0d); Time taken: 18.314 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
        at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:292)
        at org.apache.hive.beeline.Commands.executeInternal(Commands.java:989)
        at org.apache.hive.beeline.Commands.execute(Commands.java:1203)
        at org.apache.hive.beeline.Commands.sql(Commands.java:1117)
        at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1176)
        at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1010)
        at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:987)
        at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:914)
        at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:518)
        at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Closing: 0: jdbc:hive2://localhost:11050/default;auth=none
Error executing file from Hive: load-tpcds-core-hive-generated.sql
Error in /home/ubuntu/Impala/testdata/bin/create-load-data.sh at line 48:
LOAD_DATA_ARGS=""
+ cleanup
+ rm -rf /tmp/tmp.Yfeh8QGfi1