Posted to dev@impala.apache.org by 黄权隆 <hu...@126.com> on 2017/06/01 07:54:42 UTC

Failed to load test data about TPC-H

Hi friends,


I'm trying to run the Impala tests, following the wiki page 'How to load and run Impala tests'.
Although I only want to run some end-to-end tests, I understand that I need to load the test data first, so I ran:

    ./buildall.sh -noclean -testdata

It loaded the functional test data successfully but failed to load the TPC-H data set. Here are the relevant logs:


/home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-release/testdata/target
SUCCESS, data generated into /home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-release/testdata/target
Loading Hive Builtins (logging to load-hive-builtins.log)... OK
Generating HBase data (logging to create-hbase.log)... OK
Creating /test-warehouse HDFS directory (logging to create-test-warehouse-dir.log)... OK
Starting Impala cluster (logging to start-impala-cluster.log)... OK
Setting up HDFS environment (logging to setup-hdfs-env.log)... OK
Loading custom schemas (logging to load-custom-schemas.log)... OK
Loading functional-query data (logging to load-functional-query.log)... OK
Loading TPC-H data (logging to load-tpch.log)... FAILED
'load-data tpch core' failed. Tail of log:
Log for command 'load-data tpch core'
Loading workload 'tpch' Using exploration strategy 'core'. Logging to /home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-release/cluster_logs/data_loading/data-load-tpch-core.log
Error loading data. The end of the log file is:
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:23 Invalid path ''/home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-release/testdata/impala-data/tpch/lineitem'': No files matching path file:/home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-release/testdata/impala-data/tpch/lineitem
        at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.applyConstraints(LoadSemanticAnalyzer.java:139)
        at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:230)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:445)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1189)
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1176)
        at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:134)
        ... 26 more


Closing: 0: jdbc:hive2://localhost:11050/default;auth=none
Error executing file from Hive: load-tpch-core-hive-generated.sql
Error in /home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-release/testdata/bin/create-load-data.sh at line 41: while [ -n "$*" ]
Error in ./buildall.sh at line 368: ${IMPALA_HOME}/testdata/bin/create-load-data.sh ${CREATE_LOAD_DATA_ARGS} <<< Y


I'm using version cdh5.7.3-release. The directory ${IMPALA_HOME}/testdata/impala-data does not exist.
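
A quick way to confirm this before starting a long data load is a small pre-flight check like the one below. It is only a sketch: the function name check_tpch_data is made up for illustration, and the only path it knows about is the tpch/lineitem directory named in the error above. Other tables or data sets may need their own directories.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Impala tree): succeed only if the
# TPC-H lineitem directory that the failing Hive LOAD statement points at
# exists and contains at least one entry.
check_tpch_data() {
  local impala_home="$1"
  # Path taken from the "No files matching path" message in the log.
  local dir="${impala_home}/testdata/impala-data/tpch/lineitem"
  if [ -d "${dir}" ] && [ -n "$(ls -A "${dir}" 2>/dev/null)" ]; then
    echo "OK: ${dir}"
    return 0
  fi
  echo "MISSING: ${dir}"
  return 1
}

# Usage, from a shell where IMPALA_HOME is already set:
# check_tpch_data "${IMPALA_HOME}" || echo "generate the TPC-H data first"
```

On my checkout this would report MISSING, which matches the SemanticException in the log.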


Could you tell me how to generate this data set? Alternatively, where can I download a snapshot of test-warehouse so that I can skip this step?


Thanks
----
Quanlong




Re:Re: Re: Failed to load test data about TPC-H

Posted by 黄权隆 <hu...@126.com>.
OK. Thanks!


----

Quanlong


Re: Re: Failed to load test data about TPC-H

Posted by Tim Armstrong <ta...@cloudera.com>.
We don't test with mixed versions like that unfortunately.


Re:Re: Failed to load test data about TPC-H

Posted by 黄权隆 <hu...@126.com>.
Hi Tim,


Thanks for your reply! I'll try these scripts later. One more question:
is the latest Impala compatible with the components in CDH-5.7.3,
for example Hadoop-2.6.0 and Hive-1.1.0?


We are staying on the old cdh-5.7.3-release version only because of
concerns about incompatibility.


Thanks
----

Quanlong




Re: Failed to load test data about TPC-H

Posted by Tim Armstrong <ta...@cloudera.com>.
Hi Quanlong,
  It looks like you're missing the TPC-H data. In older versions of Impala
you had to generate the data manually and put it in that directory. We've
automated that in more recent versions (I think probably since a year ago).
If you can switch to a newer version, then this will just work. Data
loading is a lot more reliable now.

Otherwise this is the script that generates the data. You can probably copy
this script to your repository and run it by hand:

https://github.com/apache/incubator-impala/blob/master/testdata/datasets/tpch/preload

You will also need to do the same for TPC-DS:
https://github.com/apache/incubator-impala/blob/master/testdata/datasets/tpcds/preload
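
As a rough sketch of that manual route, assuming you already have a newer Impala checkout somewhere on disk, the copy step could look like the following. The function name copy_preload and both directory arguments are hypothetical; only the relative path testdata/datasets/<name>/preload comes from the links above. Read each script before running it, since it may rely on tools or environment variables that the older tree does not provide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the preload script for one data set from a newer
# Impala checkout into an older one and make it executable.
copy_preload() {
  local newer_home="$1" older_home="$2" dataset="$3"
  local src="${newer_home}/testdata/datasets/${dataset}/preload"
  local dst_dir="${older_home}/testdata/datasets/${dataset}"
  if [ ! -f "${src}" ]; then
    echo "preload script not found: ${src}" >&2
    return 1
  fi
  mkdir -p "${dst_dir}"
  cp "${src}" "${dst_dir}/preload"
  chmod +x "${dst_dir}/preload"
  # Print the destination so the caller can inspect or run it by hand.
  echo "${dst_dir}/preload"
}

# Usage (the first path is a placeholder for the newer checkout):
# for ds in tpch tpcds; do
#   copy_preload /path/to/newer/Impala "${IMPALA_HOME}" "${ds}"
# done
```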


Cheers,
Tim
