Posted to user@hive.apache.org by Tim Harsch <th...@yarcdata.com> on 2014/07/17 01:09:07 UTC

Turning on Tez for Hive

Hi all,
Is there a wiki page somewhere that shows how to turn on Tez for Hive?

I found "hive.execution.engine" in hive-default.xml.template.  But I'm sure there must be more.  Do I have to install Tez separately?

Thanks,
Tim

Re: Turning on Tez for Hive

Posted by Lefty Leverenz <le...@gmail.com>.
Also, Tez configuration parameters are listed here:
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Tez

-- Lefty



Re: Turning on Tez for Hive

Posted by Lefty Leverenz <le...@gmail.com>.
The "Hive on Tez" design doc has a couple of links in the Installation and
Configuration
<https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez#HiveonTez-InstallationandConfiguration>
section.

-- Lefty



Re: Turning on Tez for Hive

Posted by Tim Harsch <th...@yarcdata.com>.
Hi Lefty,
I came across those documents as well.  They gave me some good hints, but
were in some places too specific to Hortonworks.  For the record, I did
get Hive 0.13.1 and Tez 0.4.1 working together (and I immediately saw a
200% speedup on my corpus of queries).  For future travelers, here are my
(somewhat raw) notes:

Get the source distro at:
http://www.apache.org/dyn/closer.cgi/incubator/tez/tez-0.4.1-incubating/

Unpack it and run 'mvn package -DskipTests -Dtar'.

Unpack the tar from tez-dist/target/tez-0.4.1-incubating.tar.gz to a
location which will become TEZ_INSTALL_DIR.

Tez needs an HDFS user directory that matches your Unix user name, or you will get
java.io.FileNotFoundException: File does not exist: hdfs:/user/tharsch:
% hdfs dfs -mkdir /user/tharsch; hdfs dfs -chmod g+w /user/tharsch
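
For anyone adapting the step above to a different account, here is a minimal sketch that prints the two commands for a given user instead of hard-coding "tharsch". The function name hdfs_user_dir_cmds is made up for illustration, and the -p flag on mkdir is my addition so the command also succeeds when the directory already exists.

```shell
# Hypothetical helper (not part of Hadoop): prints the HDFS commands that
# create a group-writable home directory for a user.
# $1 is the user name; it defaults to the current Unix user.
hdfs_user_dir_cmds() {
    user=${1:-$(id -un)}
    printf 'hdfs dfs -mkdir -p /user/%s\n' "$user"
    printf 'hdfs dfs -chmod g+w /user/%s\n' "$user"
}

# Review the output first, then pipe it to a shell to actually run it:
# hdfs_user_dir_cmds | sh
```

Printing rather than executing keeps the sketch safe to try on a machine where the hdfs command is not on the PATH.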

# NOTE: set HIVE_AUX_JARS_PATH using an 'if -z' test (as below), or you will get
java.lang.IllegalArgumentException: Can not create a Path from an empty
string
# NOTE: TEZ_JARS cannot include "*" or you will get error
"java.io.FileNotFoundException: File
file:/home/users/tharsch/apps/tez/tez-0.4.1-incubating/* does not exist"

export TEZ_INSTALL_DIR=/home/users/tharsch/apps/tez/tez-0.4.1-incubating
export TEZ_CONF_DIR=$TEZ_INSTALL_DIR/conf
export TEZ_JARS=$(echo "$TEZ_INSTALL_DIR"/*.jar | tr ' ' ':'):$(echo "$TEZ_INSTALL_DIR"/lib/*.jar | tr ' ' ':')
if [ -z "$HIVE_AUX_JARS_PATH" ]; then
     export HIVE_AUX_JARS_PATH="$TEZ_JARS"
else
     export HIVE_AUX_JARS_PATH="$HIVE_AUX_JARS_PATH:$TEZ_JARS"
fi
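
The echo-and-tr approach above breaks if a path contains a space, and it leaves a literal "*.jar" in the list when a directory has no jars. A possible alternative, offered only as a sketch: the join_jars function below is a hypothetical helper, not something shipped with Tez or Hive.

```shell
# Hypothetical helper: prints a colon-separated list of every *.jar found
# directly inside the directories passed as arguments.
join_jars() {
    out=""
    for dir in "$@"; do
        for jar in "$dir"/*.jar; do
            # Skip the unexpanded pattern left behind when nothing matches.
            [ -e "$jar" ] || continue
            out="${out:+$out:}$jar"
        done
    done
    printf '%s\n' "$out"
}

# Usage with the variables from the exports above:
# export TEZ_JARS=$(join_jars "$TEZ_INSTALL_DIR" "$TEZ_INSTALL_DIR/lib")
```

Because the list is built jar by jar, it never contains a "*", which is the condition the NOTE above warns about.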

NOTE:  Be sure to copy TEZ jars to HDFS and set HADOOP_CLASSPATH or you
will get:
org.apache.tez.dag.api.TezUncheckedException: Invalid configuration of tez
jars, tez.lib.uris is not defined in the configurartion

export HADOOP_CLASSPATH="${TEZ_CONF_DIR}:${TEZ_INSTALL_DIR}/*:${TEZ_INSTALL_DIR}/lib/*"


% hdfs dfs -mkdir -p /apps/tez-0.4.1; hdfs dfs -chmod g+w /apps/tez-0.4.1
% hdfs dfs -copyFromLocal $TEZ_INSTALL_DIR/* /apps/tez-0.4.1
% hdfs dfs -copyFromLocal $TEZ_INSTALL_DIR/lib/* /apps/tez-0.4.1

NOTE: Set mapred-site.xml
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn-tez</value>
        </property>


NOTE: create tez-site.xml in $TEZ_CONF_DIR
<configuration>
     <property>
          <name>tez.lib.uris</name>
          <value>${fs.default.name}/apps/tez-0.4.1/</value>
     </property>
</configuration>
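
If you would rather create that file from a script, here is a minimal sketch. The function name write_tez_site is hypothetical; the XML is exactly the tez-site.xml shown above, including the /apps/tez-0.4.1/ path from these notes.

```shell
# Hypothetical helper: writes tez-site.xml into the directory given as $1.
write_tez_site() {
    conf_dir=$1
    mkdir -p "$conf_dir" || return 1
    # The quoted 'EOF' keeps the shell from trying to expand
    # ${fs.default.name}; that placeholder is meant for Hadoop, not the shell.
    cat > "$conf_dir/tez-site.xml" <<'EOF'
<configuration>
     <property>
          <name>tez.lib.uris</name>
          <value>${fs.default.name}/apps/tez-0.4.1/</value>
     </property>
</configuration>
EOF
}

# Usage, with TEZ_CONF_DIR exported as in the notes above:
# write_tez_site "$TEZ_CONF_DIR"
```

With an unquoted heredoc delimiter, bash would reject ${fs.default.name} as a bad substitution, which is why the delimiter is quoted.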


NOTE: HIVE Settings:
set hive.execution.engine=tez;
set hive.use.tez.natively=true;
set hive.enable.mrr=true;
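
To avoid retyping those settings in every session, one option is to keep them in an initialization file and pass it to the Hive CLI with -i. This is only a sketch: the function name write_tez_hive_init and the file name in the usage lines are made up for illustration, and the three settings are copied verbatim from the notes above.

```shell
# Hypothetical helper: writes the Tez session settings to the file named in $1.
write_tez_hive_init() {
    cat > "$1" <<'EOF'
set hive.execution.engine=tez;
set hive.use.tez.natively=true;
set hive.enable.mrr=true;
EOF
}

# Usage (assumes the hive command is on the PATH):
# write_tez_hive_init "$HOME/tez-init.hql"
# hive -i "$HOME/tez-init.hql"
```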



Re: Turning on Tez for Hive

Posted by Lefty Leverenz <le...@gmail.com>.
Actually those links don't quite match your component versions.  (Close,
but they're for Hive 0.13.0 instead of 0.13.1.)  The HDP-2.1.3 docs
<http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1-latest/bk_releasenotes_hdp_2.1/content/ch_relnotes-hdp-2.1.3-product.html>
cover Hadoop 2.4.0, Hive 0.13.1, and Tez 0.4.0.  Here are the Tez setup links:

   - Set Up the Hive/HCatalog Configuration Files
   <http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_installing_manually_book/content/rpm-chap6-3.html>
   (see 3.1 Configure Hive and HiveServer2 for Tez, which includes the
   configuration information that the previous links found in the Tez chapter)
   - Installing and Configuring Apache Tez
   <http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_installing_manually_book/content/rpm-chap-tez.html>
   (see 10.4 Enable Tez for Hive Queries
   <http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_installing_manually_book/content/rpm-chap-tez-enable_tez_for_hive_queries.html>)


-- Lefty



Re: Turning on Tez for Hive

Posted by Lefty Leverenz <le...@gmail.com>.
You might also find useful information in HDP's Hive and HCatalog
installation instructions (section 3.1 "Configure Hive and HiveServer2 for
Tez") here:
http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDP2/HDP-2.1.2/bk_installing_manually_book/content/rpm-chap6-3.html


-- Lefty



Re: Turning on Tez for Hive

Posted by Tim Harsch <th...@yarcdata.com>.
Hi Alex,
Thanks for the reply.  I should state I am using vanilla Apache Hadoop 2.4.0 and Hive 0.13.1 (not Hortonworks).  And I don't have root access on my pseudo-distributed Hadoop cluster (single node).

I tried your suggestion for a query:

hive> explain select count(*) from web_sales;
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/tez/dag/api/client/StatusGetOpts

I'm thinking I must install Tez separately from source.  And then set the env vars and HADOOP_CLASSPATH as per:
http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDP2/HDP-2.1.2/bk_installing_manually_book/content/rpm-chap-tez_configure_tez.html

Tim



Re: Turning on Tez for Hive

Posted by Alexander Alten-Lorenz <wg...@gmail.com>.
Just set the execution engine; this can be done per Hive query too:
hive > set hive.execution.engine=tez;
hive > select count (*) from what_ever;

If you want to use HS2 with Tez, follow the documentation here:
http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/HDP2/HDP-2.1.2/bk_installing_manually_book/content/rpm-chap-tez-configure_hive_for_tez.html

- Alex
