Posted to commits@airflow.apache.org by "Aizhamal Nurmamat kyzy (JIRA)" <ji...@apache.org> on 2019/05/17 21:18:01 UTC
[jira] [Resolved] (AIRFLOW-3047) HiveCliHook does not work properly with Beeline
[ https://issues.apache.org/jira/browse/AIRFLOW-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aizhamal Nurmamat kyzy resolved AIRFLOW-3047.
---------------------------------------------
Resolution: Fixed
Resolving reopened issues for component refactor.
> HiveCliHook does not work properly with Beeline
> -----------------------------------------------
>
> Key: AIRFLOW-3047
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3047
> Project: Apache Airflow
> Issue Type: Bug
> Components: hive_hooks, hooks
> Affects Versions: 1.10.0
> Reporter: Vladislav Glinskiy
> Priority: Major
> Labels: hive, hive-hooks
>
> A simple _HiveOperator_ does not work properly when the _hive_cli_default_ connection is configured to use _Beeline_.
>
> *Steps to reproduce:*
> 1. Set up a Hive/HiveServer2 and Airflow environment with _beeline_ in _PATH_
> 2. Create a test _datetimes_ table. For example:
> {code:sql}
> CREATE EXTERNAL TABLE datetimes (
>     datetimes STRING)
> STORED AS PARQUET
> LOCATION '/opt/apps/datetimes';{code}
>
> 3. Edit _hive_cli_default_ connection:
> {code:bash}
> airflow connections --delete --conn_id hive_cli_default
> airflow connections --add --conn_id hive_cli_default --conn_type hive_cli --conn_host $HOST --conn_port 10000 --conn_schema default --conn_login $CONN_LOGIN --conn_password $CONN_PASSWORD --conn_extra "{\"use_beeline\": true, \"auth\": \"null;user=$HS_USER;password=$HS_PASSWORD\"}"
> {code}
> Set the variables according to your environment.
>
> 4. Create a simple DAG:
> {code:python}
> """
> ###
> Sample DAG, which declares a single Hive task.
> """
> import datetime
> from datetime import timedelta
>
> import airflow
> from airflow import DAG
> from airflow.operators.hive_operator import HiveOperator
>
> default_args = {
>     'owner': 'airflow',
>     'depends_on_past': False,
>     'start_date': airflow.utils.dates.days_ago(0, hour=0, minute=0, second=1),
>     'email': ['airflow@example.com'],
>     'email_on_failure': False,
>     'email_on_retry': False,
>     'retries': 1,
>     'retry_delay': timedelta(minutes=5),
>     'provide_context': True
> }
>
> dag = DAG(
>     'hive_task_dag',
>     default_args=default_args,
>     description='Single task DAG',
>     schedule_interval=timedelta(minutes=15))
>
> insert_current_datetime = HiveOperator(
>     task_id='insert_current_datetime_task',
>     hql="insert into table datetimes values ('" + datetime.datetime.now().strftime("%I:%M%p on %B %d, %Y") + "');",
>     dag=dag)
>
> dag.doc_md = __doc__
> {code}
>
> 5. Trigger the DAG. Ensure that it completes successfully.
> 6. Check the _datetimes_ table. It will be empty.
>
> As it turned out, the issue is caused by an invalid temporary script file: the problem is fixed by adding a newline character at the end of the script.
> So, a possible fix is to change:
> *hive_hooks.py:182*
> {code:python}
> if schema:
>     hql = "USE {schema};\n{hql}".format(**locals())
> {code}
> to
> {code:python}
> if schema:
>     hql = "USE {schema};\n{hql}\n".format(**locals())
> {code}
> It is unclear how this change affects _hive shell_ queries, since the fix has been tested only against _beeline_.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)