Posted to dev@falcon.apache.org by "Shwetha G S (JIRA)" <ji...@apache.org> on 2014/03/25 10:10:42 UTC

[jira] [Comment Edited] (FALCON-365) Remove the checked in oozie xsds

    [ https://issues.apache.org/jira/browse/FALCON-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946319#comment-13946319 ] 

Shwetha G S edited comment on FALCON-365 at 3/25/14 9:09 AM:
-------------------------------------------------------------

I haven't changed any Java code that prepares the Hive workflow (not even imports), and it's the Java code that sets the params for the Hive workflow. Since the compilation is fine, I don't see why the Hive workflow should fail. Moreover, there was no difference between the Hive xsd that was in Falcon and the one in oozie-client.

Anyway, I decided to test a process with a Hive workflow, and it doesn't work. I think the issue is with the Oozie EL extension that we use (which is not related to this patch). Here are the details of the hql:
{noformat}
Script [wordcount.hql] content: 
------------------------
INSERT OVERWRITE TABLE $falcon_output_table PARTITION($falcon_output_dataout_partitions) SELECT word, SUM(cnt) as cnt FROM $falcon_input_table WHERE $falcon_input_filter GROUP BY word;

------------------------

Parameters:
------------------------
  falcon_input_table=in_table
  falcon_input_database=default
  falcon_input_storage_type=TABLE
  falcon_input_catalog_url=thrift://localhost:12000
  falcon_input_filter=(ds='2013-11-15-00-04') OR (ds='2013-11-15-00-03') OR (ds='2013-11-15-00-02') OR (ds='2013-11-15-00-01') OR (ds='2013-11-15-00-00')
  falcon_output_catalog_url=thrift://localhost:12000
  falcon_output_dataout_partitions='ds=2013-11-15-00-05'
  falcon_output_dated_partition_value=2013-11-15-00-05
  falcon_output_storage_type=TABLE
  falcon_output_database=default
  falcon_output_table=out_table
------------------------

Hive command arguments :
             --hivevar
             falcon_input_table=in_table
             --hivevar
             falcon_input_database=default
             --hivevar
             falcon_input_storage_type=TABLE
             --hivevar
             falcon_input_catalog_url=thrift://localhost:12000
             --hivevar
             falcon_input_filter=(ds='2013-11-15-00-04') OR (ds='2013-11-15-00-03') OR (ds='2013-11-15-00-02') OR (ds='2013-11-15-00-01') OR (ds='2013-11-15-00-00')
             --hivevar
             falcon_output_catalog_url=thrift://localhost:12000
             --hivevar
             falcon_output_dataout_partitions='ds=2013-11-15-00-05'
             --hivevar
             falcon_output_dated_partition_value=2013-11-15-00-05
             --hivevar
             falcon_output_storage_type=TABLE
             --hivevar
             falcon_output_database=default
             --hivevar
             falcon_output_table=out_table
             -f
             wordcount.hql
{noformat}

The issue is that this generates the following hql:
{noformat}
INSERT OVERWRITE TABLE out_table PARTITION(ds=2013-11-15-00-05) SELECT word, SUM(cnt) as cnt FROM in_table WHERE (ds='2013-11-15-00-04') OR (ds='2013-11-15-00-03') OR (ds='2013-11-15-00-02') OR (ds='2013-11-15-00-01') OR (ds='2013-11-15-00-00') GROUP BY word;
{noformat}

It should be (note the quotes around the output partition value):
{noformat}
INSERT OVERWRITE TABLE out_table PARTITION(ds='2013-11-15-00-05') SELECT word, SUM(cnt) as cnt FROM in_table WHERE (ds='2013-11-15-00-04') OR (ds='2013-11-15-00-03') OR (ds='2013-11-15-00-02') OR (ds='2013-11-15-00-01') OR (ds='2013-11-15-00-00') GROUP BY word;
{noformat}
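
To make the quoting problem concrete, here is a minimal sketch in Python that mimics the substitution. It is not Hive's or Oozie's actual code: it assumes plain textual replacement of the `$name` placeholders, and it assumes that one pair of enclosing single quotes around a parameter value is dropped somewhere before substitution (the logs above don't show where that happens). The helper names `strip_outer_quotes`, `render` and `quoted_partition_spec` are made up for illustration.

```python
from string import Template

# Script text copied from the log above.
SCRIPT = (
    "INSERT OVERWRITE TABLE $falcon_output_table "
    "PARTITION($falcon_output_dataout_partitions) "
    "SELECT word, SUM(cnt) as cnt FROM $falcon_input_table "
    "WHERE $falcon_input_filter GROUP BY word;"
)

# Input filter copied from the log: five partitions OR-ed together.
INPUT_FILTER = " OR ".join(
    "(ds='2013-11-15-00-%02d')" % i for i in (4, 3, 2, 1, 0)
)


def strip_outer_quotes(value):
    # ASSUMPTION: a single pair of enclosing single quotes is consumed
    # before the value reaches the script. Where this really happens
    # (shell, Oozie, Hive) is not established by the logs.
    if len(value) >= 2 and value[0] == value[-1] == "'":
        return value[1:-1]
    return value


def render(script, params):
    # Plain textual substitution of $name placeholders (illustrative only).
    cleaned = {name: strip_outer_quotes(v) for name, v in params.items()}
    return Template(script).substitute(cleaned)


def quoted_partition_spec(key, value):
    # Hypothetical helper: quote the partition value, not the whole spec.
    return "%s='%s'" % (key, value)


BASE = {
    "falcon_input_table": "in_table",
    "falcon_input_filter": INPUT_FILTER,
    "falcon_output_table": "out_table",
}

# Value as currently passed: quotes wrap the whole key=value pair.
current = dict(BASE, falcon_output_dataout_partitions="'ds=2013-11-15-00-05'")
# Value that would give the expected hql: quotes wrap only the value.
fixed = dict(
    BASE,
    falcon_output_dataout_partitions=quoted_partition_spec("ds", "2013-11-15-00-05"),
)

print(render(SCRIPT, current))
print(render(SCRIPT, fixed))
```

Under these assumptions, the first call reproduces the unquoted `PARTITION(ds=2013-11-15-00-05)` and the second gives the quoted form. The input filter is unaffected in both cases because its quotes sit inside the parentheses rather than around the whole value.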



> Remove the checked in oozie xsds
> --------------------------------
>
>                 Key: FALCON-365
>                 URL: https://issues.apache.org/jira/browse/FALCON-365
>             Project: Falcon
>          Issue Type: Bug
>            Reporter: Shwetha G S
>            Assignee: Shwetha G S
>             Fix For: 0.5
>
>         Attachments: FALCON-365-v2.patch, FALCON-365.patch
>
>
> Oozie xsds for workflow, coordinator, bundle, etc. are part of oozie-client. We should use the xsds from the client jar.


