Posted to dev@falcon.apache.org by "Shwetha G S (JIRA)" <ji...@apache.org> on 2014/03/25 10:10:42 UTC
[jira] [Comment Edited] (FALCON-365) Remove the checked in oozie xsds
[ https://issues.apache.org/jira/browse/FALCON-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946319#comment-13946319 ]
Shwetha G S edited comment on FALCON-365 at 3/25/14 9:09 AM:
-------------------------------------------------------------
I haven't changed any Java code that prepares the Hive workflow (not even imports), and it's the Java code that sets the params for the Hive workflow. Since the compilation is fine, I don't see any reason why the Hive workflow should fail. Moreover, there was no difference between the Hive XSD that was in Falcon and the one in oozie-client.
Anyway, I decided to test a process with a Hive workflow, and it doesn't work. I think the issue is with the Oozie EL extension that we use (which is not related to this patch). Here are the details of the HQL:
{noformat}
Script [wordcount.hql] content:
------------------------
INSERT OVERWRITE TABLE $falcon_output_table PARTITION($falcon_output_dataout_partitions) SELECT word, SUM(cnt) as cnt FROM $falcon_input_table WHERE $falcon_input_filter GROUP BY word;
------------------------
Parameters:
------------------------
falcon_input_table=in_table
falcon_input_database=default
falcon_input_storage_type=TABLE
falcon_input_catalog_url=thrift://localhost:12000
falcon_input_filter=(ds='2013-11-15-00-04') OR (ds='2013-11-15-00-03') OR (ds='2013-11-15-00-02') OR (ds='2013-11-15-00-01') OR (ds='2013-11-15-00-00')
falcon_output_catalog_url=thrift://localhost:12000
falcon_output_dataout_partitions='ds=2013-11-15-00-05'
falcon_output_dated_partition_value=2013-11-15-00-05
falcon_output_storage_type=TABLE
falcon_output_database=default
falcon_output_table=out_table
------------------------
Hive command arguments :
--hivevar
falcon_input_table=in_table
--hivevar
falcon_input_database=default
--hivevar
falcon_input_storage_type=TABLE
--hivevar
falcon_input_catalog_url=thrift://localhost:12000
--hivevar
falcon_input_filter=(ds='2013-11-15-00-04') OR (ds='2013-11-15-00-03') OR (ds='2013-11-15-00-02') OR (ds='2013-11-15-00-01') OR (ds='2013-11-15-00-00')
--hivevar
falcon_output_catalog_url=thrift://localhost:12000
--hivevar
falcon_output_dataout_partitions='ds=2013-11-15-00-05'
--hivevar
falcon_output_dated_partition_value=2013-11-15-00-05
--hivevar
falcon_output_storage_type=TABLE
--hivevar
falcon_output_database=default
--hivevar
falcon_output_table=out_table
-f
wordcount.hql
{noformat}
The issue is that this generates the following HQL:
{noformat}
INSERT OVERWRITE TABLE out_table PARTITION(ds=2013-11-15-00-05) SELECT word, SUM(cnt) as cnt FROM in_table WHERE (ds='2013-11-15-00-04') OR (ds='2013-11-15-00-03') OR (ds='2013-11-15-00-02') OR (ds='2013-11-15-00-01') OR (ds='2013-11-15-00-00') GROUP BY word;
{noformat}
It should be (note the quotes around the output partition value):
{noformat}
INSERT OVERWRITE TABLE out_table PARTITION(ds='2013-11-15-00-05') SELECT word, SUM(cnt) as cnt FROM in_table WHERE (ds='2013-11-15-00-04') OR (ds='2013-11-15-00-03') OR (ds='2013-11-15-00-02') OR (ds='2013-11-15-00-01') OR (ds='2013-11-15-00-00') GROUP BY word;
{noformat}
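To make the quoting problem concrete, here is a minimal Python sketch that mimics the substitution. It is not Hive's or Oozie's actual code: `strip_outer_quotes` and `render` are hypothetical helpers, and the assumption that some layer drops one pair of enclosing single quotes from the hivevar value is only inferred from the parameter dump and the generated HQL above. The input filter is shortened to two clauses for readability.

```python
from string import Template

# The script from the dump above, with Hive-style $variables.
SCRIPT = (
    "INSERT OVERWRITE TABLE $falcon_output_table "
    "PARTITION($falcon_output_dataout_partitions) "
    "SELECT word, SUM(cnt) as cnt FROM $falcon_input_table "
    "WHERE $falcon_input_filter GROUP BY word;"
)


def strip_outer_quotes(value):
    # Assumption: one pair of enclosing single quotes is dropped
    # somewhere between the workflow params and the final statement.
    if len(value) >= 2 and value[0] == "'" and value[-1] == "'":
        return value[1:-1]
    return value


def render(script, hivevars):
    # Hypothetical stand-in for the variable substitution step.
    cleaned = {name: strip_outer_quotes(v) for name, v in hivevars.items()}
    return Template(script).substitute(cleaned)


params = {
    "falcon_output_table": "out_table",
    "falcon_input_table": "in_table",
    "falcon_input_filter": "(ds='2013-11-15-00-04') OR (ds='2013-11-15-00-03')",
    # As passed today: the quotes wrap the whole key=value pair.
    "falcon_output_dataout_partitions": "'ds=2013-11-15-00-05'",
}

# Reproduces the unquoted partition value seen in the generated HQL.
broken = render(SCRIPT, params)

# With the quotes around the value only, they survive the substitution.
quoted_value = dict(params, falcon_output_dataout_partitions="ds='2013-11-15-00-05'")
expected = render(SCRIPT, quoted_value)

print(broken)
print(expected)
```

Under that assumption, a value of the form ds='2013-11-15-00-05' would produce the expected statement. Whether the fix belongs in the EL extension or elsewhere is not settled here.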
was (Author: shwethags):
I haven't changed any Java code that prepares the Hive workflow (not even imports), and it's the Java code that sets the params for the Hive workflow. Since the compilation is fine, I don't see any reason why the Hive workflow should fail. Moreover, there was no difference between the Hive XSD that was in Falcon and the one in oozie-client.
Anyway, I decided to test a process with a Hive workflow, and it doesn't work. I think the issue is with the Oozie EL extension that we use. Here are the details of the HQL:
{noformat}
Script [wordcount.hql] content:
------------------------
INSERT OVERWRITE TABLE $falcon_output_table PARTITION($falcon_output_dataout_partitions) SELECT word, SUM(cnt) as cnt FROM $falcon_input_table WHERE $falcon_input_filter GROUP BY word;
------------------------
Parameters:
------------------------
falcon_input_table=in_table
falcon_input_database=default
falcon_input_storage_type=TABLE
falcon_input_catalog_url=thrift://localhost:12000
falcon_input_filter=(ds='2013-11-15-00-04') OR (ds='2013-11-15-00-03') OR (ds='2013-11-15-00-02') OR (ds='2013-11-15-00-01') OR (ds='2013-11-15-00-00')
falcon_output_catalog_url=thrift://localhost:12000
falcon_output_dataout_partitions='ds=2013-11-15-00-05'
falcon_output_dated_partition_value=2013-11-15-00-05
falcon_output_storage_type=TABLE
falcon_output_database=default
falcon_output_table=out_table
------------------------
Hive command arguments :
--hivevar
falcon_input_table=in_table
--hivevar
falcon_input_database=default
--hivevar
falcon_input_storage_type=TABLE
--hivevar
falcon_input_catalog_url=thrift://localhost:12000
--hivevar
falcon_input_filter=(ds='2013-11-15-00-04') OR (ds='2013-11-15-00-03') OR (ds='2013-11-15-00-02') OR (ds='2013-11-15-00-01') OR (ds='2013-11-15-00-00')
--hivevar
falcon_output_catalog_url=thrift://localhost:12000
--hivevar
falcon_output_dataout_partitions='ds=2013-11-15-00-05'
--hivevar
falcon_output_dated_partition_value=2013-11-15-00-05
--hivevar
falcon_output_storage_type=TABLE
--hivevar
falcon_output_database=default
--hivevar
falcon_output_table=out_table
-f
wordcount.hql
{noformat}
The issue is that this generates the following HQL:
{noformat}
INSERT OVERWRITE TABLE out_table PARTITION(ds=2013-11-15-00-05) SELECT word, SUM(cnt) as cnt FROM in_table WHERE (ds='2013-11-15-00-04') OR (ds='2013-11-15-00-03') OR (ds='2013-11-15-00-02') OR (ds='2013-11-15-00-01') OR (ds='2013-11-15-00-00') GROUP BY word;
{noformat}
It should be (note the quotes around the output partition value):
{noformat}
INSERT OVERWRITE TABLE out_table PARTITION(ds='2013-11-15-00-05') SELECT word, SUM(cnt) as cnt FROM in_table WHERE (ds='2013-11-15-00-04') OR (ds='2013-11-15-00-03') OR (ds='2013-11-15-00-02') OR (ds='2013-11-15-00-01') OR (ds='2013-11-15-00-00') GROUP BY word;
{noformat}
> Remove the checked in oozie xsds
> --------------------------------
>
> Key: FALCON-365
> URL: https://issues.apache.org/jira/browse/FALCON-365
> Project: Falcon
> Issue Type: Bug
> Reporter: Shwetha G S
> Assignee: Shwetha G S
> Fix For: 0.5
>
> Attachments: FALCON-365-v2.patch, FALCON-365.patch
>
>
> Oozie XSDs for workflow, coordinator, bundle, etc. are part of oozie-client. We should use the XSDs from the client jar.