Posted to dev@lens.apache.org by "Angad Singh (JIRA)" <ji...@apache.org> on 2015/06/10 12:00:03 UTC

[jira] [Updated] (LENS-598) Dimtable selection not honoring requirements of multiple inter-related join chains properly

     [ https://issues.apache.org/jira/browse/LENS-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Angad Singh updated LENS-598:
-----------------------------
    Description: 
Scenario:

We have a cube cubeX and two dimensions, dimension1 and dimension2.
table1 and table2 are dimtables of dimension1.
table3 is a dimtable of dimension2.
joinchain1 goes from cubeX through dimension1 to dimension2, and joinchain2 goes from cubeX to dimension1 directly.

The problem: when a cube query selects columns from cubeX, dimension1, and dimension2, Lens should consider the requirements of both joinchain1 and joinchain2 when selecting a dimtable for dimension1, but it does not.

At times it picks the wrong dimtable for dimension1 (table2) and consequently fails to join table3 at all.

E.g.:

Query:
{noformat}
cube select apps_owned_title from user j where apps_owned.source='raf'
{noformat}

Correctly generated Hive query (when Lens selected dimtable table1):
{code:sql}
SELECT ( apps_owned_metadata . apptitle ) FROM user.uh1_hdfs_user_attributestore_er_cubetable_dimension_gender_state j join user.uh1_hdfs_user_attributestore_er_cubetable_dimension_bundleid_install apps_owned on j.userid = apps_owned.userid and (apps_owned.dt = 'latest') join user.uh1_hdfs_hive_dimtable_uac_app apps_owned_metadata on apps_owned.bundleid = apps_owned_metadata.bundleid and (apps_owned_metadata.dt = 'latest') WHERE (((( apps_owned . source ) =  'raf' ) AND ((j.dt = 'latest'))))
{code}

Incorrectly generated Hive query (when Lens selected dimtable table2; note that apps_owned_metadata is referenced in the WHERE clause but never joined):
{code:sql}
SELECT ( apps_owned_metadata . apptitle ) FROM user.uh1_hdfs_user_attributestore_er_cubetable_dimension_gender_state j join user.uh1_hdfs_user_attributestore_er_cubetable_dimension_app_install apps_owned on j.userid = apps_owned.userid and (apps_owned.dt = 'latest') WHERE (((( apps_owned . source ) =  'raf' ) AND ((apps_owned_metadata.dt = 'latest')) AND ((j.dt = 'latest')))) 
{code}

Exception in the Lens server logs resulting from the above query:
{noformat}
10 Jun 2015 08:56:27,850 [QuerySubmitter] INFO  org.apache.lens.driver.hive.HiveDriver  - Hive driver query:INSERT OVERWRITE DIRECTORY "/tmp/lensreports/hdfsout/1be8556b-64b2-42f0-a32b-45d3be75537f"  SELECT ( apps_owned_metadata . apptitle ) FROM user.uh1_hdfs_user_attributestore_er_cubetable_dimension_gender_state j join user.uh1_hdfs_user_attributestore_er_cubetable_dimension_app_install apps_owned on j.userid = apps_owned.userid and (apps_owned.dt = 'latest') WHERE (((( apps_owned . source ) =  'raf' ) AND ((apps_owned_metadata.dt = 'latest')) AND ((j.dt = 'latest')))) 
10 Jun 2015 08:56:27,851 [QuerySubmitter] INFO  org.apache.lens.driver.hive.HiveDriver  - whetherCalculatePriority: true
10 Jun 2015 08:56:27,852 [QuerySubmitter] INFO  org.apache.lens.server.api.priority.QueryPriorityDecider  - Deciding Priority VERY_HIGH since cost = 0.0
10 Jun 2015 08:56:27,852 [QuerySubmitter] WARN  org.apache.hadoop.conf.Configuration  - mapred.job.priority is deprecated. Instead, use mapreduce.job.priority
10 Jun 2015 08:56:27,852 [QuerySubmitter] INFO  org.apache.lens.driver.hive.HiveDriver  - set priority to VERY_HIGH
10 Jun 2015 08:56:28,468 [QuerySubmitter] INFO  org.apache.lens.driver.hive.HiveDriver  - No hive operation available for 1be8556b-64b2-42f0-a32b-45d3be75537f
10 Jun 2015 08:56:28,469 [QuerySubmitter] ERROR org.apache.lens.server.query.QueryExecutionServiceImpl  - Error launching query 1be8556b-64b2-42f0-a32b-45d3be75537f
org.apache.lens.server.api.error.LensException: Error executing async query
	at org.apache.lens.driver.hive.HiveDriver.executeAsync(HiveDriver.java:524)
	at org.apache.lens.server.query.QueryExecutionServiceImpl$QuerySubmitter.run(QueryExecutionServiceImpl.java:497)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 1:409 Invalid table alias or column reference 'apps_owned_metadata': (possible column names are: j.userid, j.timestamp, j.gender, j.gender_state_score, j.gender_state_confidence, j.dt, apps_owned.userid, apps_owned.timestamp, apps_owned.appid, apps_owned.app_install_confidence, apps_owned.dt, apps_owned.source)
	at org.apache.hive.service.cli.thrift.ThriftCLIServiceClient.checkStatus(ThriftCLIServiceClient.java:52)
	at org.apache.hive.service.cli.thrift.ThriftCLIServiceClient.executeStatementInternal(ThriftCLIServiceClient.java:151)
	at org.apache.hive.service.cli.thrift.ThriftCLIServiceClient.executeStatementAsync(ThriftCLIServiceClient.java:139)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hive.service.cli.thrift.RetryingThriftCLIServiceClient.invokeInternal(RetryingThriftCLIServiceClient.java:301)
	at org.apache.hive.service.cli.thrift.RetryingThriftCLIServiceClient.invoke(RetryingThriftCLIServiceClient.java:329)
	at com.sun.proxy.$Proxy59.executeStatementAsync(Unknown Source)
	at org.apache.hive.service.cli.thrift.RetryingThriftCLIServiceClient$CLIServiceClientWrapper.executeStatementAsync(RetryingThriftCLIServiceClient.java:118)
	at org.apache.lens.driver.hive.HiveDriver.executeAsync(HiveDriver.java:515)
	... 2 more
{noformat}

Note: in the above example, cubeX is "user", dimension1 is "user_owned_apps", and dimension2 is "hive_dimension_uac_app". table1 is "user_attributestore_er_cubetable_dimension_bundleid_install", table2 is "user_attributestore_er_cubetable_dimension_app_install", and table3 is "hive_dimtable_uac_app".
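
To make the join-chain requirement concrete, here is a minimal SQL sketch. It assumes, based on the generated queries and the Hive error above, that table1 exposes bundleid (the key needed to continue the chain to table3) while table2 exposes only appid; the simplified table and column names are illustrative, not the actual schema.

{code:sql}
-- Illustrative sketch only: simplified table/column names, not the real schema.
-- joinchain2: cubeX -> dimension1               (either table1 or table2 works)
-- joinchain1: cubeX -> dimension1 -> dimension2 (needs a dimension1 dimtable that
--                                                carries the join key to dimension2)

-- With table1 (which has bundleid), both chains can be realized:
SELECT apps_owned_metadata.apptitle
FROM cube_fact j
JOIN table1 apps_owned
  ON j.userid = apps_owned.userid                              -- joinchain2
JOIN table3 apps_owned_metadata
  ON apps_owned.bundleid = apps_owned_metadata.bundleid;       -- joinchain1 continues

-- With table2 (which has only appid, no bundleid), the second join cannot be
-- generated, so the rewritten query ends up referencing apps_owned_metadata
-- without ever joining it, which is exactly the SemanticException shown above.
{code}

In other words, dimtable selection for dimension1 has to account for the columns required by every join chain that passes through dimension1, not only the chains that end there.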

> Dimtable selection not honoring requirements of multiple inter-related join chains properly
> -------------------------------------------------------------------------------------------
>
>                 Key: LENS-598
>                 URL: https://issues.apache.org/jira/browse/LENS-598
>             Project: Apache Lens
>          Issue Type: Bug
>            Reporter: Angad Singh
>            Assignee: Amareshwari Sriramadasu
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)