Posted to dev@atlas.apache.org by "Suma Shivaprasad (JIRA)" <ji...@apache.org> on 2016/06/20 03:50:05 UTC

[jira] [Comment Edited] (ATLAS-904) Hive hook fails due to session state not being set

    [ https://issues.apache.org/jira/browse/ATLAS-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338949#comment-15338949 ] 

Suma Shivaprasad edited comment on ATLAS-904 at 6/20/16 3:49 AM:
-----------------------------------------------------------------

Changes to address [~yhemanth] review comments. 

1. Process qualified name = HiveOperation.name + sorted inputs + sorted outputs
2. HiveOperation.name does not provide identifiers for distinguishing INSERT, INSERT_OVERWRITE, UPDATE, DELETE etc. separately. Hence also adding WriteEntity.WriteType, which exhibits the following behaviour:

a. If there are multiple outputs, adds the query type (WriteType) for each output
b. If the query being run is of type INSERT [INTO/OVERWRITE] TABLE [PARTITION], WriteType is INSERT/INSERT_OVERWRITE
c. If the query is of type INSERT OVERWRITE hdfs_path, adds WriteType as PATH_WRITE
d. If the query is of type UPDATE/DELETE, adds type as UPDATE/DELETE [Note - lineage is not available for this since this is a single-table operation]
3. When the input is a local dir or HDFS path, it is currently not added to the qualified name. The reason is that partition-based paths would cause a lot of processes to be created in this case instead of updating the same process.
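A minimal sketch of how such a qualified name could be assembled (the class, method, and separator characters here are illustrative assumptions, not the actual HiveHook code):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch: build a process qualified name from the
// operation name, sorted inputs, and sorted outputs, appending each
// output's WriteType so that INSERT, INSERT_OVERWRITE, UPDATE, DELETE
// and PATH_WRITE queries yield distinct names.
public class ProcessQualifiedName {

    static String build(String operation, List<String> inputs,
                        Map<String, String> outputWriteTypes) {
        StringBuilder sb = new StringBuilder(operation);
        for (String in : new TreeSet<>(inputs)) {       // sorted inputs
            sb.append("->").append(in);
        }
        // TreeMap iterates outputs in sorted order; one WriteType per output
        for (Map.Entry<String, String> e :
                new TreeMap<>(outputWriteTypes).entrySet()) {
            sb.append("->").append(e.getKey())
              .append(':').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> outs = new TreeMap<>();
        outs.put("default.dest", "INSERT_OVERWRITE");
        System.out.println(build("QUERY",
                Arrays.asList("default.src"), outs));
        // prints QUERY->default.src->default.dest:INSERT_OVERWRITE
    }
}
```

Sorting inputs and outputs before concatenation makes the name deterministic, so re-runs of the same query update the existing process entity rather than creating a new one.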


Pending:

Address [~shwethags] suggestion to add HDFS paths to the process qualified name only in the case of non-partition-based queries. This needs to be done per HiveOperation type:

1. If HiveOperation = LOAD, IMPORT, EXPORT: detect whether the current query context is dealing with partitions and do not add the path if it is partition-based.
2. If HiveOperation = INSERT OVERWRITE DFS_PATH/LOCAL_PATH: detect whether the query context is dealing with a partitioned table in the inputs and decide whether the path needs to be added.
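The pending rule above could be sketched as a small predicate (the class, method name, and the mapping of INSERT OVERWRITE DFS_PATH to the QUERY operation are hypothetical assumptions for illustration, not code from the Atlas patch):

```java
// Hypothetical sketch of the proposed rule: include an HDFS/local path
// in the process qualified name only for non-partition-based queries,
// decided per HiveOperation type.
public class PathInclusionRule {

    static boolean includePath(String hiveOperation, boolean involvesPartitions) {
        switch (hiveOperation) {
            case "LOAD":
            case "IMPORT":
            case "EXPORT":
                // Skip the path when the query context deals with partitions.
                return !involvesPartitions;
            case "QUERY":
                // Assumed to cover INSERT OVERWRITE DFS_PATH/LOCAL_PATH:
                // skip the path when a partitioned table appears in the inputs.
                return !involvesPartitions;
            default:
                return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(includePath("LOAD", true));    // false
        System.out.println(includePath("EXPORT", false)); // true
    }
}
```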









was (Author: suma.shivaprasad):
Changes to address [~yhemanth] review comments. 

1. Process qualified name = HiveOperation.name + sorted inputs + sorted outputs
2. HiveOperation.name does not provide identifiers for distinguishing INSERT, INSERT_OVERWRITE, UPDATE, DELETE etc. separately. Hence also adding WriteEntity.WriteType, which exhibits the following behaviour:

a. If there are multiple outputs, adds the query type (WriteType) for each output
b. If the query being run is of type INSERT [INTO/OVERWRITE] TABLE [PARTITION], WriteType is INSERT/INSERT_OVERWRITE
c. If the query is of type INSERT OVERWRITE hdfs_path, adds WriteType as PATH_WRITE
d. If the query is of type UPDATE/DELETE, adds type as UPDATE/DELETE [Note - lineage is not available for this since this is a single-table operation]





> Hive hook fails due to session state not being set
> --------------------------------------------------
>
>                 Key: ATLAS-904
>                 URL: https://issues.apache.org/jira/browse/ATLAS-904
>             Project: Atlas
>          Issue Type: Bug
>    Affects Versions: 0.7-incubating
>            Reporter: Suma Shivaprasad
>            Assignee: Suma Shivaprasad
>            Priority: Blocker
>             Fix For: 0.7-incubating
>
>         Attachments: ATLAS-904.1.patch, ATLAS-904.2.patch, ATLAS-904.patch
>
>
> {noformat}
> 2016-06-15 11:34:30,423 WARN  [Atlas Logger 0]: hook.HiveHook (HiveHook.java:normalize(557)) - Could not rewrite query due to error. Proceeding with original query EXPORT TABLE test_export_table to 'hdfs://localhost:9000/hive_tables/test_path1'
> java.lang.NullPointerException: Conf non-local session path expected to be non-null
> 	at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
> 	at org.apache.hadoop.hive.ql.session.SessionState.getHDFSSessionPath(SessionState.java:641)
> 	at org.apache.hadoop.hive.ql.Context.<init>(Context.java:133)
> 	at org.apache.hadoop.hive.ql.Context.<init>(Context.java:120)
> 	at org.apache.atlas.hive.rewrite.HiveASTRewriter.<init>(HiveASTRewriter.java:44)
> 	at org.apache.atlas.hive.hook.HiveHook.normalize(HiveHook.java:554)
> 	at org.apache.atlas.hive.hook.HiveHook.getProcessReferenceable(HiveHook.java:702)
> 	at org.apache.atlas.hive.hook.HiveHook.registerProcess(HiveHook.java:596)
> 	at org.apache.atlas.hive.hook.HiveHook.fireAndForget(HiveHook.java:222)
> 	at org.apache.atlas.hive.hook.HiveHook.access$200(HiveHook.java:77)
> 	at org.apache.atlas.hive.hook.HiveHook$2.run(HiveHook.java:182)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> 2016-06-15 11:34:30,423 ERROR [Atlas Logger 0]: hook.HiveHook (HiveHook.java:run(184)) - Atlas hook failed due to error
> java.lang.NullPointerException
> 	at java.lang.StringBuilder.<init>(StringBuilder.java:109)
> 	at org.apache.atlas.hive.hook.HiveHook.getProcessQualifiedName(HiveHook.java:738)
> 	at org.apache.atlas.hive.hook.HiveHook.getProcessReferenceable(HiveHook.java:703)
> 	at org.apache.atlas.hive.hook.HiveHook.registerProcess(HiveHook.java:596)
> 	at org.apache.atlas.hive.hook.HiveHook.fireAndForget(HiveHook.java:222)
> 	at org.apache.atlas.hive.hook.HiveHook.access$200(HiveHook.java:77)
> 	at org.apache.atlas.hive.hook.HiveHook$2.run(HiveHook.java:182)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)