Posted to user@hive.apache.org by patcharee <Pa...@uni.no> on 2015/04/24 09:33:01 UTC

hive on Tez - merging orc files

Hi,

Is anyone using the Hortonworks Sandbox 2.2? I am trying to run Hive 
on Tez on the sandbox. I set the execution engine to Tez in hive-site.xml:

     <property>
       <name>hive.execution.engine</name>
       <value>tez</value>
     </property>

Then I ran a script that alters a table to merge small ORC files 
(alter table orc_merge5a partition(st=0.8) concatenate;). The merge 
worked, but Hive ran it on MapReduce instead of Tez, which is odd.
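
For reference, the engine can also be overridden for a single invocation instead of in hive-site.xml. The following is only a sketch, assuming the `hive` CLI is on the PATH; the table and partition are the ones from the statement above, and the script just prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: run the concatenate statement with Tez selected for this invocation only.
# DRY_RUN defaults to 1, so the command is printed rather than executed.
set -eu

DRY_RUN="${DRY_RUN:-1}"
STATEMENT="ALTER TABLE orc_merge5a PARTITION (st=0.8) CONCATENATE;"

concat_cmd() {
  # --hiveconf overrides hive.execution.engine without touching hive-site.xml.
  printf '%s\n' "hive --hiveconf hive.execution.engine=tez -e \"${STATEMENT}\""
}

if [ "${DRY_RUN}" = "1" ]; then
  concat_cmd
else
  hive --hiveconf hive.execution.engine=tez -e "${STATEMENT}"
fi
```

Whether the statement then actually runs as a Tez task still depends on the Hive build in use.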

Also, I ran the same script on the production cluster, which always 
runs on Tez, and there the merge sometimes worked and sometimes did not.

I would appreciate any suggestions.

BR,
Patcharee

Re: hive on Tez - merging orc files

Posted by patcharee <Pa...@uni.no>.
Hi,

I generated the new hive-exec.jar as you suggested. On the sandbox, Hive 
0.14 with the new jar now uses Tez for alter table ... concatenate, and 
it concatenates the files correctly on Tez. Thanks!

However, I also tested on the production cluster, which uses Hive 0.14 as 
well. There the merge did not work and produced the exception below:

2015-04-24 13:01:52,259 INFO [main] app.DAGAppMaster: Running DAG: alter table orc_merge5a partit...concatenate
2015-04-24 13:01:52,355 INFO [IPC Server handler 0 on 46526] ipc.Server: IPC Server handler 0 on 46526, call org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.getDAGStatus from 10.2.1.254:39356 Call#361 Retry#0
org.apache.tez.dag.api.TezException: No running dag at present
         at org.apache.tez.dag.api.client.DAGClientHandler.getDAG(DAGClientHandler.java:84)
         at org.apache.tez.dag.api.client.DAGClientHandler.getACLManager(DAGClientHandler.java:151)
         at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.getDAGStatus(DAGClientAMProtocolBlockingPBServerImpl.java:94)
         at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7375)
         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:415)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

What could be the cause of this exception? Any ideas?
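
To compare runs that worked with runs that did not, entries like the one above can be pulled out of a saved application log. This is a small sketch, assuming the log was first saved to a local file (for example with `yarn logs -applicationId <application id>`); the AM_LOG variable and the function name are hypothetical.

```shell
#!/bin/sh
# Sketch: list TezException entries in a saved application log.
set -eu

# Print each matching line with its line number and the three lines that follow it.
tez_exceptions() {
  grep -n -F -A 3 'org.apache.tez.dag.api.TezException' "$1" || true
}

# AM_LOG is a hypothetical variable; point it at a real saved log to use this.
AM_LOG="${AM_LOG:-}"
if [ -n "${AM_LOG}" ]; then
  tez_exceptions "${AM_LOG}"
fi
```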

BR,
Patcharee

On 24. april 2015 10:27, Prasanth Jayachandran wrote:
> You can download the branch-0.14 source code from https://github.com/apache/hive/tree/branch-0.14, apply HIVE-9529-branch-1.0.0.patch from https://issues.apache.org/jira/browse/HIVE-9529, and compile it with "mvn clean install -DskipTests -Phadoop-2,dist". This generates a tar file under hive/packaging/target. Extract the tar file and copy hive-exec-x.x.x.jar into /usr/hdp/2.2.*.*/hive/lib/ (back up the existing hive-exec.jar first, then replace it with the new one). When you rerun the Hive CLI, it should pick up the patched hive-exec jar.
>
> Thanks
> Prasanth


Re: hive on Tez - merging orc files

Posted by Prasanth Jayachandran <pj...@hortonworks.com>.
You can download the branch-0.14 source code from https://github.com/apache/hive/tree/branch-0.14, apply HIVE-9529-branch-1.0.0.patch from https://issues.apache.org/jira/browse/HIVE-9529, and compile it with "mvn clean install -DskipTests -Phadoop-2,dist". This generates a tar file under hive/packaging/target. Extract the tar file and copy hive-exec-x.x.x.jar into /usr/hdp/2.2.*.*/hive/lib/ (back up the existing hive-exec.jar first, then replace it with the new one). When you rerun the Hive CLI, it should pick up the patched hive-exec jar.
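
The steps above can be sketched as a script. Several things here are assumptions rather than part of the instructions: the patch is taken to be already downloaded to PATCH_FILE, the `-p1` strip level may need to be `-p0` depending on how the patch was generated, and NEW_JAR and HIVE_LIB are placeholders that must be set to the real extracted jar and the real versioned HDP directory. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the patch-and-rebuild steps. Prints commands unless DRY_RUN=0.
set -eu

DRY_RUN="${DRY_RUN:-1}"
# Assumption: patch attachment already downloaded from the HIVE-9529 JIRA page.
PATCH_FILE="${PATCH_FILE:-$HOME/HIVE-9529-branch-1.0.0.patch}"
# Placeholders copied from the instructions; set both to real paths before a real run.
NEW_JAR="${NEW_JAR:-hive-exec-x.x.x.jar}"
HIVE_LIB="${HIVE_LIB:-/usr/hdp/2.2.*.*/hive/lib}"

run() {
  if [ "${DRY_RUN}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

build_steps() {
  run git clone --branch branch-0.14 https://github.com/apache/hive.git hive
  run cd hive
  # -p1 is an assumption about how the patch was generated.
  run git apply -p1 "${PATCH_FILE}"
  run mvn clean install -DskipTests -Phadoop-2,dist
}

replace_jar() {
  # NEW_JAR is the hive-exec jar extracted from the tar file under packaging/target.
  run cp "${HIVE_LIB}/hive-exec.jar" "${HIVE_LIB}/hive-exec.jar.bak"
  run cp "${NEW_JAR}" "${HIVE_LIB}/hive-exec.jar"
}

build_steps
replace_jar
```

Keeping the dry run as the default makes it easy to review the exact commands before touching the jars of an installed Hive.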

Thanks
Prasanth

> On Apr 24, 2015, at 1:15 AM, patcharee <Pa...@uni.no> wrote:
> 
> Hi,
> 
> The sandbox 2.2 comes with Hive 0.14. Does it also have the bug? If so, how can I patch Hive on the sandbox?
> 
> BR,
> Patcharee


Re: hive on Tez - merging orc files

Posted by patcharee <Pa...@uni.no>.
Hi,

The sandbox 2.2 comes with Hive 0.14. Does it also have the bug? If so, 
how can I patch Hive on the sandbox?

BR,
Patcharee

On 24. april 2015 09:42, Prasanth Jayachandran wrote:
> Hi
>
> This was fixed recently in https://issues.apache.org/jira/browse/HIVE-9529. Merging of small files can be triggered in two ways: by INSERT/CTAS, or by CONCATENATE. The latter had a bug that generated an MR task instead of a Tez task, which has now been fixed. The former always uses a Tez task.
>
> Thanks
> Prasanth


Re: hive on Tez - merging orc files

Posted by Prasanth Jayachandran <pj...@hortonworks.com>.
Hi

This was fixed recently in https://issues.apache.org/jira/browse/HIVE-9529. Merging of small files can be triggered in two ways: by INSERT/CTAS, or by CONCATENATE. The latter had a bug that generated an MR task instead of a Tez task, which has now been fixed. The former always uses a Tez task.
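
As an illustration of the two paths, the sketch below writes a small HiveQL file containing one statement of each kind. The source table staging_table and the output file location are hypothetical, and hive.merge.tezfiles is shown on the assumption that merging at the end of a Tez job has to be enabled explicitly in the configuration in use.

```shell
#!/bin/sh
# Sketch: write a HiveQL file showing both ways a small-file merge can be triggered.
set -eu

HQL_FILE="${HQL_FILE:-${TMPDIR:-/tmp}/merge_paths.hql}"

write_hql() {
  cat > "$1" <<'EOF'
SET hive.execution.engine=tez;

-- Path 1: merge triggered at the end of an INSERT/CTAS job.
SET hive.merge.tezfiles=true;
-- staging_table is a hypothetical source whose columns match the
-- non-partition columns of orc_merge5a.
INSERT OVERWRITE TABLE orc_merge5a PARTITION (st=0.8)
SELECT * FROM staging_table;

-- Path 2: explicit merge of the files already in a partition.
ALTER TABLE orc_merge5a PARTITION (st=0.8) CONCATENATE;
EOF
}

write_hql "${HQL_FILE}"
echo "wrote ${HQL_FILE}; run it with: hive -f ${HQL_FILE}"
```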

Thanks
Prasanth

> On Apr 24, 2015, at 12:33 AM, patcharee <Pa...@uni.no> wrote:
> 
> Hi,
> 
> Is anyone using the Hortonworks Sandbox 2.2? I am trying to run Hive on Tez on the sandbox. I set the execution engine to Tez in hive-site.xml:
> 
>    <property>
>      <name>hive.execution.engine</name>
>      <value>tez</value>
>    </property>
> 
> Then I ran a script that alters a table to merge small ORC files (alter table orc_merge5a partition(st=0.8) concatenate;). The merge worked, but Hive ran it on MapReduce instead of Tez, which is odd.
> 
> Also, I ran the same script on the production cluster, which always runs on Tez, and there the merge sometimes worked and sometimes did not.
> 
> I would appreciate any suggestions.
> 
> BR,
> Patcharee