You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@oozie.apache.org by Andrew O'Neill <ao...@paytronix.com> on 2014/02/07 18:39:14 UTC

MSCK REPAIR TABLE not working with oozie hive action

Hello everyone,

I am having an issue running the command “msck repair table <tablename>” from an ooze hive action. My hive action is completing successfully; the table is created but the data doesn’t appear in hive until I manually run “msck repair table checks;”. Why would this command work when run manually, but not work when run via oozie hive action? Again, every other part of the hive script appears to be working as expected. My hive script “updatepartitions.q” is as follows:

create database IF NOT EXISTS ${scope};
use ${scope};
drop table checks;
create external table checks
partitioned by (p_date string, p_merchantId bigint)
row format serde      'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
stored as inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
outputformat          'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
location              '${datasetBase}/${scope}/checks'
tblproperties (
    'avro.schema.url'='${avroSchemaFile}'
);
msck repair table checks;


And my hive action is:

        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>${currentAutomation}/job-defaults.xml</job-xml>
            <script>updatepartitions.q</script>
            <param>scope=${scope}</param>
            <param>datasetBase=${datasetBase}</param>
            <param>avroSchemaFile=${avroSchemaFile}</param>
        </hive>

Any thoughts or feedback are much appreciated. Thanks!


Best,
Andrew

Re: MSCK REPAIR TABLE not working with oozie hive action

Posted by Harsh J <ha...@cloudera.com>.
This seems to be a Hive problem. Are any of your partition directories
already loaded with data? There was a bug in past releases of Hive,
https://issues.apache.org/jira/browse/HIVE-3231, which caused MSCK to
ignore file-loaded partition directories.

If this scenario is likely, but your local Hive version carries the
fix, check if your Oozie sharelib Hive jars are also up-to-date and
matches the same version as your cluster?

On Mon, Feb 10, 2014 at 10:04 PM, Andrew O'Neill <ao...@paytronix.com> wrote:
> Hello,
>
> When I say ³not working,² I mean that the new partitions are not loaded
> into hive. Strangely, the hive action completes successfully and does not
> report any errors. From the MR job logs:
>
> OK
>                               Time taken: 1.49 seconds
>                               OK
>                               Time taken: 0.023 seconds
>                               OK
>                               Time taken: 1.811 seconds
>                               OK
>                               Time taken: 0.328 seconds
>                               OK
>                               Time taken: 1.566 seconds
>
>
> I assume that the five ³OK² messages correspond to the five hive commands
> in my script. From that, it appears that the OEmsck repair table checks¹
> command completes successfully. However, I believe it should be taking
> much longer than 1.5 seconds. For reference, when my hive script is run
> manually, we have something like:
>
> OK
> Time taken: 1.33 seconds
> OK
> Time taken: 0.01 seconds
> OK
> Time taken: 2.681 seconds
> OK
> Time taken: 0.155 seconds
> OK
> Partitions not in
> metastore:      checks:p_date=2013-04-01/p_merchant_id=98       checks:p_date=2013-04-
> 02/p_merchant_id=98     checks:p_date=2013-04-03/p_merchant_id=98       checks:p_date
> =2013-04-04/p_merchant_id=98    checks:p_date=2013-04-05/p_merchant_id=98       chec
> ks:p_date=2013-04-06/p_merchant_id=98   checks:p_date=2013-04-07/p_merchant_i
> d=98    checks:p_date=2013-04-08/p_merchant_id=98       checks:p_date=2013-04-09/p_m
> erchant_id=98   checks:p_date=2013-04-10/p_merchant_id=98       checks:p_date=2013-
> 04-11/p_merchant_id=98  checks:p_date=2013-04-12/p_merchant_id=98       checks:p_d
> ate=2013-04-13/p_merchant_id=98 checks:p_date=2013-04-14/p_merchant_id=98       c
> hecks:p_date=2013-04-15/p_merchant_id=98        checks:p_date=2013-04-16/p_merchan
> t_id=98 checks:p_date=2013-04-17/p_merchant_id=98       checks:p_date=2013-04-18/
> p_merchant_id=98        checks:p_date=2013-04-19/p_merchant_id=98       checks:p_date=20
> 13-04-20/p_merchant_id=98       checks:p_date=2013-04-21/p_merchant_id=98       checks:
> p_date=2013-04-22/p_merchant_id=98      checks:p_date=2013-04-23/p_merchant_id=9
> 8       checks:p_date=2013-04-24/p_merchant_id=98       checks:p_date=2013-04-25/p_merc
> hant_id=98      checks:p_date=2013-04-26/p_merchant_id=98       checks:p_date=2013-04-
> 27/p_merchant_id=98     checks:p_date=2013-04-28/p_merchant_id=98       checks:p_date
> =2013-04-29/p_merchant_id=98    checks:p_date=2013-04-30/p_merchant_id=98       chec
> ks:p_date=2013-05-01/p_merchant_id=98   checks:p_date=2013-11-01/p_merchant_i
> d=142   checks:p_date=2013-11-02/p_merchant_id=142
> Repair: Added partition to metastore
> checks:p_date=2013-04-01/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-02/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-03/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-04/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-05/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-06/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-07/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-08/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-09/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-10/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-11/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-12/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-13/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-14/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-15/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-16/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-17/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-18/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-19/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-20/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-21/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-22/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-23/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-24/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-25/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-26/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-27/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-28/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-29/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-04-30/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-05-01/p_merchant_id=98
> Repair: Added partition to metastore
> checks:p_date=2013-11-01/p_merchant_id=142
> Repair: Added partition to metastore
> checks:p_date=2013-11-02/p_merchant_id=142
> Time taken: 6.12 seconds
>
>
>
> I am confused as to why running the script manually would work but the
> hive action would not. Thanks for your help!
>
>
>
> Best,
> Andrew
>
>
> On 2/8/14, 1:02 PM, "Harsh J" <ha...@cloudera.com> wrote:
>
>>When you say it is 'not working', what do you mean? What error do you
>>observe when you check task logs of the spawned launcher MR job of the
>>Hive Action?
>>
>>On Fri, Feb 7, 2014 at 11:09 PM, Andrew O'Neill <ao...@paytronix.com>
>>wrote:
>>> Hello everyone,
>>>
>>> I am having an issue running the command "msck repair table
>>><tablename>" from an ooze hive action. My hive action is completing
>>>successfully; the table is created but the data doesn't appear in hive
>>>until I manually run "msck repair table checks;". Why would this command
>>>work when run manually, but not work when run via oozie hive action?
>>>Again, every other part of the hive script appears to be working as
>>>expected. My hive script "updatepartitions.q" is as follows:
>>>
>>> create database IF NOT EXISTS ${scope};
>>> use ${scope};
>>> drop table checks;
>>> create external table checks
>>> partitioned by (p_date string, p_merchantId bigint)
>>> row format serde      'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>>> stored as inputformat
>>>'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>>> outputformat
>>>'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>>> location              '${datasetBase}/${scope}/checks'
>>> tblproperties (
>>>     'avro.schema.url'='${avroSchemaFile}'
>>> );
>>> msck repair table checks;
>>>
>>>
>>> And my hive action is:
>>>
>>>         <hive xmlns="uri:oozie:hive-action:0.2">
>>>             <job-tracker>${jobTracker}</job-tracker>
>>>             <name-node>${nameNode}</name-node>
>>>             <job-xml>${currentAutomation}/job-defaults.xml</job-xml>
>>>             <script>updatepartitions.q</script>
>>>             <param>scope=${scope}</param>
>>>             <param>datasetBase=${datasetBase}</param>
>>>             <param>avroSchemaFile=${avroSchemaFile}</param>
>>>         </hive>
>>>
>>> Any thoughts or feedback are much appreciated. Thanks!
>>>
>>>
>>> Best,
>>> Andrew
>>
>>
>>
>>--
>>Harsh J
>



-- 
Harsh J

Re: MSCK REPAIR TABLE not working with oozie hive action

Posted by Andrew O'Neill <ao...@paytronix.com>.
Hello,

When I say ³not working,² I mean that the new partitions are not loaded
into hive. Strangely, the hive action completes successfully and does not
report any errors. From the MR job logs:

OK
                              Time taken: 1.49 seconds
                              OK
                              Time taken: 0.023 seconds
                              OK
                              Time taken: 1.811 seconds
                              OK
                              Time taken: 0.328 seconds
                              OK
                              Time taken: 1.566 seconds


I assume that the five ³OK² messages correspond to the five hive commands
in my script. From that, it appears that the Œmsck repair table checks¹
command completes successfully. However, I believe it should be taking
much longer than 1.5 seconds. For reference, when my hive script is run
manually, we have something like:

OK
Time taken: 1.33 seconds
OK
Time taken: 0.01 seconds
OK
Time taken: 2.681 seconds
OK
Time taken: 0.155 seconds
OK
Partitions not in 
metastore:	checks:p_date=2013-04-01/p_merchant_id=98	checks:p_date=2013-04-
02/p_merchant_id=98	checks:p_date=2013-04-03/p_merchant_id=98	checks:p_date
=2013-04-04/p_merchant_id=98	checks:p_date=2013-04-05/p_merchant_id=98	chec
ks:p_date=2013-04-06/p_merchant_id=98	checks:p_date=2013-04-07/p_merchant_i
d=98	checks:p_date=2013-04-08/p_merchant_id=98	checks:p_date=2013-04-09/p_m
erchant_id=98	checks:p_date=2013-04-10/p_merchant_id=98	checks:p_date=2013-
04-11/p_merchant_id=98	checks:p_date=2013-04-12/p_merchant_id=98	checks:p_d
ate=2013-04-13/p_merchant_id=98	checks:p_date=2013-04-14/p_merchant_id=98	c
hecks:p_date=2013-04-15/p_merchant_id=98	checks:p_date=2013-04-16/p_merchan
t_id=98	checks:p_date=2013-04-17/p_merchant_id=98	checks:p_date=2013-04-18/
p_merchant_id=98	checks:p_date=2013-04-19/p_merchant_id=98	checks:p_date=20
13-04-20/p_merchant_id=98	checks:p_date=2013-04-21/p_merchant_id=98	checks:
p_date=2013-04-22/p_merchant_id=98	checks:p_date=2013-04-23/p_merchant_id=9
8	checks:p_date=2013-04-24/p_merchant_id=98	checks:p_date=2013-04-25/p_merc
hant_id=98	checks:p_date=2013-04-26/p_merchant_id=98	checks:p_date=2013-04-
27/p_merchant_id=98	checks:p_date=2013-04-28/p_merchant_id=98	checks:p_date
=2013-04-29/p_merchant_id=98	checks:p_date=2013-04-30/p_merchant_id=98	chec
ks:p_date=2013-05-01/p_merchant_id=98	checks:p_date=2013-11-01/p_merchant_i
d=142	checks:p_date=2013-11-02/p_merchant_id=142
Repair: Added partition to metastore
checks:p_date=2013-04-01/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-02/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-03/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-04/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-05/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-06/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-07/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-08/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-09/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-10/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-11/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-12/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-13/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-14/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-15/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-16/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-17/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-18/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-19/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-20/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-21/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-22/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-23/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-24/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-25/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-26/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-27/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-28/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-29/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-04-30/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-05-01/p_merchant_id=98
Repair: Added partition to metastore
checks:p_date=2013-11-01/p_merchant_id=142
Repair: Added partition to metastore
checks:p_date=2013-11-02/p_merchant_id=142
Time taken: 6.12 seconds



I am confused as to why running the script manually would work but the
hive action would not. Thanks for your help!



Best,
Andrew


On 2/8/14, 1:02 PM, "Harsh J" <ha...@cloudera.com> wrote:

>When you say it is 'not working', what do you mean? What error do you
>observe when you check task logs of the spawned launcher MR job of the
>Hive Action?
>
>On Fri, Feb 7, 2014 at 11:09 PM, Andrew O'Neill <ao...@paytronix.com>
>wrote:
>> Hello everyone,
>>
>> I am having an issue running the command "msck repair table
>><tablename>" from an ooze hive action. My hive action is completing
>>successfully; the table is created but the data doesn't appear in hive
>>until I manually run "msck repair table checks;". Why would this command
>>work when run manually, but not work when run via oozie hive action?
>>Again, every other part of the hive script appears to be working as
>>expected. My hive script "updatepartitions.q" is as follows:
>>
>> create database IF NOT EXISTS ${scope};
>> use ${scope};
>> drop table checks;
>> create external table checks
>> partitioned by (p_date string, p_merchantId bigint)
>> row format serde      'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>> stored as inputformat
>>'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>> outputformat    
>>'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>> location              '${datasetBase}/${scope}/checks'
>> tblproperties (
>>     'avro.schema.url'='${avroSchemaFile}'
>> );
>> msck repair table checks;
>>
>>
>> And my hive action is:
>>
>>         <hive xmlns="uri:oozie:hive-action:0.2">
>>             <job-tracker>${jobTracker}</job-tracker>
>>             <name-node>${nameNode}</name-node>
>>             <job-xml>${currentAutomation}/job-defaults.xml</job-xml>
>>             <script>updatepartitions.q</script>
>>             <param>scope=${scope}</param>
>>             <param>datasetBase=${datasetBase}</param>
>>             <param>avroSchemaFile=${avroSchemaFile}</param>
>>         </hive>
>>
>> Any thoughts or feedback are much appreciated. Thanks!
>>
>>
>> Best,
>> Andrew
>
>
>
>-- 
>Harsh J


Re: MSCK REPAIR TABLE not working with oozie hive action

Posted by Harsh J <ha...@cloudera.com>.
When you say it is 'not working', what do you mean? What error do you
observe when you check task logs of the spawned launcher MR job of the
Hive Action?

On Fri, Feb 7, 2014 at 11:09 PM, Andrew O'Neill <ao...@paytronix.com> wrote:
> Hello everyone,
>
> I am having an issue running the command "msck repair table <tablename>" from an ooze hive action. My hive action is completing successfully; the table is created but the data doesn't appear in hive until I manually run "msck repair table checks;". Why would this command work when run manually, but not work when run via oozie hive action? Again, every other part of the hive script appears to be working as expected. My hive script "updatepartitions.q" is as follows:
>
> create database IF NOT EXISTS ${scope};
> use ${scope};
> drop table checks;
> create external table checks
> partitioned by (p_date string, p_merchantId bigint)
> row format serde      'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> stored as inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> outputformat          'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> location              '${datasetBase}/${scope}/checks'
> tblproperties (
>     'avro.schema.url'='${avroSchemaFile}'
> );
> msck repair table checks;
>
>
> And my hive action is:
>
>         <hive xmlns="uri:oozie:hive-action:0.2">
>             <job-tracker>${jobTracker}</job-tracker>
>             <name-node>${nameNode}</name-node>
>             <job-xml>${currentAutomation}/job-defaults.xml</job-xml>
>             <script>updatepartitions.q</script>
>             <param>scope=${scope}</param>
>             <param>datasetBase=${datasetBase}</param>
>             <param>avroSchemaFile=${avroSchemaFile}</param>
>         </hive>
>
> Any thoughts or feedback are much appreciated. Thanks!
>
>
> Best,
> Andrew



-- 
Harsh J