Posted to user@hive.apache.org by Chunky Gupta <ch...@vizury.com> on 2012/11/05 12:34:05 UTC

Alter table is giving error

Hi,

I have a cluster set up on EC2 with Hadoop version 0.20.2 and Hive
version 0.8.1 (I configured everything). I created a table using:

CREATE EXTERNAL TABLE XXX ( YYY ) PARTITIONED BY ( ZZZ ) ROW FORMAT DELIMITED
FIELDS TERMINATED BY 'WWW' LOCATION 's3://my-location/data/';

Now I am trying to recover the partitions using:

ALTER TABLE XXX RECOVER PARTITIONS;

but I am getting this error: "FAILED: Parse Error: line 1:12 cannot
recognize input near 'XXX' 'RECOVER' 'PARTITIONS' in alter table statement"

The same steps on a cluster set up on EMR with Hadoop version 1.0.3 and
Hive version 0.8.1 (configured by EMR) work fine.

So is this a version issue, or am I missing some configuration changes in
my EC2 setup?
I could not find an exact solution for this problem on the internet. Please
help me.

Thanks,
Chunky.

Re: Alter table is giving error

Posted by Dean Wampler <de...@thinkbiganalytics.com>.

Right, your CREATE TABLE statement now points to your S3 location, so you
don't need to do anything else. Note that queries will pull this data from
S3 every time, which will be a little slower, and you'll incur a small
charge for reading from S3. Still, parking data there is great when you
only need occasional access to it; for frequent access, an HDFS location
is better.

As a side note, the error message tells you that you can't use an S3
location in a LOAD DATA statement. So, if you ever define a
managed/internal table and want to populate it with S3 data, you'll have to
copy the data from S3 to your cluster first, then load it from there.
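
A minimal sketch of that two-step flow, assuming a staging directory on HDFS. The function only builds the command strings and runs nothing. The staging path and the table name someid_managed are hypothetical, and hadoop distcp is just one way to do the copy (hadoop fs -cp would also do for a single small file).

```python
import posixpath


def s3_to_managed_table_commands(s3_uri, hdfs_staging_dir, table):
    """Build (copy_command, hiveql) for loading an S3 file into a managed table.

    Nothing is executed here; the strings are meant to be reviewed and then
    run by hand or from a wrapper script.
    """
    file_name = posixpath.basename(s3_uri)
    hdfs_path = posixpath.join(hdfs_staging_dir, file_name)
    # Step 1: copy the file from S3 onto the cluster's HDFS.
    copy_command = "hadoop distcp {0} {1}".format(s3_uri, hdfs_path)
    # Step 2: LOAD DATA accepts the HDFS path and moves the file into the table.
    hiveql = "LOAD DATA INPATH '{0}' INTO TABLE {1};".format(hdfs_path, table)
    return copy_command, hiveql


# Hypothetical staging directory and table name, for illustration only.
copy_cmd, load_stmt = s3_to_managed_table_commands(
    "s3n://location/someidexcel.csv", "/tmp/staging", "someid_managed")
print(copy_cmd)
print(load_stmt)
```

Keeping the two steps as plain strings makes it easy to check the paths before anything is copied or moved.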

dean

On Tue, Nov 27, 2012 at 2:53 PM, Mark Grover <gr...@gmail.com>wrote:

> Chunky,
> You have an external table that points at the location s3://location/
>
> No need to load the data. All files (or partitions folders) under
> s3://location/ should be available via the table.
> Just run your queries on it.
>
> Load data will move the data from one HDFS location to another. You don't
> need/want to do that in this case.
>
> Mark
>
> On Tue, Nov 27, 2012 at 12:18 PM, Chunky Gupta <ch...@vizury.com>wrote:
>
>> Hi,
>>
>> Now when I am trying to load a csv file to any table I created, its not
>> working.
>>
>> I created a table :-
>> CREATE EXTERNAL TABLE someidtable (
>> someid STRING,
>> )
>> ROW FORMAT
>> DELIMITED FIELDS TERMINATED BY '\t'
>> LINES TERMINATED BY '\n'
>> LOCATION 's3://location/';
>>
>> Then
>>
>> LOAD DATA INPATH 's3://location/someidexcel.csv' INTO TABLE someidtable;
>>
>> It gives this error:-
>> "Error in semantic analysis: Line 1:17 Invalid path
>> ''s3n://location/someidexcel.csv'': only "file" or "hdfs" file systems
>> accepted"
>>
>> Please help me in resolving this issue.
>> Thanks,
>> Chunky.
>>
>>
>> On Wed, Nov 7, 2012 at 6:43 PM, Chunky Gupta <ch...@vizury.com>wrote:
>>
>>> Okay Mark, I will be looking into this JIRA regularly.
>>> Thanks again for helping.
>>> Chunky.
>>>
>>>
>>> On Wed, Nov 7, 2012 at 12:22 PM, Mark Grover <
>>> grover.markgrover@gmail.com> wrote:
>>>
>>>> Chunky,
>>>> I just tried it myself. It turns out that the directory you are adding
>>>> as partition has to be empty for msck repair to work. This is obviously
>>>> sub-optimal and there is a JIRA in place (
>>>> https://issues.apache.org/jira/browse/HIVE-3231) to fix it.
>>>>
>>>> So, I'd suggest you keep an eye out for the next version for that fix
>>>> to come in. In the meanwhile, run msck after you create your partition
>>>> directory but before you populate your directory with data.
>>>>
>>>> Mark
>>>>
>>>>
>>>> On Tue, Nov 6, 2012 at 10:33 PM, Chunky Gupta <ch...@vizury.com>wrote:
>>>>
>>>>> Hi Mark,
>>>>> Sorry, I forgot to mention. I have also tried
>>>>>                 msck repair table <Table name>;
>>>>> and I got the same output as from plain msck.
>>>>> Do I need any other settings for this to work? I set up Hadoop and
>>>>> Hive from scratch on EC2.
>>>>>
>>>>> Thanks,
>>>>> Chunky.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 7, 2012 at 11:58 AM, Mark Grover <
>>>>> grover.markgrover@gmail.com> wrote:
>>>>>
>>>>>> Chunky,
>>>>>> You should have run:
>>>>>> msck repair table <Table name>;
>>>>>>
>>>>>> Sorry, I should have made it clear in my last reply. I have added an
>>>>>> entry to Hive wiki for benefit of others:
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 6, 2012 at 9:55 PM, Chunky Gupta <chunky.gupta@vizury.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Hi Mark,
>>>>>>> I didn't get any error.
>>>>>>> I ran this on hive console:-
>>>>>>>          "msck table Table_Name;"
>>>>>>> It says Ok and showed the execution time as 1.050 sec.
>>>>>>> But when I checked partitions for table using
>>>>>>>           "show partitions Table_Name;"
>>>>>>> It didn't show me any partitions.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Chunky.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Nov 6, 2012 at 10:38 PM, Mark Grover <
>>>>>>> grover.markgrover@gmail.com> wrote:
>>>>>>>
>>>>>>>> Glad to hear, Chunky.
>>>>>>>>
>>>>>>>> Out of curiosity, what errors did you get when using msck?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Nov 6, 2012 at 5:14 AM, Chunky Gupta <
>>>>>>>> chunky.gupta@vizury.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Mark,
>>>>>>>>> I tried msck, but it is not working for me. I have written a
>>>>>>>>> python script to partition the data individually.
>>>>>>>>>
>>>>>>>>> Thank you Edward, Mark and Dean.
>>>>>>>>> Chunky.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Nov 5, 2012 at 11:08 PM, Mark Grover <
>>>>>>>>> grover.markgrover@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Chunky,
>>>>>>>>>> I have used "recover partitions" command on EMR, and that worked
>>>>>>>>>> fine.
>>>>>>>>>>
>>>>>>>>>> However, take a look at
>>>>>>>>>> https://issues.apache.org/jira/browse/HIVE-874. Seems like msck
>>>>>>>>>> command in Apache Hive does the same thing. Try it out and let us know how it
>>>>>>>>>> goes.
>>>>>>>>>>
>>>>>>>>>> Mark
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 5, 2012 at 7:56 AM, Edward Capriolo <
>>>>>>>>>> edlinuxguru@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Recover partitions should work the same way for different file
>>>>>>>>>>> systems.
>>>>>>>>>>>
>>>>>>>>>>> Edward
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Nov 5, 2012 at 9:33 AM, Dean Wampler
>>>>>>>>>>> <de...@thinkbiganalytics.com> wrote:
>>>>>>>>>>> > Writing a script to add the external partitions individually
>>>>>>>>>>> > is the only way I know of.
>>>>>>>>>>> >
>>>>>>>>>>> > Sent from my rotary phone.
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > On Nov 5, 2012, at 8:19 AM, Chunky Gupta <
>>>>>>>>>>> chunky.gupta@vizury.com> wrote:
>>>>>>>>>>> >
>>>>>>>>>>> > Hi Dean,
>>>>>>>>>>> >
>>>>>>>>>>> > I had a Hadoop and Hive cluster on EMR, and I have S3 storage
>>>>>>>>>>> > containing logs that are updated daily and partitioned by
>>>>>>>>>>> > date (dt). I was using RECOVER PARTITIONS there.
>>>>>>>>>>> > Now I want to move to EC2 and run my own Hadoop and Hive
>>>>>>>>>>> > cluster. What is the alternative to RECOVER PARTITIONS in this
>>>>>>>>>>> > case, if you have any idea?
>>>>>>>>>>> > I found one way, adding the partition for each date
>>>>>>>>>>> > individually, but I would have to write a script to do that for
>>>>>>>>>>> > all dates. Is there an easier way than this?
>>>>>>>>>>> >
>>>>>>>>>>> > Thanks,
>>>>>>>>>>> > Chunky
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > On Mon, Nov 5, 2012 at 6:28 PM, Dean Wampler
>>>>>>>>>>> > <de...@thinkbiganalytics.com> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >> RECOVER PARTITIONS is an enhancement added by Amazon to
>>>>>>>>>>> >> their version of Hive.
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html
>>>>>>>>>>> >>
>>>>>>>>>>> >> <shameless-plug>
>>>>>>>>>>> >>   Chapter 21 of Programming Hive discusses this feature and
>>>>>>>>>>> >>   other aspects of using Hive in EMR.
>>>>>>>>>>> >> </shameless-plug>
>>>>>>>>>>> >>
>>>>>>>>>>> >> dean
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Mon, Nov 5, 2012 at 5:34 AM, Chunky Gupta <
>>>>>>>>>>> chunky.gupta@vizury.com>
>>>>>>>>>>> >> wrote:
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Hi,
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> I am having a cluster setup on EC2 with Hadoop version
>>>>>>>>>>> 0.20.2 and Hive
>>>>>>>>>>> >>> version 0.8.1 (I configured everything) . I have created a
>>>>>>>>>>> table using :-
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> CREATE EXTERNAL TABLE XXX ( YYY )PARTITIONED BY ( ZZZ )ROW
>>>>>>>>>>> FORMAT
>>>>>>>>>>> >>> DELIMITED FIELDS TERMINATED BY 'WWW' LOCATION
>>>>>>>>>>> 's3://my-location/data/';
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Now I am trying to recover partition using :-
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> ALTER TABLE XXX RECOVER PARTITIONS;
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> but I am getting this error :- "FAILED: Parse Error: line
>>>>>>>>>>> 1:12 cannot
>>>>>>>>>>> >>> recognize input near 'XXX' 'RECOVER' 'PARTITIONS' in alter
>>>>>>>>>>> table statement"
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Doing same steps on a cluster setup on EMR with Hadoop
>>>>>>>>>>> version 1.0.3 and
>>>>>>>>>>> >>> Hive version 0.8.1 (Configured by EMR), works fine.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> So is this a version issue or am I missing some
>>>>>>>>>>> configuration changes in
>>>>>>>>>>> >>> EC2 setup ?
>>>>>>>>>>> >>> I am not able to find exact solution for this problem on
>>>>>>>>>>> internet. Please
>>>>>>>>>>> >>> help me.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>> Thanks,
>>>>>>>>>>> >>> Chunky.
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >> --
>>>>>>>>>>> >> Dean Wampler, Ph.D.
>>>>>>>>>>> >> thinkbiganalytics.com
>>>>>>>>>>> >> +1-312-339-1330
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330

Re: Alter table is giving error

Posted by Mark Grover <gr...@gmail.com>.

Chunky,
You have an external table that points at the location s3://location/.

No need to load the data. All files (or partition folders) under
s3://location/ should be available via the table.
Just run your queries on it.

LOAD DATA moves the data from one HDFS location to another. You don't
need or want to do that in this case.

Mark

Re: Alter table is giving error

Posted by Chunky Gupta <ch...@vizury.com>.

Hi,

Now when I try to load a CSV file into any table I created, it's not
working.

I created a table:
CREATE EXTERNAL TABLE someidtable (
someid STRING
)
ROW FORMAT
DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
LOCATION 's3://location/';

Then

LOAD DATA INPATH 's3://location/someidexcel.csv' INTO TABLE someidtable;

It gives this error:
"Error in semantic analysis: Line 1:17 Invalid path
''s3n://location/someidexcel.csv'': only "file" or "hdfs" file systems
accepted"

Please help me in resolving this issue.
Thanks,
Chunky.

Re: Alter table is giving error

Posted by Chunky Gupta <ch...@vizury.com>.

Okay Mark, I will be looking into this JIRA regularly.
Thanks again for helping.
Chunky.

Re: Alter table is giving error

Posted by Mark Grover <gr...@gmail.com>.

Chunky,
I just tried it myself. It turns out that the directory you are adding as a
partition has to be empty for msck repair to work. This is obviously
sub-optimal, and there is a JIRA in place (
https://issues.apache.org/jira/browse/HIVE-3231) to fix it.

So, I'd suggest you keep an eye out for the next version for that fix to
come in. In the meantime, run msck repair after you create your partition
directory but before you populate it with data.
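
The ordering of that workaround can be sketched as follows. This is only an
illustration under assumptions: the table name `logs`, the `dt=2012-11-07`
directory and the file `access.log` are hypothetical, and the helper merely
builds the command strings (`hadoop fs -mkdir`, `hive -e`, `hadoop fs -put`)
in the right order without running anything.

```python
def msck_workaround_commands(table, partition_dir, local_file):
    """Build the workaround's commands in order; nothing is executed here."""
    return [
        # 1. Create the (still empty) partition directory.
        f"hadoop fs -mkdir {partition_dir}",
        # 2. Let Hive register the new directory as a partition.
        f'hive -e "MSCK REPAIR TABLE {table};"',
        # 3. Only now copy the data into the partition directory.
        f"hadoop fs -put {local_file} {partition_dir}",
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for command in msck_workaround_commands(
        "logs", "s3://my-location/data/dt=2012-11-07/", "access.log"
    ):
        print(command)
```

If the repair step worked, running "show partitions logs;" afterwards would
be expected to list the new partition.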

Mark


Re: Alter table is giving error

Posted by Chunky Gupta <ch...@vizury.com>.

Hi Mark,
Sorry, I forgot to mention that I have also tried
                msck repair table <Table name>;
and got the same output as from plain msck.
Do I need to change any other settings for this to work? I set up
Hadoop and Hive from scratch on EC2.

Thanks,
Chunky.




Re: Alter table is giving error

Posted by Mark Grover <gr...@gmail.com>.

Chunky,
You should have run:
msck repair table <Table name>;

Sorry, I should have made it clear in my last reply. I have added an entry
to the Hive wiki for the benefit of others:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions

Mark



Re: Alter table is giving error

Posted by Chunky Gupta <ch...@vizury.com>.

Hi Mark,
I didn't get any error.
I ran this on the Hive console:
         "msck table Table_Name;"
It said OK and showed the execution time as 1.050 sec.
But when I checked the table's partitions using
          "show partitions Table_Name;"
it didn't show any partitions.

Thanks,
Chunky.


Re: Alter table is giving error

Posted by Mark Grover <gr...@gmail.com>.

Glad to hear, Chunky.

Out of curiosity, what errors did you get when using msck?


Re: Alter table is giving error

Posted by Chunky Gupta <ch...@vizury.com>.

Hi Mark,
I tried msck, but it is not working for me. I have written a Python script
to add each partition individually.
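
For readers in the same situation, a script of that kind might look roughly
like the sketch below. It is not the actual script from this thread. It
assumes a hypothetical table named `logs`, a partition column `dt`, and
partition directories laid out as `dt=YYYY-MM-DD` under the table location;
the `IF NOT EXISTS` clause is taken from the Hive DDL manual and may not be
accepted by every Hive version.

```python
from datetime import date, timedelta


def add_partition_statements(table, base_location, start, end, column="dt"):
    """Yield one ALTER TABLE ... ADD PARTITION statement per day in [start, end]."""
    base = base_location.rstrip("/")
    day = start
    while day <= end:
        value = day.isoformat()  # e.g. 2012-11-05
        # Assumed layout: <table location>/dt=<date>/
        location = f"{base}/{column}={value}/"
        yield (
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION ({column}='{value}') LOCATION '{location}';"
        )
        day += timedelta(days=1)


if __name__ == "__main__":
    # Hypothetical table name and date range, for illustration only.
    for statement in add_partition_statements(
        "logs", "s3://my-location/data/", date(2012, 11, 5), date(2012, 11, 7)
    ):
        print(statement)
```

The printed statements could be saved to a file and run with "hive -f", or
the script could be scheduled daily to add only the newest date.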

Thank you Edward, Mark and Dean.
Chunky.


Re: Alter table is giving error

Posted by Mark Grover <gr...@gmail.com>.

Chunky,
I have used the "recover partitions" command on EMR, and that worked fine.

However, take a look at https://issues.apache.org/jira/browse/HIVE-874. It
seems the msck command in Apache Hive does the same thing. Try it out and let
us know how it goes.

Mark


Re: Alter table is giving error

Posted by Edward Capriolo <ed...@gmail.com>.

Recover partitions should work the same way for different file systems.

Edward

On Mon, Nov 5, 2012 at 9:33 AM, Dean Wampler
<de...@thinkbiganalytics.com> wrote:
> Writing a script to add the external partitions individually is the only way
> I know of.
>
> Sent from my rotary phone.
>
>
> On Nov 5, 2012, at 8:19 AM, Chunky Gupta <ch...@vizury.com> wrote:
>
> Hi Dean,
>
> Actually, I had a Hadoop and Hive cluster on EMR, with S3 storage
> containing logs that are updated daily and partitioned by date (dt), and
> I was using RECOVER PARTITIONS there.
> Now I want to move to EC2 and run my own Hadoop and Hive cluster. So,
> what is the alternative to RECOVER PARTITIONS in this case, if you have
> any idea?
> I found one way, adding the partition for each date individually, but
> then I have to write a script to do that for all dates. Is there any
> easier way other than this?
>
> Thanks,
> Chunky
>
>
>
> On Mon, Nov 5, 2012 at 6:28 PM, Dean Wampler
> <de...@thinkbiganalytics.com> wrote:
>>
>> The RECOVER PARTITIONS is an enhancement added by Amazon to their version
>> of Hive.
>>
>>
>> http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html
>>
>> <shameless-plus>
>>   Chapter 21 of Programming Hive discusses this feature and other aspects
>> of using Hive in EMR.
>> </shameless-plug>
>>
>> dean
>>
>>
>> On Mon, Nov 5, 2012 at 5:34 AM, Chunky Gupta <ch...@vizury.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> I am having a cluster setup on EC2 with Hadoop version 0.20.2 and Hive
>>> version 0.8.1 (I configured everything) . I have created a table using :-
>>>
>>> CREATE EXTERNAL TABLE XXX ( YYY )PARTITIONED BY ( ZZZ )ROW FORMAT
>>> DELIMITED FIELDS TERMINATED BY 'WWW' LOCATION 's3://my-location/data/';
>>>
>>> Now I am trying to recover partition using :-
>>>
>>> ALTER TABLE XXX RECOVER PARTITIONS;
>>>
>>> but I am getting this error :- "FAILED: Parse Error: line 1:12 cannot
>>> recognize input near 'XXX' 'RECOVER' 'PARTITIONS' in alter table statement"
>>>
>>> Doing same steps on a cluster setup on EMR with Hadoop version 1.0.3 and
>>> Hive version 0.8.1 (Configured by EMR), works fine.
>>>
>>> So is this a version issue or am I missing some configuration changes in
>>> EC2 setup ?
>>> I am not able to find exact solution for this problem on internet. Please
>>> help me.
>>>
>>> Thanks,
>>> Chunky.
>>>
>>>
>>>
>>
>>
>>
>> --
>> Dean Wampler, Ph.D.
>> thinkbiganalytics.com
>> +1-312-339-1330
>>
>>
>

Re: Alter table is giving error

Posted by Dean Wampler <de...@thinkbiganalytics.com>.

Writing a script to add the external partitions individually is the only way I know of. 
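
A minimal sketch of such a script, in Python. It assumes the partition column is named dt (as in Chunky's description) and that each day's data sits in a dt=YYYY-MM-DD/ directory under the table location. That directory layout, the date format, and the function name add_partition_statements are illustrative assumptions, not something stated in this thread. The script only prints ALTER TABLE ... ADD PARTITION statements; it does not talk to Hive itself.

```python
from datetime import date, timedelta


def add_partition_statements(table, base_location, start, end):
    """Yield one ALTER TABLE ... ADD PARTITION statement per day in [start, end].

    Assumes (hypothetically) a layout of <base_location>dt=YYYY-MM-DD/ on S3.
    """
    day = start
    while day <= end:
        dt = day.isoformat()  # e.g. 2012-11-05
        yield (
            "ALTER TABLE {t} ADD IF NOT EXISTS PARTITION (dt='{d}') "
            "LOCATION '{base}dt={d}/';".format(t=table, d=dt, base=base_location)
        )
        day += timedelta(days=1)


if __name__ == "__main__":
    # Table name and location are the placeholders used in the original post.
    for stmt in add_partition_statements(
        "XXX", "s3://my-location/data/", date(2012, 11, 1), date(2012, 11, 3)
    ):
        print(stmt)
```

Redirecting the output to a file (say add_partitions.hql, a hypothetical name) and running it with hive -f should register the partitions, assuming your Hive version accepts the IF NOT EXISTS clause. With that clause the script can be rerun daily without failing on partitions that already exist.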

Sent from my rotary phone. 


On Nov 5, 2012, at 8:19 AM, Chunky Gupta <ch...@vizury.com> wrote:

> Hi Dean,
> 
> Actually I was having Hadoop and Hive cluster on EMR and I have S3 storage containing logs which updates daily and having partition with date(dt). And I was using this recover partition.
> Now I wanted to shift to EC2 and have my own Hadoop and Hive cluster. So, what is the alternate of using recover partition in this case, if you have any idea ? 
> I found one way of individually partitioning all dates, so I have to write script for that to do so for all dates. Is there any easiest way other than this ?
> 
> Thanks,
> Chunky

Re: Alter table is giving error

Posted by Chunky Gupta <ch...@vizury.com>.

Hi Dean,

I was running a Hadoop and Hive cluster on EMR, with logs stored in S3 that
are updated daily and partitioned by date (dt), and I was using RECOVER
PARTITIONS there.
Now I want to move to EC2 and run my own Hadoop and Hive cluster. Is there an
alternative to RECOVER PARTITIONS in this case?
The one way I have found is to add the partition for each date individually,
which means writing a script to do that for all dates. Is there an easier way
than this?

Thanks,
Chunky



On Mon, Nov 5, 2012 at 6:28 PM, Dean Wampler <
dean.wampler@thinkbiganalytics.com> wrote:

> The RECOVER PARTITIONS is an enhancement added by Amazon to their version
> of Hive.
>
>
> http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html
>
> <shameless-plug>
>   Chapter 21 of Programming Hive discusses this feature and other aspects
> of using Hive in EMR.
> </shameless-plug>
>
> dean

Re: Alter table is giving error

Posted by Dean Wampler <de...@thinkbiganalytics.com>.

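RECOVER PARTITIONS is an enhancement that Amazon added to its version of
Hive. It is not part of Apache Hive, which is why your self-configured
Hive 0.8.1 cannot parse the statement.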

http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html

<shameless-plug>
  Chapter 21 of Programming Hive discusses this feature and other aspects
of using Hive in EMR.
</shameless-plug>

dean

On Mon, Nov 5, 2012 at 5:34 AM, Chunky Gupta <ch...@vizury.com> wrote:

> Hi,
>
> I am having a cluster setup on EC2 with Hadoop version 0.20.2 and Hive
> version 0.8.1 (I configured everything) . I have created a table using :-
>
> CREATE EXTERNAL TABLE XXX ( YYY )PARTITIONED BY ( ZZZ )ROW FORMAT
> DELIMITED FIELDS TERMINATED BY 'WWW' LOCATION 's3://my-location/data/';
>
> Now I am trying to recover partition using :-
>
> ALTER TABLE XXX RECOVER PARTITIONS;
>
> but I am getting this error :- "FAILED: Parse Error: line 1:12 cannot
> recognize input near 'XXX' 'RECOVER' 'PARTITIONS' in alter table statement"
>
> Doing same steps on a cluster setup on EMR with Hadoop version 1.0.3 and
> Hive version 0.8.1 (Configured by EMR), works fine.
>
> So is this a version issue or am I missing some configuration changes in
> EC2 setup ?
> I am not able to find exact solution for this problem on internet. Please
> help me.
>
> Thanks,
> Chunky.
>
>
>
>


-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330