Posted to user@hive.apache.org by Peyman Mohajerian <mo...@gmail.com> on 2015/08/25 01:11:54 UTC

Data Deleted on Hive External Table

Hi Guys,

I managed to delete some data in HDFS by dropping a partitioned external
Hive table. One explanation is that the data resided in Hive's 'warehouse'
directory and that this had something to do with it.
An alternative explanation may be that my 'drop table' statement didn't delete
the data, but my follow-up 'create table' statement with a different
partition name did. Let me elaborate: the files used to be in this directory
structure:
/user/hive/warehouse/<tablename>/year=2009

I created a new Hive external table with a partition column named 'yr'
instead of 'year', pointing to the same base directory. Is it possible that
this create statement deleted the data (I highly doubt that)? Either case
would be unexpected to me!
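
For what it's worth, the directory name in the path above encodes the partition column as 'year', so a table declared with a partition column called 'yr' would not line up with it by name. Here is a minimal Python sketch of that mismatch. The helper names and the 'mytable' placeholder are made up for illustration; the code only parses path strings and says nothing about what Hive actually did to the files.

```python
def partition_spec(path):
    """Return {column: value} for every 'column=value' segment of a path."""
    spec = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            column, value = segment.split("=", 1)
            spec[column] = value
    return spec


def missing_columns(path, declared_columns):
    """List declared partition columns that are absent from the directory names."""
    found = partition_spec(path)
    return [c for c in declared_columns if c not in found]


# 'mytable' stands in for the <tablename> elided in the post (hypothetical).
example_path = "/user/hive/warehouse/mytable/year=2009"
print(partition_spec(example_path))
print(missing_columns(example_path, ["yr"]))
```

If the second call returns a non-empty list, the directory layout and the declared partition columns disagree by name.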

This is on Hive 1.0.

Thanks,
Peyman

Re: Using transform

Posted by "Manjee, Sunile" <Su...@Teradata.com>.
You can use transform to run a Python script over your rows, much like a UDF.


select transform (column here)
using 'python myPythonScript.py' as (column output name here) from YourhiveTable
where ….
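
For illustration, a script like the myPythonScript.py named above could look like the minimal Python sketch below. By default Hive passes each selected row to the script's standard input as tab-separated text and reads tab-separated rows back from its standard output. The upper-casing logic and the function names are assumptions made up for this example, not anything from the thread. The script usually has to be shipped to the cluster first, for example with Hive's ADD FILE command.

```python
import sys


def transform_line(line):
    """Upper-case the first tab-separated field of one input row."""
    fields = line.rstrip("\n").split("\t")
    fields[0] = fields[0].upper()
    return "\t".join(fields)


def main(stdin=sys.stdin, stdout=sys.stdout):
    # One input row per line; emit one transformed row per line.
    for line in stdin:
        stdout.write(transform_line(line) + "\n")


# Only consume stdin when something is piped in, as Hive does.
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

The script can be tried outside Hive by piping a tab-separated file into it, which is a quick way to debug it before running the query.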


Sunile Manjee

From: rakesh sharma <ra...@hotmail.com>
Reply-To: "user@hive.apache.org" <us...@hive.apache.org>
Date: Tuesday, August 25, 2015 at 9:17 AM
To: "user@hive.apache.org" <us...@hive.apache.org>
Subject: Using transform

What's the use and purpose of transform in Hive?
Any help is appreciated.

thanks
rakesh

Using transform

Posted by rakesh sharma <ra...@hotmail.com>.
What's the use and purpose of transform in Hive?
Any help is appreciated.

thanks
rakesh

Re: Data Deleted on Hive External Table

Posted by Peyman Mohajerian <mo...@gmail.com>.
The data was generated in another cluster; they moved it to S3 and then
copied it to my cluster, into the warehouse path. I then created a schema
over it. You are correct that this would not be the right process, and we
had no plans to do this in production; it was a POC. Nevertheless, in my
view 'external' should still carry the same meaning: 'Even though the data
is in the warehouse, I'm just experimenting with different schema designs
and creating a temporary schema over this data, so don't delete the
content.' Perhaps there are other options instead of using 'external'.
Also, if 'external' doesn't mean anything in this scenario, perhaps Hive
should throw an exception so that I'm unable to create the table in
the first place.
Again, what I'm saying above is my own reasoning, and I could be wrong about
something.



On Tue, Aug 25, 2015 at 7:09 AM, Jeetendra G <je...@housing.com>
wrote:

> If you put 'external' in the table definition and point INPATH at the
> original data (where the data lands from the other source), then how does
> the data end up in /user/hive/warehouse? /user/hive/warehouse should only be
> populated with data when the table is 'internal'.

Re: Data Deleted on Hive External Table

Posted by Jeetendra G <je...@housing.com>.
If you put 'external' in the table definition and point INPATH at the
original data (where the data lands from the other source), then how does
the data end up in /user/hive/warehouse? /user/hive/warehouse should only be
populated with data when the table is 'internal'.

On Tue, Aug 25, 2015 at 7:33 PM, Peyman Mohajerian <mo...@gmail.com>
wrote:

> Hi Jeetendra,
>
> What I was originally saying is that if you drop the table, it will
> delete the data despite the fact that you put 'external' in the
> definition. I think this behavior is due to the fact that the data is in
> /user/hive/warehouse and therefore Hive assumes ownership and ignores the
> 'external' directive! I would have assumed 'external' would still carry its
> meaning and dropping the table would not delete the data, but I was wrong.
> If I got this wrong, please challenge my conclusion.
>
> Thanks,
> Peyman

Re: Data Deleted on Hive External Table

Posted by Peyman Mohajerian <mo...@gmail.com>.
Hi Jeetendra,

What I was originally saying is that if you drop the table, it will delete
the data despite the fact that you put 'external' in the definition. I
think this behavior is due to the fact that the data is in /user/hive/warehouse
and therefore Hive assumes ownership and ignores the 'external' directive!
I would have assumed 'external' would still carry its meaning and dropping
the table would not delete the data, but I was wrong.
If I got this wrong, please challenge my conclusion.

Thanks,
Peyman

On Mon, Aug 24, 2015 at 11:22 PM, Jeetendra G <je...@housing.com>
wrote:

> Hi Peyman
>
> "I created a new Hive external table with partition column name of 'yr'
> instead of 'year' pointing to the same base directory."
> If this is the case, how come /user/hive/warehouse has the data? It should
> not, right?

Re: Data Deleted on Hive External Table

Posted by Jeetendra G <je...@housing.com>.
Hi Peyman

"I created a new Hive external table with partition column name of 'yr'
instead of 'year' pointing to the same base directory."
If this is the case, how come /user/hive/warehouse has the data? It should
not, right?

On Tue, Aug 25, 2015 at 4:41 AM, Peyman Mohajerian <mo...@gmail.com>
wrote:

> Hi Guys,
>
> I managed to delete some data in HDFS by dropping a partitioned external
> Hive table. One explanation is that the data resided in Hive's 'warehouse'
> directory and that this had something to do with it.
> An alternative explanation may be that my 'drop table' statement didn't
> delete the data, but my follow-up 'create table' statement with a different
> partition name did. Let me elaborate: the files used to be in this directory
> structure:
> /user/hive/warehouse/<tablename>/year=2009
>
> I created a new Hive external table with a partition column named 'yr'
> instead of 'year', pointing to the same base directory. Is it possible that
> this create statement deleted the data (I highly doubt that)? Either case
> would be unexpected to me!
>
> This is on Hive 1.0.
>
> Thanks,
> Peyman
>