Posted to dev@hive.apache.org by Mark Grover <gr...@gmail.com> on 2012/06/01 06:35:54 UTC

Behavior of Hive 2837: insert into external tables should not be allowed

Hi folks,
I have a question regarding HIVE-2837 (
https://issues.apache.org/jira/browse/HIVE-2837), which deals with
disallowing insert into queries on external tables.

From looking at the JIRA, it seems like it applies to external tables on
HDFS as well. Technically, insert into should be ok for external tables on
HDFS (and S3 as well). Seems like a storage file system level thing to
specify whether insert into is applied and implement it.

Historically, there hasn't been any real difference between creating an
external table on HDFS vs creating a managed one. However, if we disallow
insert into on external tables, that would mean that folks with external
tables on HDFS wouldn't be able to make use of insert into functionality
even though they should be able to. Do we want to allow insert into on HDFS
tables regardless of whether they are external or not?

Mark

Re: Behavior of Hive 2837: insert into external tables should not be allowed

Posted by Edward Capriolo <ed...@gmail.com>.
Well, Hive now has a property,

hive.insert.into.external.tables, which is true by default.

So the default behaviour/semantics are unchanged unless the switch is
thrown. That is a fair compromise, albeit semi-confusing when there
are already two other ways to prevent someone from editing the table
(one being the Hive access/authorization framework).

Edward

On 6/1/12, Edward Capriolo <ed...@gmail.com> wrote:
> I am a bit confused by this feature too, especially since hive now has
> a lock table function. Changing existing semantics would be bad.
> Different storage handlers actually treat external differently as
> well.
>
> On 6/1/12, Mark Grover <gr...@gmail.com> wrote:
>> Hi folks,
>> I have a question regarding HIVE 2837(
>> https://issues.apache.org/jira/browse/HIVE-2837) that deals with
>> disallowing external table from using insert into queries.
>>
>> From looking at the JIRA, it seems like it applies to external tables on
>> HDFS as well. Technically, insert into should be ok for external tables
>> on
>> HDFS (and S3 as well). Seems like a storage file system level thing to
>> specify whether insert into is applied and implement it.
>>
>> Historically, there hasn't been any real difference between creating an
>> external table on HDFS vs creating a managed one. However, if we disallow
>> insert into on external tables, that would mean that folks with external
>> tables on HDFS wouldn't be able to make use of insert into functionality
>> even though they should be able to. Do we want to allow insert into on
>> HDFS
>> tables regardless of whether they are external or not?
>>
>> Mark
>>
>

Re: Behavior of Hive 2837: insert into external tables should not be allowed

Posted by Edward Capriolo <ed...@gmail.com>.
I am a bit confused by this feature too, especially since hive now has
a lock table function. Changing existing semantics would be bad.
Different storage handlers actually treat external differently as
well.

On 6/1/12, Mark Grover <gr...@gmail.com> wrote:
> Hi folks,
> I have a question regarding HIVE 2837(
> https://issues.apache.org/jira/browse/HIVE-2837) that deals with
> disallowing external table from using insert into queries.
>
> From looking at the JIRA, it seems like it applies to external tables on
> HDFS as well. Technically, insert into should be ok for external tables on
> HDFS (and S3 as well). Seems like a storage file system level thing to
> specify whether insert into is applied and implement it.
>
> Historically, there hasn't been any real difference between creating an
> external table on HDFS vs creating a managed one. However, if we disallow
> insert into on external tables, that would mean that folks with external
> tables on HDFS wouldn't be able to make use of insert into functionality
> even though they should be able to. Do we want to allow insert into on HDFS
> tables regardless of whether they are external or not?
>
> Mark
>

Re: Behavior of Hive 2837: insert into external tables should not be allowed

Posted by Mark Grover <mg...@oanda.com>.
Thanks, Ashutosh and Ed.

Historically, I didn't have much reason to choose managed over external tables or vice versa since the semantics were very similar. I chose external because it gave me a better handle on the table metadata. For example, if a new column got added to the file, I could just drop the external table and recreate it with the new schema. With managed tables, I could do the same using ALTER TABLE commands, but at that time not all metadata for the table could be modified using ALTER TABLE commands, so I decided to go with external tables. I think a lot of people use external tables on HDFS in preference to managed tables.

I did see the property hive.insert.into.external.tables but it's an all-or-none switch. If I had an HBase external table and an HDFS external table, it might very well be the case that I want to be able to insert into the HDFS-backed external table but not the HBase table. So, to me, disallowing insert into for all external tables doesn't seem like the right thing to do. Like Ed suggested, it's dependent on the storage handler, not on the table being external. I could go ahead and use table locking in that case, but that kinda defeats the purpose of this feature and property.
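
To make the distinction concrete, here is a minimal sketch of the two policies being contrasted. It is only a toy model, not Hive's implementation: the Table class, the function names and the "hdfs"/"hbase" labels are all made up for illustration, and the only name taken from this thread is the property hive.insert.into.external.tables, modelled here as a plain boolean argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """Hypothetical stand-in for table metadata; not a Hive class."""
    name: str
    external: bool
    storage_handler: str  # illustrative labels such as "hdfs" or "hbase"


def allowed_by_global_switch(table, insert_into_external_tables=True):
    """All-or-none model: one flag decides for every external table.

    The flag mirrors the idea of hive.insert.into.external.tables
    (true by default); managed tables are never affected by it.
    """
    return (not table.external) or insert_into_external_tables


def allowed_by_handler_policy(table, insertable_handlers):
    """Hypothetical alternative: the storage handler decides,
    regardless of whether the table is external or managed."""
    return table.storage_handler in insertable_handlers


hdfs_ext = Table("logs", external=True, storage_handler="hdfs")
hbase_ext = Table("users", external=True, storage_handler="hbase")
managed = Table("events", external=False, storage_handler="hdfs")

# Turning the global switch off blocks both external tables alike.
print(allowed_by_global_switch(hdfs_ext, insert_into_external_tables=False))
print(allowed_by_global_switch(hbase_ext, insert_into_external_tables=False))

# A per-handler policy can allow the HDFS table while blocking HBase.
print(allowed_by_handler_policy(hdfs_ext, {"hdfs"}))
print(allowed_by_handler_policy(hbase_ext, {"hdfs"}))
```

With the global switch off, both external tables are rejected; with the per-handler model, only the HBase-backed one is, which is the behaviour I'm arguing for above.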

Thoughts?

Mark

----- Original Message -----
From: "Ashutosh Chauhan" <ha...@apache.org>
To: dev@hive.apache.org
Cc: user@hive.apache.org
Sent: Friday, June 1, 2012 10:24:24 AM
Subject: Re: Behavior of Hive 2837: insert into external tables should not be allowed

Hi Mark, 


I understand your concern w.r.t. backward compatibility. But as Ed pointed out, there is a config variable, and by default the semantics are unchanged, so you can continue to insert into your external table. 
I have a question though. Why are you creating all your tables as "external" tables? Why not regular tables? 


Thanks, 
Ashutosh 


On Thu, May 31, 2012 at 9:35 PM, Mark Grover < grover.markgrover@gmail.com > wrote: 


Hi folks, 
I have a question regarding HIVE 2837( 
https://issues.apache.org/jira/browse/HIVE-2837 ) that deals with 
disallowing external table from using insert into queries. 

From looking at the JIRA, it seems like it applies to external tables on 
HDFS as well. Technically, insert into should be ok for external tables on 
HDFS (and S3 as well). Seems like a storage file system level thing to 
specify whether insert into is applied and implement it. 

Historically, there hasn't been any real difference between creating an 
external table on HDFS vs creating a managed one. However, if we disallow 
insert into on external tables, that would mean that folks with external 
tables on HDFS wouldn't be able to make use of insert into functionality 
even though they should be able to. Do we want to allow insert into on HDFS 
tables regardless of whether they are external or not? 

Mark 

Re: Behavior of Hive 2837: insert into external tables should not be allowed

Posted by Ashutosh Chauhan <ha...@apache.org>.
Hi Mark,

I understand your concern w.r.t. backward compatibility. But as Ed pointed
out, there is a config variable, and by default the semantics are unchanged,
so you can continue to insert into your external table.
I have a question though. Why are you creating all your tables as
"external" tables? Why not regular tables?

Thanks,
Ashutosh

On Thu, May 31, 2012 at 9:35 PM, Mark Grover <gr...@gmail.com> wrote:

> Hi folks,
> I have a question regarding HIVE 2837(
> https://issues.apache.org/jira/browse/HIVE-2837) that deals with
> disallowing external table from using insert into queries.
>
> From looking at the JIRA, it seems like it applies to external tables on
> HDFS as well. Technically, insert into should be ok for external tables on
> HDFS (and S3 as well). Seems like a storage file system level thing to
> specify whether insert into is applied and implement it.
>
> Historically, there hasn't been any real difference between creating an
> external table on HDFS vs creating a managed one. However, if we disallow
> insert into on external tables, that would mean that folks with external
> tables on HDFS wouldn't be able to make use of insert into functionality
> even though they should be able to. Do we want to allow insert into on HDFS
> tables regardless of whether they are external or not?
>
> Mark
>
