Posted to user@hive.apache.org by Krishnan K <kk...@gmail.com> on 2013/01/24 15:44:19 UTC

Loading a Hive table simultaneously from 2 different sources

Hi All,

Could you please let me know what would happen if we try to load a table
from 2 different sources at the same time?

I tried this earlier and got an error for one load job, while the other
job loaded its data into the table successfully.

I guess it was because of the lock acquired on the table by the first load
process.

Is there any way to handle this?

Please give your insights.

Regards,
Krishnan

Re: Loading a Hive table simultaneously from 2 different sources

Posted by Dean Wampler <de...@thinkbiganalytics.com>.
You'll face all the usual concurrency and synchronization risks if you're
updating the same "place" concurrently. One thing to keep in mind: it's all
just HDFS under the hood, which tells you most of what you need to know.
(Yes, there's also the metadata.) So, one way to update a partition
directory safely is to have each writer write to its own uniquely named
files. Hive doesn't care about their names.
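
A minimal sketch of the unique-file idea, assuming each loader picks its own
file name before writing into the shared partition directory. The warehouse
path, the job names, and the helper unique_data_file are hypothetical and only
there for illustration; the actual write to HDFS is left out.

```python
import posixpath
import uuid


def unique_data_file(partition_dir: str, job_name: str) -> str:
    """Build a file path under partition_dir that is unique to this writer."""
    # A random UUID makes a name clash between concurrent writers
    # practically impossible, even if both jobs use the same job_name.
    return posixpath.join(partition_dir, f"{job_name}-{uuid.uuid4().hex}.txt")


# Hypothetical partition directory and job names.
partition_dir = "/user/hive/warehouse/events/ds=2013-01-24"
path_a = unique_data_file(partition_dir, "source_a")
path_b = unique_data_file(partition_dir, "source_b")
```

Both paths land in the same partition directory, so Hive reads both files as
part of that partition, but neither job can overwrite the other's file.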

You can even write new directories for the partitions yourself, bypassing
Hive, and then tell Hive to "find" them afterwards. See
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions
In this case, you're updating the metadata to reflect what just happened
to the file system.
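
As a rough sketch of the "tell Hive afterwards" step, the snippet below only
builds the HiveQL text; it does not run anything. The table name, partition
column, and location are hypothetical, and which statement is available
(ALTER TABLE ... ADD PARTITION or MSCK REPAIR TABLE) should be checked against
the linked manual page for your Hive version.

```python
def add_partition_statement(table: str, spec: dict, location: str) -> str:
    """Render an ALTER TABLE ... ADD PARTITION statement for one directory."""
    # Render the partition spec as col='value' pairs, in insertion order.
    cols = ", ".join(f"{col}='{val}'" for col, val in spec.items())
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({cols}) LOCATION '{location}';"
    )


def repair_statement(table: str) -> str:
    """Render the statement that asks Hive to discover partition directories."""
    return f"MSCK REPAIR TABLE {table};"


# Hypothetical table, partition value, and directory written outside Hive.
add_stmt = add_partition_statement(
    "events", {"ds": "2013-01-24"}, "/data/events/ds=2013-01-24"
)
repair_stmt = repair_statement("events")
```

The first form registers one known directory explicitly; the second asks Hive
to scan the table location and add whatever partition directories it finds.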

dean

-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330

Re: Loading a Hive table simultaneously from 2 different sources

Posted by Krishnan K <kk...@gmail.com>.
Hi Edward, All,

Thanks for the quick reply!

We are using dynamic partitions, so we are unable to say which partition
each record goes to. We don't have much control here.

Are there any properties that can be set?
I'm a bit doubtful here - is it because of the lock acquired on the table?

Regards,
Krishnan


RE: Loading a Hive table simultaneously from 2 different sources

Posted by Bennie Schut <bs...@ebuddy.com>.
The benefit of using the partitioned approach is described really nicely in the O'Reilly book "Programming Hive". (Thanks for writing it, Edward.)
For me the ability to drop a single partition if there's any doubt about the quality of the data of just one job is a large benefit.
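
To make the "drop a single partition" point concrete, here is a small sketch
that builds the HiveQL text for removing one job's partition. The table name
and the partition columns (ds, source) are hypothetical; the snippet does not
execute the statement.

```python
def drop_partition_statement(table: str, spec: dict) -> str:
    """Render an ALTER TABLE ... DROP PARTITION statement for one partition."""
    # Render the partition spec as col='value' pairs, in insertion order.
    cols = ", ".join(f"{col}='{val}'" for col, val in spec.items())
    return f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({cols});"


# Hypothetical layout: one partition per day and per source job.
drop_stmt = drop_partition_statement(
    "events", {"ds": "2013-01-24", "source": "b"}
)
```

Only the data loaded by the suspect job goes away; the other source's
partition for the same day is untouched.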

Re: Loading a Hive table simultaneously from 2 different sources

Posted by Edward Capriolo <ed...@gmail.com>.
Partition the table and load the data into different partitions. Either
that, or build the data outside the table and then use scripting to move
the data in using LOAD DATA INPATH or copying.
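
A sketch of the scripted variant, assuming each source first writes to its own
staging directory and is then moved into its own partition. The staging paths,
table name, and partition columns are hypothetical; the code only builds the
LOAD DATA INPATH statements and does not submit them to Hive.

```python
def load_statement(staging_path: str, table: str, spec: dict) -> str:
    """Render a LOAD DATA INPATH statement targeting one partition."""
    # Render the partition spec as col='value' pairs, in insertion order.
    cols = ", ".join(f"{col}='{val}'" for col, val in spec.items())
    return (
        f"LOAD DATA INPATH '{staging_path}' "
        f"INTO TABLE {table} PARTITION ({cols});"
    )


# Hypothetical staging directories, one per source, loaded into
# separate partitions so the two loads never share a target directory.
stmt_a = load_statement(
    "/staging/source_a/2013-01-24", "events", {"ds": "2013-01-24", "source": "a"}
)
stmt_b = load_statement(
    "/staging/source_b/2013-01-24", "events", {"ds": "2013-01-24", "source": "b"}
)
```

For HDFS paths, LOAD DATA INPATH moves the staged files into the partition
directory, so each job's data only appears in the table once its own files
are complete.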
