Posted to user@hive.apache.org by Bennie Schut <bs...@ebuddy.com> on 2010/02/05 12:22:52 UTC

LZO Compression on trunk

I have a tab-separated file that I loaded with "load data inpath".
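The setup looks roughly like this (a sketch: the real table has more 
columns, and the path here is made up):

CREATE TABLE chatsessions_load (
  login_cldr_id INT    -- plus other columns, elided here
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

LOAD DATA INPATH '/some/hdfs/path/chatsessions.tsv'
INTO TABLE chatsessions_load;

Then I run: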

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
select distinct login_cldr_id as cldr_id from chatsessions_load;

Ended Job = job_201001151039_1641
OK
NULL
NULL
NULL
Time taken: 49.06 seconds

However, if I run it without the SET commands, I get this:
Ended Job = job_201001151039_1642
OK
2283
Time taken: 45.308 seconds

Which is the correct result.

When I do an "insert overwrite" into an RCFile table, it actually 
compresses the data correctly.
When I disable compression and query this new table, the result is correct.
When I enable compression, it's wrong again.
I see no errors in the logs.
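
In other words, roughly this (the RCFile table name is made up):

CREATE TABLE chatsessions_rc (
  login_cldr_id INT
)
STORED AS RCFILE;

SET hive.exec.compress.output=true;
INSERT OVERWRITE TABLE chatsessions_rc
SELECT login_cldr_id FROM chatsessions_load;

-- correct result with compression off:
SET hive.exec.compress.output=false;
SELECT DISTINCT login_cldr_id FROM chatsessions_rc;

-- wrong again with compression on:
SET hive.exec.compress.output=true;
SELECT DISTINCT login_cldr_id FROM chatsessions_rc;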

Any ideas why this might happen?



Re: LZO Compression on trunk

Posted by Bennie Schut <bs...@ebuddy.com>.
Hadoop 0.20.1 and Hive trunk from this week. On Monday I'll try an 
older version of Hive to see if that helps, and perhaps also gzip to 
see if it's compression in general.
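
Something along these lines, with Hadoop's built-in gzip codec swapped 
in for LZO:

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
SET mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
select distinct login_cldr_id as cldr_id from chatsessions_load;

If that also comes back as NULLs, the problem is with compressed output 
in general rather than with LZO specifically.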

Yongqiang He wrote:
> Hi Bennie,
> Can you post your Hadoop and Hive versions?
>
> Thanks
> Yongqiang


Re: LZO Compression on trunk

Posted by Yongqiang He <he...@software.ict.ac.cn>.
Hi Bennie,
Can you post your Hadoop and Hive versions?

Thanks
Yongqiang


On 2/5/10 10:05 AM, "Zheng Shao" <zs...@gmail.com> wrote:

> That seems to be a bug.
> Are you using Hive trunk or a release?


Re: LZO Compression on trunk

Posted by Zheng Shao <zs...@gmail.com>.
That seems to be a bug.
Are you using Hive trunk or a release?


-- 
Sent from my mobile device

Yours,
Zheng