Posted to dev@hive.apache.org by "Bennie Schut (JIRA)" <ji...@apache.org> on 2010/02/09 13:28:28 UTC
[jira] Resolved: (HIVE-1138) Hive using lzo compression returns
unexpected results.
[ https://issues.apache.org/jira/browse/HIVE-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bennie Schut resolved HIVE-1138.
--------------------------------
Resolution: Not A Problem
Assignee: Bennie Schut
Ah, a clear case of RTFM.
The LZO codecs need to be in the list of registered codecs (io.compression.codecs), like this:
{noformat}
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
{noformat}
So this is a configuration mistake and not a bug in Hive.
I just wouldn't have expected this behavior, since compression seemed to partly work: the data was written compressed, but queries on it returned NULLs.
Hopefully someone else can learn from my mistake ;-)
Thanks Zheng and He for the support on this.
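For anyone who wants to check their own setup for the same mistake, here is a minimal sketch that parses a Hadoop-style configuration document and reports which LZO codec classes are missing from io.compression.codecs. The helper names (registered_codecs, missing_codecs) and SAMPLE_CONFIG are hypothetical and only for illustration. Which file actually holds the property on a given cluster (core-site.xml is the usual place) is an assumption to verify locally.

```python
import xml.etree.ElementTree as ET

# Codec classes that the SET commands in this issue rely on.
REQUIRED_CODECS = [
    "com.hadoop.compression.lzo.LzoCodec",
    "com.hadoop.compression.lzo.LzopCodec",
]


def registered_codecs(config_xml):
    """Return the class names listed under io.compression.codecs."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "io.compression.codecs":
            value = prop.findtext("value", "")
            return [c.strip() for c in value.split(",") if c.strip()]
    # Property not present at all: nothing is registered explicitly.
    return []


def missing_codecs(config_xml, required=REQUIRED_CODECS):
    """Return the required codec classes that are not registered."""
    present = set(registered_codecs(config_xml))
    return [c for c in required if c not in present]


# Hypothetical site configuration that registers only two stock codecs.
SAMPLE_CONFIG = """<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for codec in missing_codecs(SAMPLE_CONFIG):
        print("not registered:", codec)
```

To check a real cluster, pass the contents of the site configuration file to missing_codecs instead of SAMPLE_CONFIG. An empty result only means the classes are listed. It does not prove that the LZO jars and native libraries are installed.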
> Hive using lzo compression returns unexpected results.
> ------------------------------------------------------
>
> Key: HIVE-1138
> URL: https://issues.apache.org/jira/browse/HIVE-1138
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.6.0
> Environment: hadoop 0.20.1, hive trunk 2010-02-03
> Reporter: Bennie Schut
> Assignee: Bennie Schut
> Priority: Blocker
> Attachments: test.csv
>
>
> I have a tab-separated file that I loaded with "load data inpath". Then I run:
> SET hive.exec.compress.output=true;
> SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
> SET mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
> select distinct login_cldr_id as cldr_id from chatsessions_load;
> Ended Job = job_201001151039_1641
> OK
> NULL
> NULL
> NULL
> Time taken: 49.06 seconds
> However, if I run it without the SET commands I get this:
> Ended Job = job_201001151039_1642
> OK
> 2283
> Time taken: 45.308 seconds
> Which is the correct result.
> When I do an "insert overwrite" on an RCFile table it will actually compress the data correctly.
> When I disable compression and query this new table the result is correct.
> When I enable compression it's wrong again.
> I see no errors in the logs.