Posted to user@hive.apache.org by shangan <sh...@corp.kaixin001.com> on 2010/08/25 06:19:28 UTC

Files stored in .tar.gz format do not work

My files are stored in .tar.gz format in Hadoop. I can run "select count(1)", but when I run a more complex query, "select uid, sum(num) as num from log group by uid order by num desc limit 500;", it gets stuck at map 0% and reduce 0%. I checked the log on the task node:
2010-08-25 11:32:57,366 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201008201644_0082_m_000000_0 8.8814067E-10% hdfs://vm125:9000/user/hive/warehouse/log/pdt=20100823/operation.tar.gz:0+767063934

The line above is printed repeatedly. Can anyone explain it? Does it have anything to do with the compression format? Everything seems to be fine when I store the data without compression.
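
To narrow down the compression question, here is a minimal Python sketch of what a gzip-only reader would see inside a .tar.gz file. It assumes that Hadoop picks its codec from the .gz suffix and does not unpack the tar layer; I have not verified that on this cluster. The member name and the two sample rows are hypothetical, since the real table layout is not shown here.

```python
import gzip
import io
import tarfile


def make_tar_gz(name: str, payload: bytes) -> bytes:
    """Build a small .tar.gz in memory that holds a single member."""
    buf = io.BytesIO()
    # USTAR format keeps the layout simple: one 512-byte header per member.
    with tarfile.open(fileobj=buf, mode="w:gz", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def gunzip_only(data: bytes) -> bytes:
    """Remove the gzip layer only, leaving the tar container untouched."""
    return gzip.decompress(data)


# Hypothetical sample rows (uid, num); not taken from the real log table.
rows = b"1001\t3\n1002\t5\n"
archive = make_tar_gz("operation.log", rows)
stream = gunzip_only(archive)

# The decompressed stream starts with a tar header block, not with the rows.
header_name = stream[:13]
# The member's data only begins after the 512-byte header block.
data_after_header = stream[512:512 + len(rows)]

print(header_name)
print(data_after_header)
print(len(stream) % 512 == 0)  # tar pads everything to 512-byte blocks
```

If that assumption holds, a reader that only strips the gzip layer would get the tar header and padding blocks mixed in with the real rows, so repacking the data as a plain .gz file (without tar) might be worth a try.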

2010-08-25
shangan