Posted to user@hive.apache.org by Saurabh Nanda <sa...@gmail.com> on 2009/08/13 06:14:21 UTC

Output compression not working on hive-trunk (r802989)

I migrated from Hive-0.3.0 to Hive-trunk (r802989 compiled against Hadoop
0.18.3), copied over metastore_db & the conf directory. Output compression
used to work with my earlier Hive installation, but it seems to have stopped
working now. Are the configuration parameters different from Hive-0.3?

"set -v" on Hive-trunk throws up the following relevant configuration
parameters:

mapred.output.compress=false
hive.exec.compress.intermediate=false
hive.exec.compress.output=true
mapred.output.compression.type=BLOCK
mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
mapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec
io.seqfile.compress.blocksize=1000000
io.seqfile.lazydecompress=true
mapred.compress.map.output=false
io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec
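
For what it's worth, hive.exec.compress.output is the flag Hive itself
consults for final output (it surfaces in the code as
HiveConf.ConfVars.COMPRESSRESULT). A minimal sketch for checking the value a
client actually resolves, assuming hive-default.xml/hive-site.xml are on the
classpath:

    import org.apache.hadoop.hive.conf.HiveConf;

    public class CompressCheck {
      public static void main(String[] args) {
        // HiveConf layers hive-default.xml and hive-site.xml from the classpath
        HiveConf conf = new HiveConf(CompressCheck.class);
        // hive.exec.compress.output, i.e. ConfVars.COMPRESSRESULT
        System.out.println("hive.exec.compress.output = "
            + conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT));
      }
    }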

What am I missing?

Saurabh.
-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
Strangely, when I'm using JDBC, all new data/partitions are compressed.
However, when I'm using the CLI, everything stays uncompressed no matter what I do.
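
For reference, the JDBC path that produces compressed output looks roughly
like this (a minimal sketch; the host, port, and database are the defaults for
the trunk JDBC driver, and the query is the one from my setup):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class JdbcCompressedInsert {
      public static void main(String[] args) throws Exception {
        // Hive trunk's JDBC driver; assumes a standalone Hive server on port 10000
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection con =
            DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
        Statement stmt = con.createStatement();
        // session-level override, mirroring what "set" does in the CLI
        stmt.execute("set hive.exec.compress.output=true");
        stmt.execute("from raw_compressed"
            + " insert overwrite table raw partition (dt='2009-04-02')"
            + " select transform(line) using 'parse_logs.rb' as ip_address, aid, uid,"
            + " ts, method, uri, response, referer, user_agent, cookies, ptime");
        con.close();
      }
    }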

Saurabh.

On Thu, Aug 13, 2009 at 11:22 AM, Saurabh Nanda <sa...@gmail.com>wrote:

> I've even tried setting "mapred.output.compress=true" in hadoop-site.xml
> and restarting the cluster, but in vain.
>
> How do I get compression to work in Hive-trunk? Is it something to do with
> the Hive query as well? Here's what I'm trying:
>
> from raw_compressed
>     insert overwrite table raw partition (dt='2009-04-02')
>     select transform(line) using 'parse_logs.rb' as ip_address, aid, uid,
> ts, method, uri, response, referer, user_agent, cookies, ptime
>
> Saurabh.
>
>
> On Thu, Aug 13, 2009 at 9:44 AM, Saurabh Nanda <sa...@gmail.com>wrote:
>
>> I migrated from Hive-0.3.0 to Hive-trunk (r802989 compiled against Hadoop
>> 0.18.3), copied over metastore_db & the conf directory. Output compression
>> used to work with my earlier Hive installation, but it seems to have stopped
>> working now. Are the configuration parameters different from Hive-0.3?
>>
>> "set -v" on Hive-trunk throws up the following relevant configuration
>> parameters:
>>
>> mapred.output.compress=false
>> hive.exec.compress.intermediate=false
>> hive.exec.compress.output=true
>> mapred.output.compression.type=BLOCK
>> mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
>>
>> mapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec
>> io.seqfile.compress.blocksize=1000000
>> io.seqfile.lazydecompress=true
>> mapred.compress.map.output=false
>>
>> io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec
>>
>> What am I missing?
>>
>> Saurabh.
>> --
>> http://nandz.blogspot.com
>> http://foodieforlife.blogspot.com
>>
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
Any clues, anyone?

On Mon, Aug 17, 2009 at 5:47 PM, Saurabh Nanda <sa...@gmail.com>wrote:

> Hey Zheng, any clues as to what the bug is? Or what I'm doing wrong? I can
> do some more digging and logging if required.
>
> Saurabh.
>
>
> On Mon, Aug 17, 2009 at 1:28 PM, Saurabh Nanda <sa...@gmail.com>wrote:
>
>> Here's the log output:
>>
>> 2009-08-17 13:26:42,183 INFO  parse.SemanticAnalyzer
>> (SemanticAnalyzer.java:genFileSinkPlan(2575)) - Created FileSink Plan for
>> clause: insclause-0dest_path:
>> hdfs://master-hadoop/user/hive/warehouse/raw/dt=2009-04-07 row schema:
>> {(_col0,_col0: string)(_col1,_col1: string)(_col2,_col2:
>> string)(_col3,_col3: string)(_col4,_col4: string)(_col5,_col5:
>> string)(_col6,_col6: string)(_col7,_col7: string)(_col8,_col8:
>> string)(_col9,_col9: string)(_col10,_col10: int)} .
>> HiveConf.ConfVars.COMPRESSRESULT=true
>>
>> Is the SemanticAnalyzer run more than once in the lifetime of a job?
>> Should I be looking for another log entry like this one?
>>
>> Saurabh.
>>
>>
>> On Mon, Aug 17, 2009 at 1:26 PM, Saurabh Nanda <sa...@gmail.com>wrote:
>>
>>> Strange. The compression configuration log entry was also INFO, but I
>>> could see it in the task logs:
>>>
>>>       LOG.info("Compression configuration is:" + isCompressed);
>>>
>>> Saurabh.
>>>
>>> On Mon, Aug 17, 2009 at 12:56 PM, Zheng Shao <zs...@gmail.com> wrote:
>>>
>>>> The default log level is WARN. Please change it to INFO.
>>>>
>>>> hive.root.logger=INFO,DRFA
>>>>
>>>> Of course you can also use LOG.warn() in your test code.
>>>>
>>>> Zheng
>>>>
>>>> --
>>> http://nandz.blogspot.com
>>> http://foodieforlife.blogspot.com
>>>
>>
>>
>>
>> --
>> http://nandz.blogspot.com
>> http://foodieforlife.blogspot.com
>>
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
Hey Zheng, thanks for fixing the issue. I've commented on
https://issues.apache.org/jira/browse/HIVE-794 with the results of applying
the change. Do you really need a patch for a one-line change?

Saurabh.

On Tue, Aug 25, 2009 at 1:53 PM, Zheng Shao <zs...@gmail.com> wrote:

> Hi Saurabh,
>
> Finally I found the line of code. See
> https://issues.apache.org/jira/browse/HIVE-794 for details.
> Can you help make a patch for that?
>
> Zheng
>
>
> On Tue, Aug 25, 2009 at 12:19 AM, Saurabh Nanda <sa...@gmail.com>wrote:
>
>> Hi Zheng,
>>
>> Here's the plan for the second map-reduce job --
>> http://pastebin.com/m59d5a84b
>> I don't see compression anywhere.
>>
>> Saurabh.
>>
>>
>> On Fri, Aug 21, 2009 at 11:30 AM, Zheng Shao <zs...@gmail.com> wrote:
>>
>>> Hi Saurabh,
>>>
>>> Sorry for the delay on this. We are busy with production this week.
>>>
>>> I don't think there is much difference in CLI queries and JDBC queries.
>>>
>>> Yes, this is what I am talking about. Since your query has 2
>>> map-reduce jobs, there will be two .xml files.
>>> Can you show us the second one? Does the second one also contain
>>> "<...>compressed<...>true<...>" in the section of FileSinkOperator?
>>>
>>> Zheng
>>>
>>> On Tue, Aug 18, 2009 at 3:21 AM, Saurabh Nanda<sa...@gmail.com>
>>> wrote:
>>> > Is this what you're talking about -- http://pastebin.ca/1533627 ?
>>> Seems like
>>> > compression is on.
>>> >
>>> > Is there any difference in how CLI queries and JDBC queries are
>>> treated?
>>> >
>>> > Saurabh.
>>> >
>>> > On Tue, Aug 18, 2009 at 11:19 AM, Zheng Shao <zs...@gmail.com> wrote:
>>> >>
>>> >> Hi Saurabh,
>>> >>
>>> >> So the compression flag is correct when the plan is generated.
>>> >> When you run the query, you should see "plan = xxx.xml" in the log
>>> >> file. Can you open that file (in HDFS) and see whether the compression
>>> >> flag is on or not?
>>> >>
>>> >> Zheng
>>> >>
>>> >> On Mon, Aug 17, 2009 at 5:17 AM, Saurabh Nanda<saurabhnanda@gmail.com
>>> >
>>> >> wrote:
>>> >> > Hey Zheng, any clues as to what the bug is? Or what I'm doing wrong?
>>> I
>>> >> > can
>>> >> > do some more digging and logging if required.
>>> >> >
>>> >> > Saurabh.
>>> >> >
>>> >> > On Mon, Aug 17, 2009 at 1:28 PM, Saurabh Nanda <
>>> saurabhnanda@gmail.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> Here's the log output:
>>> >> >>
>>> >> >> 2009-08-17 13:26:42,183 INFO  parse.SemanticAnalyzer
>>> >> >> (SemanticAnalyzer.java:genFileSinkPlan(2575)) - Created FileSink
>>> Plan
>>> >> >> for
>>> >> >> clause: insclause-0dest_path:
>>> >> >> hdfs://master-hadoop/user/hive/warehouse/raw/dt=2009-04-07 row
>>> schema:
>>> >> >> {(_col0,_col0: string)(_col1,_col1: string)(_col2,_col2:
>>> >> >> string)(_col3,_col3: string)(_col4,_col4: string)(_col5,_col5:
>>> >> >> string)(_col6,_col6: string)(_col7,_col7: string)(_col8,_col8:
>>> >> >> string)(_col9,_col9: string)(_col10,_col10: int)} .
>>> >> >> HiveConf.ConfVars.COMPRESSRESULT=true
>>> >> >>
>>> >> >> Is the SemanticAnalyzer run more than once in the lifetime of a
>>> job?
>>> >> >> Should I be looking for another log entry like this one?
>>> >> >>
>>> >> >> Saurabh.
>>> >> >>
>>> >> >> On Mon, Aug 17, 2009 at 1:26 PM, Saurabh Nanda <
>>> saurabhnanda@gmail.com>
>>> >> >> wrote:
>>> >> >>>
>>> >> >>> Strange. The compression configuration log entry was also INFO, but
>>> I
>>> >> >>> could see it in the task logs:
>>> >> >>>
>>> >> >>>       LOG.info("Compression configuration is:" + isCompressed);
>>> >> >>>
>>> >> >>> Saurabh.
>>> >> >>>
>>> >> >>> On Mon, Aug 17, 2009 at 12:56 PM, Zheng Shao <zs...@gmail.com>
>>> wrote:
>>> >> >>>>
>>> >> >>>> The default log level is WARN. Please change it to INFO.
>>> >> >>>>
>>> >> >>>> hive.root.logger=INFO,DRFA
>>> >> >>>>
>>> >> >>>> Of course you can also use LOG.warn() in your test code.
>>> >> >>>>
>>> >> >>>> Zheng
>>> >> >>>>
>>> >> >>> --
>>> >> >>> http://nandz.blogspot.com
>>> >> >>> http://foodieforlife.blogspot.com
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> http://nandz.blogspot.com
>>> >> >> http://foodieforlife.blogspot.com
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > http://nandz.blogspot.com
>>> >> > http://foodieforlife.blogspot.com
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Yours,
>>> >> Zheng
>>> >
>>> >
>>> >
>>> > --
>>> > http://nandz.blogspot.com
>>> > http://foodieforlife.blogspot.com
>>> >
>>>
>>>
>>>
>>> --
>>> Yours,
>>> Zheng
>>>
>>
>>
>>
>> --
>> http://nandz.blogspot.com
>> http://foodieforlife.blogspot.com
>>
>
>
>
> --
> Yours,
> Zheng
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Output compression not working on hive-trunk (r802989)

Posted by Zheng Shao <zs...@gmail.com>.
Hi Saurabh,

Finally I found the line of code. See
https://issues.apache.org/jira/browse/HIVE-794 for details.
Can you help make a patch for that?

Zheng

On Tue, Aug 25, 2009 at 12:19 AM, Saurabh Nanda <sa...@gmail.com>wrote:

> Hi Zheng,
>
> Here's the plan for the second map-reduce job --
> http://pastebin.com/m59d5a84b
> I don't see compression anywhere.
>
> Saurabh.
>
>
> On Fri, Aug 21, 2009 at 11:30 AM, Zheng Shao <zs...@gmail.com> wrote:
>
>> Hi Saurabh,
>>
>> Sorry for the delay on this. We are busy with production this week.
>>
>> I don't think there is much difference in CLI queries and JDBC queries.
>>
>> Yes, this is what I am talking about. Since your query has 2
>> map-reduce jobs, there will be two .xml files.
>> Can you show us the second one? Does the second one also contain
>> "<...>compressed<...>true<...>" in the section of FileSinkOperator?
>>
>> Zheng
>>
>> On Tue, Aug 18, 2009 at 3:21 AM, Saurabh Nanda<sa...@gmail.com>
>> wrote:
>> > Is this what you're talking about -- http://pastebin.ca/1533627 ? Seems
>> like
>> > compression is on.
>> >
>> > Is there any difference in how CLI queries and JDBC queries are treated?
>> >
>> > Saurabh.
>> >
>> > On Tue, Aug 18, 2009 at 11:19 AM, Zheng Shao <zs...@gmail.com> wrote:
>> >>
>> >> Hi Saurabh,
>> >>
>> >> So the compression flag is correct when the plan is generated.
>> >> When you run the query, you should see "plan = xxx.xml" in the log
>> >> file. Can you open that file (in HDFS) and see whether the compression
>> >> flag is on or not?
>> >>
>> >> Zheng
>> >>
>> >> On Mon, Aug 17, 2009 at 5:17 AM, Saurabh Nanda<sa...@gmail.com>
>> >> wrote:
>> >> > Hey Zheng, any clues as to what the bug is? Or what I'm doing wrong?
>> I
>> >> > can
>> >> > do some more digging and logging if required.
>> >> >
>> >> > Saurabh.
>> >> >
>> >> > On Mon, Aug 17, 2009 at 1:28 PM, Saurabh Nanda <
>> saurabhnanda@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Here's the log output:
>> >> >>
>> >> >> 2009-08-17 13:26:42,183 INFO  parse.SemanticAnalyzer
>> >> >> (SemanticAnalyzer.java:genFileSinkPlan(2575)) - Created FileSink
>> Plan
>> >> >> for
>> >> >> clause: insclause-0dest_path:
>> >> >> hdfs://master-hadoop/user/hive/warehouse/raw/dt=2009-04-07 row
>> schema:
>> >> >> {(_col0,_col0: string)(_col1,_col1: string)(_col2,_col2:
>> >> >> string)(_col3,_col3: string)(_col4,_col4: string)(_col5,_col5:
>> >> >> string)(_col6,_col6: string)(_col7,_col7: string)(_col8,_col8:
>> >> >> string)(_col9,_col9: string)(_col10,_col10: int)} .
>> >> >> HiveConf.ConfVars.COMPRESSRESULT=true
>> >> >>
>> >> >> Is the SemanticAnalyzer run more than once in the lifetime of a
>> job?
>> >> >> Should I be looking for another log entry like this one?
>> >> >>
>> >> >> Saurabh.
>> >> >>
>> >> >> On Mon, Aug 17, 2009 at 1:26 PM, Saurabh Nanda <
>> saurabhnanda@gmail.com>
>> >> >> wrote:
>> >> >>>
>> >> >>> Strange. The compression configuration log entry was also INFO, but
>> I
>> >> >>> could see it in the task logs:
>> >> >>>
>> >> >>>       LOG.info("Compression configuration is:" + isCompressed);
>> >> >>>
>> >> >>> Saurabh.
>> >> >>>
>> >> >>> On Mon, Aug 17, 2009 at 12:56 PM, Zheng Shao <zs...@gmail.com>
>> wrote:
>> >> >>>>
>> >> >>>> The default log level is WARN. Please change it to INFO.
>> >> >>>>
>> >> >>>> hive.root.logger=INFO,DRFA
>> >> >>>>
>> >> >>>> Of course you can also use LOG.warn() in your test code.
>> >> >>>>
>> >> >>>> Zheng
>> >> >>>>
>> >> >>> --
>> >> >>> http://nandz.blogspot.com
>> >> >>> http://foodieforlife.blogspot.com
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> http://nandz.blogspot.com
>> >> >> http://foodieforlife.blogspot.com
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > http://nandz.blogspot.com
>> >> > http://foodieforlife.blogspot.com
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Yours,
>> >> Zheng
>> >
>> >
>> >
>> > --
>> > http://nandz.blogspot.com
>> > http://foodieforlife.blogspot.com
>> >
>>
>>
>>
>> --
>> Yours,
>> Zheng
>>
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
Yours,
Zheng

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
Hi Zheng,

Here's the plan for the second map-reduce job --
http://pastebin.com/m59d5a84b
I don't see compression anywhere.

Saurabh.

On Fri, Aug 21, 2009 at 11:30 AM, Zheng Shao <zs...@gmail.com> wrote:

> Hi Saurabh,
>
> Sorry for the delay on this. We are busy with production this week.
>
> I don't think there is much difference in CLI queries and JDBC queries.
>
> Yes, this is what I am talking about. Since your query has 2
> map-reduce jobs, there will be two .xml files.
> Can you show us the second one? Does the second one also contain
> "<...>compressed<...>true<...>" in the section of FileSinkOperator?
>
> Zheng
>
> On Tue, Aug 18, 2009 at 3:21 AM, Saurabh Nanda<sa...@gmail.com>
> wrote:
> > Is this what you're talking about -- http://pastebin.ca/1533627 ? Seems
> like
> > compression is on.
> >
> > Is there any difference in how CLI queries and JDBC queries are treated?
> >
> > Saurabh.
> >
> > On Tue, Aug 18, 2009 at 11:19 AM, Zheng Shao <zs...@gmail.com> wrote:
> >>
> >> Hi Saurabh,
> >>
> >> So the compression flag is correct when the plan is generated.
> >> When you run the query, you should see "plan = xxx.xml" in the log
> >> file. Can you open that file (in HDFS) and see whether the compression
> >> flag is on or not?
> >>
> >> Zheng
> >>
> >> On Mon, Aug 17, 2009 at 5:17 AM, Saurabh Nanda<sa...@gmail.com>
> >> wrote:
> >> > Hey Zheng, any clues as to what the bug is? Or what I'm doing wrong? I
> >> > can
> >> > do some more digging and logging if required.
> >> >
> >> > Saurabh.
> >> >
> >> > On Mon, Aug 17, 2009 at 1:28 PM, Saurabh Nanda <
> saurabhnanda@gmail.com>
> >> > wrote:
> >> >>
> >> >> Here's the log output:
> >> >>
> >> >> 2009-08-17 13:26:42,183 INFO  parse.SemanticAnalyzer
> >> >> (SemanticAnalyzer.java:genFileSinkPlan(2575)) - Created FileSink Plan
> >> >> for
> >> >> clause: insclause-0dest_path:
> >> >> hdfs://master-hadoop/user/hive/warehouse/raw/dt=2009-04-07 row
> schema:
> >> >> {(_col0,_col0: string)(_col1,_col1: string)(_col2,_col2:
> >> >> string)(_col3,_col3: string)(_col4,_col4: string)(_col5,_col5:
> >> >> string)(_col6,_col6: string)(_col7,_col7: string)(_col8,_col8:
> >> >> string)(_col9,_col9: string)(_col10,_col10: int)} .
> >> >> HiveConf.ConfVars.COMPRESSRESULT=true
> >> >>
> >> >> Is the SemanticAnalyzer run more than once in the lifetime of a job?
> >> >> Should I be looking for another log entry like this one?
> >> >>
> >> >> Saurabh.
> >> >>
> >> >> On Mon, Aug 17, 2009 at 1:26 PM, Saurabh Nanda <
> saurabhnanda@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> Strange. The compression configuration log entry was also INFO, but I
> >> >>> could see it in the task logs:
> >> >>>
> >> >>>       LOG.info("Compression configuration is:" + isCompressed);
> >> >>>
> >> >>> Saurabh.
> >> >>>
> >> >>> On Mon, Aug 17, 2009 at 12:56 PM, Zheng Shao <zs...@gmail.com>
> wrote:
> >> >>>>
> >> >>>> The default log level is WARN. Please change it to INFO.
> >> >>>>
> >> >>>> hive.root.logger=INFO,DRFA
> >> >>>>
> >> >>>> Of course you can also use LOG.warn() in your test code.
> >> >>>>
> >> >>>> Zheng
> >> >>>>
> >> >>> --
> >> >>> http://nandz.blogspot.com
> >> >>> http://foodieforlife.blogspot.com
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> http://nandz.blogspot.com
> >> >> http://foodieforlife.blogspot.com
> >> >
> >> >
> >> >
> >> > --
> >> > http://nandz.blogspot.com
> >> > http://foodieforlife.blogspot.com
> >> >
> >>
> >>
> >>
> >> --
> >> Yours,
> >> Zheng
> >
> >
> >
> > --
> > http://nandz.blogspot.com
> > http://foodieforlife.blogspot.com
> >
>
>
>
> --
> Yours,
> Zheng
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Output compression not working on hive-trunk (r802989)

Posted by Zheng Shao <zs...@gmail.com>.
Hi Saurabh,

Sorry for the delay on this. We are busy with production this week.

I don't think there is much difference in CLI queries and JDBC queries.

Yes, this is what I am talking about. Since your query has 2
map-reduce jobs, there will be two .xml files.
Can you show us the second one? Does the second one also contain
"<...>compressed<...>true<...>" in the section of FileSinkOperator?

Zheng

On Tue, Aug 18, 2009 at 3:21 AM, Saurabh Nanda<sa...@gmail.com> wrote:
> Is this what you're talking about -- http://pastebin.ca/1533627 ? Seems like
> compression is on.
>
> Is there any difference in how CLI queries and JDBC queries are treated?
>
> Saurabh.
>
> On Tue, Aug 18, 2009 at 11:19 AM, Zheng Shao <zs...@gmail.com> wrote:
>>
>> Hi Saurabh,
>>
>> So the compression flag is correct when the plan is generated.
>> When you run the query, you should see "plan = xxx.xml" in the log
>> file. Can you open that file (in HDFS) and see whether the compression
>> flag is on or not?
>>
>> Zheng
>>
>> On Mon, Aug 17, 2009 at 5:17 AM, Saurabh Nanda<sa...@gmail.com>
>> wrote:
>> > Hey Zheng, any clues as to what the bug is? Or what I'm doing wrong? I
>> > can
>> > do some more digging and logging if required.
>> >
>> > Saurabh.
>> >
>> > On Mon, Aug 17, 2009 at 1:28 PM, Saurabh Nanda <sa...@gmail.com>
>> > wrote:
>> >>
>> >> Here's the log output:
>> >>
>> >> 2009-08-17 13:26:42,183 INFO  parse.SemanticAnalyzer
>> >> (SemanticAnalyzer.java:genFileSinkPlan(2575)) - Created FileSink Plan
>> >> for
>> >> clause: insclause-0dest_path:
>> >> hdfs://master-hadoop/user/hive/warehouse/raw/dt=2009-04-07 row schema:
>> >> {(_col0,_col0: string)(_col1,_col1: string)(_col2,_col2:
>> >> string)(_col3,_col3: string)(_col4,_col4: string)(_col5,_col5:
>> >> string)(_col6,_col6: string)(_col7,_col7: string)(_col8,_col8:
>> >> string)(_col9,_col9: string)(_col10,_col10: int)} .
>> >> HiveConf.ConfVars.COMPRESSRESULT=true
>> >>
>> >> Is the SemanticAnalyzer run more than once in the lifetime of a job?
>> >> Should I be looking for another log entry like this one?
>> >>
>> >> Saurabh.
>> >>
>> >> On Mon, Aug 17, 2009 at 1:26 PM, Saurabh Nanda <sa...@gmail.com>
>> >> wrote:
>> >>>
>> >>> Strange. The compression configuration log entry was also INFO, but I
>> >>> could see it in the task logs:
>> >>>
>> >>>       LOG.info("Compression configuration is:" + isCompressed);
>> >>>
>> >>> Saurabh.
>> >>>
>> >>> On Mon, Aug 17, 2009 at 12:56 PM, Zheng Shao <zs...@gmail.com> wrote:
>> >>>>
>> >>>> The default log level is WARN. Please change it to INFO.
>> >>>>
>> >>>> hive.root.logger=INFO,DRFA
>> >>>>
>> >>>> Of course you can also use LOG.warn() in your test code.
>> >>>>
>> >>>> Zheng
>> >>>>
>> >>> --
>> >>> http://nandz.blogspot.com
>> >>> http://foodieforlife.blogspot.com
>> >>
>> >>
>> >>
>> >> --
>> >> http://nandz.blogspot.com
>> >> http://foodieforlife.blogspot.com
>> >
>> >
>> >
>> > --
>> > http://nandz.blogspot.com
>> > http://foodieforlife.blogspot.com
>> >
>>
>>
>>
>> --
>> Yours,
>> Zheng
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
Yours,
Zheng

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
How do I do that? Just steps to reproduce the error, or a formal Java-based
test case?
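
In the meantime, here is how I am checking whether a partition actually came
out compressed (a sketch using the plain Hadoop FileSystem API; the warehouse
path and partition are from my setup, and with GzipCodec the data files should
carry a .gz extension):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CheckPartitionFiles {
      public static void main(String[] args) throws Exception {
        // picks up hadoop-site.xml from the classpath, so fs.default.name is HDFS
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path part = new Path("/user/hive/warehouse/raw/dt=2009-04-02");
        for (FileStatus st : fs.listStatus(part)) {
          // compressed text output should end in .gz when GzipCodec is in effect
          System.out.println(st.getPath().getName() + "\t" + st.getLen() + " bytes");
        }
      }
    }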

On 8/21/09, Ashish Thusoo <at...@facebook.com> wrote:
> Hi Saurabh,
>
> Can you give a simple reproducible test case for this (unless you have
> already done so)?
>
> Thanks,
> Ashish
>
> ________________________________
> From: Saurabh Nanda [mailto:saurabhnanda@gmail.com]
> Sent: Thursday, August 20, 2009 12:43 PM
> To: hive-user@hadoop.apache.org
> Subject: Re: Output compression not working on hive-trunk (r802989)
>
> Is anyone else facing this issue?
>
> On Wed, Aug 19, 2009 at 11:16 AM, Saurabh Nanda
> <sa...@gmail.com> wrote:
> Any clue? Has anyone else tried to replicate this? Is this really a bug or
> am I doing something obviously stupid?
>
> Saurabh.
>
>
> On Tue, Aug 18, 2009 at 3:51 PM, Saurabh Nanda
> <sa...@gmail.com> wrote:
> Is this what you're talking about -- http://pastebin.ca/1533627 ? Seems like
> compression is on.
>
> Is there any difference in how CLI queries and JDBC queries are treated?
>
> Saurabh.
>
>
> On Tue, Aug 18, 2009 at 11:19 AM, Zheng Shao
> <zs...@gmail.com> wrote:
> Hi Saurabh,
>
> So the compression flag is correct when the plan is generated.
> When you run the query, you should see "plan = xxx.xml" in the log
> file. Can you open that file (in HDFS) and see whether the compression
> flag is on or not?
>
> Zheng
>
> On Mon, Aug 17, 2009 at 5:17 AM, Saurabh
> Nanda<sa...@gmail.com> wrote:
>> Hey Zheng, any clues as to what the bug is? Or what I'm doing wrong? I can
>> do some more digging and logging if required.
>>
>> Saurabh.
>>
>> On Mon, Aug 17, 2009 at 1:28 PM, Saurabh Nanda
>> <sa...@gmail.com>
>> wrote:
>>>
>>> Here's the log output:
>>>
>>> 2009-08-17 13:26:42,183 INFO  parse.SemanticAnalyzer
>>> (SemanticAnalyzer.java:genFileSinkPlan(2575)) - Created FileSink Plan for
>>> clause: insclause-0dest_path:
>>> hdfs://master-hadoop/user/hive/warehouse/raw/dt=2009-04-07 row schema:
>>> {(_col0,_col0: string)(_col1,_col1: string)(_col2,_col2:
>>> string)(_col3,_col3: string)(_col4,_col4: string)(_col5,_col5:
>>> string)(_col6,_col6: string)(_col7,_col7: string)(_col8,_col8:
>>> string)(_col9,_col9: string)(_col10,_col10: int)} .
>>> HiveConf.ConfVars.COMPRESSRESULT=true
>>>
>>> Is the SemanticAnalyzer run more than once in the lifetime of a job?
>>> Should I be looking for another log entry like this one?
>>>
>>> Saurabh.
>>>
>>> On Mon, Aug 17, 2009 at 1:26 PM, Saurabh Nanda
>>> <sa...@gmail.com>
>>> wrote:
>>>>
>>>> Strange. The compression configuration log entry was also INFO, but I
>>>> could see it in the task logs:
>>>>
>>>>       LOG.info("Compression configuration is:" + isCompressed);
>>>>
>>>> Saurabh.
>>>>
>>>> On Mon, Aug 17, 2009 at 12:56 PM, Zheng Shao
>>>> <zs...@gmail.com> wrote:
>>>>>
>>>>> The default log level is WARN. Please change it to INFO.
>>>>>
>>>>> hive.root.logger=INFO,DRFA
>>>>>
>>>>> Of course you can also use LOG.warn() in your test code.
>>>>>
>>>>> Zheng
>>>>>
>>>> --
>>>> http://nandz.blogspot.com
>>>> http://foodieforlife.blogspot.com
>>>
>>>
>>>
>>> --
>>> http://nandz.blogspot.com
>>> http://foodieforlife.blogspot.com
>>
>>
>>
>> --
>> http://nandz.blogspot.com
>> http://foodieforlife.blogspot.com
>>
>
>
>
> --
> Yours,
> Zheng
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>


-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

RE: Output compression not working on hive-trunk (r802989)

Posted by Ashish Thusoo <at...@facebook.com>.
Hi Saurabh,

Can you give a simple reproducible test case for this (unless you have already done so)?

Thanks,
Ashish

________________________________
From: Saurabh Nanda [mailto:saurabhnanda@gmail.com]
Sent: Thursday, August 20, 2009 12:43 PM
To: hive-user@hadoop.apache.org
Subject: Re: Output compression not working on hive-trunk (r802989)

Is anyone else facing this issue?

On Wed, Aug 19, 2009 at 11:16 AM, Saurabh Nanda <sa...@gmail.com> wrote:
Any clue? Has anyone else tried to replicate this? Is this really a bug or am I doing something obviously stupid?

Saurabh.


On Tue, Aug 18, 2009 at 3:51 PM, Saurabh Nanda <sa...@gmail.com> wrote:
Is this what you're talking about -- http://pastebin.ca/1533627 ? Seems like compression is on.

Is there any difference in how CLI queries and JDBC queries are treated?

Saurabh.


On Tue, Aug 18, 2009 at 11:19 AM, Zheng Shao <zs...@gmail.com> wrote:
Hi Saurabh,

So the compression flag is correct when the plan is generated.
When you run the query, you should see "plan = xxx.xml" in the log
file. Can you open that file (in HDFS) and see whether the compression
flag is on or not?

Zheng

On Mon, Aug 17, 2009 at 5:17 AM, Saurabh Nanda<sa...@gmail.com> wrote:
> Hey Zheng, any clues as to what the bug is? Or what I'm doing wrong? I can
> do some more digging and logging if required.
>
> Saurabh.
>
> On Mon, Aug 17, 2009 at 1:28 PM, Saurabh Nanda <sa...@gmail.com>
> wrote:
>>
>> Here's the log output:
>>
>> 2009-08-17 13:26:42,183 INFO  parse.SemanticAnalyzer
>> (SemanticAnalyzer.java:genFileSinkPlan(2575)) - Created FileSink Plan for
>> clause: insclause-0dest_path:
>> hdfs://master-hadoop/user/hive/warehouse/raw/dt=2009-04-07 row schema:
>> {(_col0,_col0: string)(_col1,_col1: string)(_col2,_col2:
>> string)(_col3,_col3: string)(_col4,_col4: string)(_col5,_col5:
>> string)(_col6,_col6: string)(_col7,_col7: string)(_col8,_col8:
>> string)(_col9,_col9: string)(_col10,_col10: int)} .
>> HiveConf.ConfVars.COMPRESSRESULT=true
>>
>> Is the SemanticAnalyzer run more than once in the lifetime of a job?
>> Should I be looking for another log entry like this one?
>>
>> Saurabh.
>>
>> On Mon, Aug 17, 2009 at 1:26 PM, Saurabh Nanda <sa...@gmail.com>
>> wrote:
>>>
>>> Strange. The compression configuration log entry was also INFO, but I
>>> could see it in the task logs:
>>>
>>>       LOG.info("Compression configuration is:" + isCompressed);
>>>
>>> Saurabh.
>>>
>>> On Mon, Aug 17, 2009 at 12:56 PM, Zheng Shao <zs...@gmail.com> wrote:
>>>>
>>>> The default log level is WARN. Please change it to INFO.
>>>>
>>>> hive.root.logger=INFO,DRFA
>>>>
>>>> Of course you can also use LOG.warn() in your test code.
>>>>
>>>> Zheng
>>>>
>>> --
>>> http://nandz.blogspot.com
>>> http://foodieforlife.blogspot.com
>>
>>
>>
>> --
>> http://nandz.blogspot.com
>> http://foodieforlife.blogspot.com
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



--
Yours,
Zheng



--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com



--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com



--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
Is anyone else facing this issue?

On Wed, Aug 19, 2009 at 11:16 AM, Saurabh Nanda <sa...@gmail.com>wrote:

> Any clue? Has anyone else tried to replicate this? Is this really a bug or
> am I doing something obviously stupid?
>
> Saurabh.
>
>
> On Tue, Aug 18, 2009 at 3:51 PM, Saurabh Nanda <sa...@gmail.com>wrote:
>
>> Is this what you're talking about -- http://pastebin.ca/1533627 ? Seems
>> like compression is on.
>>
>> Is there any difference in how CLI queries and JDBC queries are treated?
>>
>> Saurabh.
>>
>>
>> On Tue, Aug 18, 2009 at 11:19 AM, Zheng Shao <zs...@gmail.com> wrote:
>>
>>> Hi Saurabh,
>>>
>>> So the compression flag is correct when the plan is generated.
>>> When you run the query, you should see "plan = xxx.xml" in the log
>>> file. Can you open that file (in HDFS) and see whether the compression
>>> flag is on or not?
>>>
>>> Zheng
>>>
>>> On Mon, Aug 17, 2009 at 5:17 AM, Saurabh Nanda<sa...@gmail.com>
>>> wrote:
>>> > Hey Zheng, any clues as to what the bug is? Or what I'm doing wrong? I
>>> can
>>> > do some more digging and logging if required.
>>> >
>>> > Saurabh.
>>> >
>>> > On Mon, Aug 17, 2009 at 1:28 PM, Saurabh Nanda <saurabhnanda@gmail.com
>>> >
>>> > wrote:
>>> >>
>>> >> Here's the log output:
>>> >>
>>> >> 2009-08-17 13:26:42,183 INFO  parse.SemanticAnalyzer
>>> >> (SemanticAnalyzer.java:genFileSinkPlan(2575)) - Created FileSink Plan
>>> for
>>> >> clause: insclause-0dest_path:
>>> >> hdfs://master-hadoop/user/hive/warehouse/raw/dt=2009-04-07 row schema:
>>> >> {(_col0,_col0: string)(_col1,_col1: string)(_col2,_col2:
>>> >> string)(_col3,_col3: string)(_col4,_col4: string)(_col5,_col5:
>>> >> string)(_col6,_col6: string)(_col7,_col7: string)(_col8,_col8:
>>> >> string)(_col9,_col9: string)(_col10,_col10: int)} .
>>> >> HiveConf.ConfVars.COMPRESSRESULT=true
>>> >>
>>> >> Is the SemanticAnalyzer run more than once in the lifetime of a job?
>>> >> Should I be looking for another log entry like this one?
>>> >>
>>> >> Saurabh.
>>> >>
>>> >> On Mon, Aug 17, 2009 at 1:26 PM, Saurabh Nanda <
>>> saurabhnanda@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> Strange. The compression configuration log entry was also INFO, but I
>>> >>> could see it in the task logs:
>>> >>>
>>> >>>       LOG.info("Compression configuration is:" + isCompressed);
>>> >>>
>>> >>> Saurabh.
>>> >>>
>>> >>> On Mon, Aug 17, 2009 at 12:56 PM, Zheng Shao <zs...@gmail.com>
>>> wrote:
>>> >>>>
>>> >>>> The default log level is WARN. Please change it to INFO.
>>> >>>>
>>> >>>> hive.root.logger=INFO,DRFA
>>> >>>>
>>> >>>> Of course you can also use LOG.warn() in your test code.
>>> >>>>
>>> >>>> Zheng
>>> >>>>
>>> >>> --
>>> >>> http://nandz.blogspot.com
>>> >>> http://foodieforlife.blogspot.com
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> http://nandz.blogspot.com
>>> >> http://foodieforlife.blogspot.com
>>> >
>>> >
>>> >
>>> > --
>>> > http://nandz.blogspot.com
>>> > http://foodieforlife.blogspot.com
>>> >
>>>
>>>
>>>
>>> --
>>> Yours,
>>> Zheng
>>>
>>
>>
>>
>> --
>> http://nandz.blogspot.com
>> http://foodieforlife.blogspot.com
>>
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
Any clue? Has anyone else tried to replicate this? Is this really a bug or
am I doing something obviously stupid?

Saurabh.

On Tue, Aug 18, 2009 at 3:51 PM, Saurabh Nanda <sa...@gmail.com>wrote:

> Is this what you're talking about -- http://pastebin.ca/1533627 ? Seems
> like compression is on.
>
> Is there any difference in how CLI queries and JDBC queries are treated?
>
> Saurabh.
>
>
> On Tue, Aug 18, 2009 at 11:19 AM, Zheng Shao <zs...@gmail.com> wrote:
>
>> Hi Saurabh,
>>
>> So the compression flag is correct when the plan is generated.
>> When you run the query, you should see "plan = xxx.xml" in the log
>> file. Can you open that file (in HDFS) and see whether the compression
>> flag is on or not?
>>
>> Zheng
>>
>> On Mon, Aug 17, 2009 at 5:17 AM, Saurabh Nanda<sa...@gmail.com>
>> wrote:
>> > Hey Zheng, any clues as to what the bug is? Or what I'm doing wrong? I
>> can
>> > do some more digging and logging if required.
>> >
>> > Saurabh.
>> >
>> > On Mon, Aug 17, 2009 at 1:28 PM, Saurabh Nanda <sa...@gmail.com>
>> > wrote:
>> >>
>> >> Here's the log output:
>> >>
>> >> 2009-08-17 13:26:42,183 INFO  parse.SemanticAnalyzer
>> >> (SemanticAnalyzer.java:genFileSinkPlan(2575)) - Created FileSink Plan
>> for
>> >> clause: insclause-0dest_path:
>> >> hdfs://master-hadoop/user/hive/warehouse/raw/dt=2009-04-07 row schema:
>> >> {(_col0,_col0: string)(_col1,_col1: string)(_col2,_col2:
>> >> string)(_col3,_col3: string)(_col4,_col4: string)(_col5,_col5:
>> >> string)(_col6,_col6: string)(_col7,_col7: string)(_col8,_col8:
>> >> string)(_col9,_col9: string)(_col10,_col10: int)} .
>> >> HiveConf.ConfVars.COMPRESSRESULT=true
>> >>
>> >> Is the SemanticAnalyzer run more than once in the lifetime of a job?
>> >> Should I be looking for another log entry like this one?
>> >>
>> >> Saurabh.
>> >>
>> >> On Mon, Aug 17, 2009 at 1:26 PM, Saurabh Nanda <saurabhnanda@gmail.com
>> >
>> >> wrote:
>> >>>
>> >>> Strange. The compression configuration log entry was also INFO, but I
>> >>> could see it in the task logs:
>> >>>
>> >>>       LOG.info("Compression configuration is:" + isCompressed);
>> >>>
>> >>> Saurabh.
>> >>>
>> >>> On Mon, Aug 17, 2009 at 12:56 PM, Zheng Shao <zs...@gmail.com>
>> wrote:
>> >>>>
>> >>>> The default log level is WARN. Please change it to INFO.
>> >>>>
>> >>>> hive.root.logger=INFO,DRFA
>> >>>>
>> >>>> Of course you can also use LOG.warn() in your test code.
>> >>>>
>> >>>> Zheng
>> >>>>
>> >>> --
>> >>> http://nandz.blogspot.com
>> >>> http://foodieforlife.blogspot.com
>> >>
>> >>
>> >>
>> >> --
>> >> http://nandz.blogspot.com
>> >> http://foodieforlife.blogspot.com
>> >
>> >
>> >
>> > --
>> > http://nandz.blogspot.com
>> > http://foodieforlife.blogspot.com
>> >
>>
>>
>>
>> --
>> Yours,
>> Zheng
>>
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
Is this what you're talking about -- http://pastebin.ca/1533627 ? Seems like
compression is on.

Is there any difference in how CLI queries and JDBC queries are treated?

Saurabh.

On Tue, Aug 18, 2009 at 11:19 AM, Zheng Shao <zs...@gmail.com> wrote:

> Hi Saurabh,
>
> So the compression flag is correct when the plan is generated.
> When you run the query, you should see "plan = xxx.xml" in the log
> file. Can you open that file (in HDFS) and see whether the compression
> flag is on or not?
>
> Zheng
>
> On Mon, Aug 17, 2009 at 5:17 AM, Saurabh Nanda<sa...@gmail.com>
> wrote:
> > Hey Zheng, any clues as to what the bug is? Or what I'm doing wrong? I
> can
> > do some more digging and logging if required.
> >
> > Saurabh.
> >
> > On Mon, Aug 17, 2009 at 1:28 PM, Saurabh Nanda <sa...@gmail.com>
> > wrote:
> >>
> >> Here's the log output:
> >>
> >> 2009-08-17 13:26:42,183 INFO  parse.SemanticAnalyzer
> >> (SemanticAnalyzer.java:genFileSinkPlan(2575)) - Created FileSink Plan
> for
> >> clause: insclause-0dest_path:
> >> hdfs://master-hadoop/user/hive/warehouse/raw/dt=2009-04-07 row schema:
> >> {(_col0,_col0: string)(_col1,_col1: string)(_col2,_col2:
> >> string)(_col3,_col3: string)(_col4,_col4: string)(_col5,_col5:
> >> string)(_col6,_col6: string)(_col7,_col7: string)(_col8,_col8:
> >> string)(_col9,_col9: string)(_col10,_col10: int)} .
> >> HiveConf.ConfVars.COMPRESSRESULT=true
> >>
> >> Is the SemanticAnalyzer run more than once in the lifetime of a job?
> >> Should I be looking for another log entry like this one?
> >>
> >> Saurabh.
> >>
> >> On Mon, Aug 17, 2009 at 1:26 PM, Saurabh Nanda <sa...@gmail.com>
> >> wrote:
> >>>
> >>> Strange. The compression configuration log entry was also INFO, but I
> >>> could see it in the task logs:
> >>>
> >>>       LOG.info("Compression configuration is:" + isCompressed);
> >>>
> >>> Saurabh.
> >>>
> >>> On Mon, Aug 17, 2009 at 12:56 PM, Zheng Shao <zs...@gmail.com> wrote:
> >>>>
> >>>> The default log level is WARN. Please change it to INFO.
> >>>>
> >>>> hive.root.logger=INFO,DRFA
> >>>>
> >>>> Of course you can also use LOG.warn() in your test code.
> >>>>
> >>>> Zheng
> >>>>
> >>> --
> >>> http://nandz.blogspot.com
> >>> http://foodieforlife.blogspot.com
> >>
> >>
> >>
> >> --
> >> http://nandz.blogspot.com
> >> http://foodieforlife.blogspot.com
> >
> >
> >
> > --
> > http://nandz.blogspot.com
> > http://foodieforlife.blogspot.com
> >
>
>
>
> --
> Yours,
> Zheng
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Output compression not working on hive-trunk (r802989)

Posted by Zheng Shao <zs...@gmail.com>.
Hi Saurabh,

So the compression flag is correct when the plan is generated.
When you run the query, you should see "plan = xxx.xml" in the log
file. Can you open that file (in HDFS) and see whether the compression
flag is on or not?
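
If it helps, here is a small utility to dump that file (a sketch against the
plain Hadoop FileSystem API; pass the path that the "plan = ..." log line
reports):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class CatPlan {
      public static void main(String[] args) throws Exception {
        // hadoop-site.xml on the classpath supplies fs.default.name
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // args[0] is the path from the "plan = xxx.xml" log line
        IOUtils.copyBytes(fs.open(new Path(args[0])), System.out, conf, false);
      }
    }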

Zheng

On Mon, Aug 17, 2009 at 5:17 AM, Saurabh Nanda<sa...@gmail.com> wrote:
> Hey Zheng, any clues as to what the bug is? Or what I'm doing wrong? I can
> do some more digging and logging if required.
>
> Saurabh.
>
> On Mon, Aug 17, 2009 at 1:28 PM, Saurabh Nanda <sa...@gmail.com>
> wrote:
>>
>> Here's the log output:
>>
>> 2009-08-17 13:26:42,183 INFO  parse.SemanticAnalyzer
>> (SemanticAnalyzer.java:genFileSinkPlan(2575)) - Created FileSink Plan for
>> clause: insclause-0dest_path:
>> hdfs://master-hadoop/user/hive/warehouse/raw/dt=2009-04-07 row schema:
>> {(_col0,_col0: string)(_col1,_col1: string)(_col2,_col2:
>> string)(_col3,_col3: string)(_col4,_col4: string)(_col5,_col5:
>> string)(_col6,_col6: string)(_col7,_col7: string)(_col8,_col8:
>> string)(_col9,_col9: string)(_col10,_col10: int)} .
>> HiveConf.ConfVars.COMPRESSRESULT=true
>>
>> Is the SemanticAnalyzer run more than once in the lifetime of a job?
>> Should I be looking for another log entry like this one?
>>
>> Saurabh.
>>
>> On Mon, Aug 17, 2009 at 1:26 PM, Saurabh Nanda <sa...@gmail.com>
>> wrote:
>>>
>>> Strange. The compression configuration log entry was also INFO, but I
>>> could see it in the task logs:
>>>
>>>       LOG.info("Compression configuration is:" + isCompressed);
>>>
>>> Saurabh.
>>>
>>> On Mon, Aug 17, 2009 at 12:56 PM, Zheng Shao <zs...@gmail.com> wrote:
>>>>
>>>> The default log level is WARN. Please change it to INFO.
>>>>
>>>> hive.root.logger=INFO,DRFA
>>>>
>>>> Of course you can also use LOG.warn() in your test code.
>>>>
>>>> Zheng
>>>>
>>> --
>>> http://nandz.blogspot.com
>>> http://foodieforlife.blogspot.com
>>
>>
>>
>> --
>> http://nandz.blogspot.com
>> http://foodieforlife.blogspot.com
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
Yours,
Zheng

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
Hey Zheng, any clues as to what the bug is? Or what I'm doing wrong? I can
do some more digging and logging if required.

Saurabh.

On Mon, Aug 17, 2009 at 1:28 PM, Saurabh Nanda <sa...@gmail.com>wrote:

> Here's the log output:
>
> 2009-08-17 13:26:42,183 INFO  parse.SemanticAnalyzer
> (SemanticAnalyzer.java:genFileSinkPlan(2575)) - Created FileSink Plan for
> clause: insclause-0dest_path:
> hdfs://master-hadoop/user/hive/warehouse/raw/dt=2009-04-07 row schema:
> {(_col0,_col0: string)(_col1,_col1: string)(_col2,_col2:
> string)(_col3,_col3: string)(_col4,_col4: string)(_col5,_col5:
> string)(_col6,_col6: string)(_col7,_col7: string)(_col8,_col8:
> string)(_col9,_col9: string)(_col10,_col10: int)} .
> HiveConf.ConfVars.COMPRESSRESULT=true
>
> Is the SemanticAnalyzer run more than once in the lifetime of a job?
> Should I be looking for another log entry like this one?
>
> Saurabh.
>
>
> On Mon, Aug 17, 2009 at 1:26 PM, Saurabh Nanda <sa...@gmail.com>wrote:
>
>> Strange. The compression configuration log entry was also INFO, but I could
>> see it in the task logs:
>>
>>       LOG.info("Compression configuration is:" + isCompressed);
>>
>> Saurabh.
>>
>> On Mon, Aug 17, 2009 at 12:56 PM, Zheng Shao <zs...@gmail.com> wrote:
>>
>>> The default log level is WARN. Please change it to INFO.
>>>
>>> hive.root.logger=INFO,DRFA
>>>
>>> Of course you can also use LOG.warn() in your test code.
>>>
>>> Zheng
>>>
>>> --
>> http://nandz.blogspot.com
>> http://foodieforlife.blogspot.com
>>
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
Here's the log output:

2009-08-17 13:26:42,183 INFO  parse.SemanticAnalyzer
(SemanticAnalyzer.java:genFileSinkPlan(2575)) - Created FileSink Plan for
clause: insclause-0dest_path:
hdfs://master-hadoop/user/hive/warehouse/raw/dt=2009-04-07 row schema:
{(_col0,_col0: string)(_col1,_col1: string)(_col2,_col2:
string)(_col3,_col3: string)(_col4,_col4: string)(_col5,_col5:
string)(_col6,_col6: string)(_col7,_col7: string)(_col8,_col8:
string)(_col9,_col9: string)(_col10,_col10: int)} .
HiveConf.ConfVars.COMPRESSRESULT=true

Is the SemanticAnalyzer run more than once in the lifetime of a job? Should
I be looking for another log entry like this one?

Saurabh.

On Mon, Aug 17, 2009 at 1:26 PM, Saurabh Nanda <sa...@gmail.com>wrote:

> Strange. The compression configuration log entry was also INFO, but I could
> see it in the task logs:
>
>       LOG.info("Compression configuration is:" + isCompressed);
>
> Saurabh.
>
> On Mon, Aug 17, 2009 at 12:56 PM, Zheng Shao <zs...@gmail.com> wrote:
>
>> The default log level is WARN. Please change it to INFO.
>>
>> hive.root.logger=INFO,DRFA
>>
>> Of course you can also use LOG.warn() in your test code.
>>
>> Zheng
>>
>> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
Strange. The compression configuration log entry was also INFO, but I could
see it in the task logs:

      LOG.info("Compression configuration is:" + isCompressed);

Saurabh.

On Mon, Aug 17, 2009 at 12:56 PM, Zheng Shao <zs...@gmail.com> wrote:

> The default log level is WARN. Please change it to INFO.
>
> hive.root.logger=INFO,DRFA
>
> Of course you can also use LOG.warn() in your test code.
>
> Zheng
>
> --
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Output compression not working on hive-trunk (r802989)

Posted by Zheng Shao <zs...@gmail.com>.
The default log level is WARN. Please change it to INFO.

hive.root.logger=INFO,DRFA

Of course you can also use LOG.warn() in your test code.
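
For instance (a throwaway sketch; Hive's classes log through the
commons-logging facade, so the same calls apply in your test code):

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    public class LogLevelCheck {
      private static final Log LOG = LogFactory.getLog(LogLevelCheck.class);

      public static void main(String[] args) {
        LOG.info("visible only once hive.root.logger is INFO or lower");
        LOG.warn("visible even at the default WARN level");
      }
    }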

Zheng

On Sun, Aug 16, 2009 at 11:58 PM, Saurabh Nanda<sa...@gmail.com> wrote:
> I still can't find the log output anywhere.
>
> The log file is in /tmp/ct-admin/hive.log for me. The only contents in the
> log file are:
>
> 2009-08-17 11:18:18,018 WARN  mapred.JobClient
> (JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser
> for parsing the arguments. Applications should implement Tool for the same.
> 2009-08-17 11:26:45,380 WARN  mapred.JobClient
> (JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser
> for parsing the arguments. Applications should implement Tool for the same.
>
> Here's the exact change I made in SemanticAnalyzer.java:
>
>   Operator output = putOpInsertMap(
>       OperatorFactory.getAndMakeChild(
>         new fileSinkDesc(queryTmpdir, table_desc,
>                          conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT),
> currentTableId),
>         fsRS, input), inputRR);
>
>     LOG.info("Created FileSink Plan for clause: " + dest + "dest_path: "
>              + dest_path + " row schema: "
>              + inputRR.toString()
>              + ". HiveConf.ConfVars.COMPRESSRESULT="
>              + conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT));
>
> Here's what conf/hive-log4j.properties looks like:
>
> # Define some default values that can be overridden by system properties
> hive.root.logger=WARN,DRFA
> hive.log.dir=/tmp/${user.name}
> hive.log.file=hive.log
>
> # Define the root logger to the system property "hadoop.root.logger".
> log4j.rootLogger=${hive.root.logger}, EventCounter
>
> # Logging Threshold
> log4j.threshhold=ALL
>
> #
> # Daily Rolling File Appender
> #
>
> log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
> log4j.appender.DRFA.File=${hive.log.dir}/${hive.log.file}
>
> # Rollover at midnight
> log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
>
> # 30-day backup
> #log4j.appender.DRFA.MaxBackupIndex=30
> log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
>
> # Pattern format: Date LogLevel LoggerName LogMessage
> #log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
> # Debugging Pattern format
> log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2}
> (%F:%M(%L)) - %m%n
>
>
> #
> # console
> # Add "console" to rootlogger above if you want to use this
> #
>
> log4j.appender.console=org.apache.log4j.ConsoleAppender
> log4j.appender.console.target=System.err
> log4j.appender.console.layout=org.apache.log4j.PatternLayout
> log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p
> %c{2}: %m%n
>
> #custom logging levels
> # log4j.logger.root=DEBUG
>
> #
> # Event Counter Appender
> # Sends counts of logging messages at different severity levels to Hadoop
> Metrics.
> #
> log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounter
>
>
> log4j.category.DataNucleus=ERROR,DRFA
> log4j.category.Datastore=ERROR,DRFA
> log4j.category.Datastore.Schema=ERROR,DRFA
> log4j.category.JPOX.Datastore=ERROR,DRFA
> log4j.category.JPOX.Plugin=ERROR,DRFA
> log4j.category.JPOX.MetaData=ERROR,DRFA
> log4j.category.JPOX.Query=ERROR,DRFA
> log4j.category.JPOX.General=ERROR,DRFA
> log4j.category.JPOX.Enhancer=ERROR,DRFA
>
> What is going wrong?
>
> Saurabh.
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
Yours,
Zheng

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
I still can't find the log output anywhere.

*The log file is in /tmp/ct-admin/hive.log for me. The only contents in the
log file are:*

2009-08-17 11:18:18,018 WARN  mapred.JobClient
(JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser
for parsing the arguments. Applications should implement Tool for the same.
2009-08-17 11:26:45,380 WARN  mapred.JobClient
(JobClient.java:configureCommandLineOptions(510)) - Use GenericOptionsParser
for parsing the arguments. Applications should implement Tool for the same.

*Here's the exact change I made in SemanticAnalyzer.java:*

  Operator output = putOpInsertMap(
      OperatorFactory.getAndMakeChild(
        new fileSinkDesc(queryTmpdir, table_desc,
                         conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT),
currentTableId),
        fsRS, input), inputRR);

    LOG.info("Created FileSink Plan for clause: " + dest + "dest_path: "
             + dest_path + " row schema: "
             + inputRR.toString()
             + ". HiveConf.ConfVars.COMPRESSRESULT="
             + conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT));

*Here's what conf/hive-log4j.properties looks like:*

# Define some default values that can be overridden by system properties
hive.root.logger=WARN,DRFA
hive.log.dir=/tmp/${user.name}
hive.log.file=hive.log

# Define the root logger to the system property "hadoop.root.logger".
log4j.rootLogger=${hive.root.logger}, EventCounter

# Logging Threshold
log4j.threshhold=ALL

#
# Daily Rolling File Appender
#

log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hive.log.dir}/${hive.log.file}

# Rollover at midnight
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd

# 30-day backup
#log4j.appender.DRFA.MaxBackupIndex=30
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout

# Pattern format: Date LogLevel LoggerName LogMessage
#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# Debugging Pattern format
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2}
(%F:%M(%L)) - %m%n


#
# console
# Add "console" to rootlogger above if you want to use this
#

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p
%c{2}: %m%n

#custom logging levels
# log4j.logger.root=DEBUG

#
# Event Counter Appender
# Sends counts of logging messages at different severity levels to Hadoop
Metrics.
#
log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounter


log4j.category.DataNucleus=ERROR,DRFA
log4j.category.Datastore=ERROR,DRFA
log4j.category.Datastore.Schema=ERROR,DRFA
log4j.category.JPOX.Datastore=ERROR,DRFA
log4j.category.JPOX.Plugin=ERROR,DRFA
log4j.category.JPOX.MetaData=ERROR,DRFA
log4j.category.JPOX.Query=ERROR,DRFA
log4j.category.JPOX.General=ERROR,DRFA
log4j.category.JPOX.Enhancer=ERROR,DRFA

What is going wrong?

Saurabh.
-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Output compression not working on hive-trunk (r802989)

Posted by Zheng Shao <zs...@gmail.com>.
Should be in /mnt/<yourname>/hive.log

This is specified in conf/hive-log4j.properties

Zheng
On Fri, Aug 14, 2009 at 2:03 AM, Saurabh Nanda<sa...@gmail.com> wrote:
> The statement I changed was in the function genFileSinkPlan() and was on
> line 2571, not 2711.
>
> Saurabh.
>
> On Fri, Aug 14, 2009 at 2:31 PM, Saurabh Nanda <sa...@gmail.com>
> wrote:
>>
>> I changed the log statement, rebuilt Hive, and re-ran the insert query. I
>> didn't find this log entry anywhere. Where exactly should I be looking for
>> this log entry?
>>
>> Saurabh.
>>
>> On Fri, Aug 14, 2009 at 1:04 PM, Saurabh Nanda <sa...@gmail.com>
>> wrote:
>>>
>>> I'm changing the LOG.debug statement to the following --
>>>
>>>     LOG.info("Created FileSink Plan for clause: " + dest + "dest_path: "
>>>               + dest_path + " row schema: "
>>>               + inputRR.toString() + ".
>>> HiveConf.ConfVars.COMPRESSRESULT=" +
>>> conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT));
>>>
>>> Saurabh.
>>>
>>> On Fri, Aug 14, 2009 at 12:39 PM, Zheng Shao <zs...@gmail.com> wrote:
>>>>
>>>> Great. We are one step closer to the root cause.
>>>>
>>>> Can you print out a log line here as well? This is the place that we
>>>> fill in the compression option.
>>>>
>>>> SemanticAnalyzer.java:2711:
>>>>    Operator output = putOpInsertMap(
>>>>      OperatorFactory.getAndMakeChild(
>>>>        new fileSinkDesc(queryTmpdir, table_desc,
>>>>
>>>> conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT), currentTableId),
>>>>        fsRS, input), inputRR);
>>>>
>>>>
>>>> Zheng
>>>>
>>>> On Thu, Aug 13, 2009 at 11:34 PM, Saurabh Nanda<sa...@gmail.com>
>>>> wrote:
>>>> > The query is being split into two map/reduce jobs. The first job
>>>> > consists of
>>>> > 16 map tasks (no reduce job). The relevant log output is given below:
>>>> >
>>>> >
>>>> > 2009-08-14 11:29:38,245 INFO
>>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file:
>>>> > FS
>>>> >
>>>> > hdfs://master-hadoop:8020/tmp/hive-ct-admin/1957063362/_tmp.10002/_tmp.attempt_200908131050_0218_m_000000_0
>>>> > 2009-08-14 11:29:38,246 INFO
>>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression
>>>> > configuration
>>>> > is:true
>>>> >
>>>> > 2009-08-14 11:29:38,347 INFO org.apache.hadoop.io.compress.CodecPool:
>>>> > Got
>>>> > brand-new compressor
>>>> > 2009-08-14 11:29:38,358 INFO
>>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 6 FS
>>>> > initialized
>>>> > 2009-08-14 11:29:38,358 INFO
>>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 6
>>>> > FS
>>>> >
>>>> >
>>>> > The second job consists of 16 map tasks & 3 reduce tasks. None of the
>>>> > map
>>>> > tasks contain any log output from FileSinkOperator. The reduce tasks
>>>> > contain
>>>> > the following relevant log output:
>>>> >
>>>> >
>>>> > 2009-08-14 11:38:13,553 INFO
>>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 3
>>>> > FS
>>>> > 2009-08-14 11:38:13,553 INFO
>>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 3
>>>> > FS
>>>> > 2009-08-14 11:38:13,604 INFO
>>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file:
>>>> > FS
>>>> >
>>>> > hdfs://master-hadoop/tmp/hive-ct-admin/2045778473/_tmp.10000/_tmp.attempt_200908131050_0219_r_000000_0
>>>> >
>>>> > 2009-08-14 11:38:13,605 INFO
>>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression
>>>> > configuration
>>>> > is:false
>>>> > 2009-08-14 11:38:43,128 INFO
>>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 3 FS
>>>> > initialized
>>>> >
>>>> > 2009-08-14 11:38:43,128 INFO
>>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 3
>>>> > FS
>>>> >
>>>> > You can see that compression is "on" for the first map/reduce job,
>>>> > but
>>>> > "off" for the second one. Did I forget to set any configuration
>>>> > parameter?
>>>> >
>>>> > Saurabh.
>>>> > --
>>>> > http://nandz.blogspot.com
>>>> > http://foodieforlife.blogspot.com
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Yours,
>>>> Zheng
>>>
>>>
>>>
>>> --
>>> http://nandz.blogspot.com
>>> http://foodieforlife.blogspot.com
>>
>>
>>
>> --
>> http://nandz.blogspot.com
>> http://foodieforlife.blogspot.com
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
Yours,
Zheng

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
The statement I changed was in the function genFileSinkPlan() and was on
line 2571, not 2711.

Saurabh.

On Fri, Aug 14, 2009 at 2:31 PM, Saurabh Nanda <sa...@gmail.com>wrote:

> I changed the log statement, rebuilt Hive, and re-ran the insert query. I
> didn't find this log entry anywhere. Where exactly should I be looking for
> this log entry?
>
> Saurabh.
>
>
> On Fri, Aug 14, 2009 at 1:04 PM, Saurabh Nanda <sa...@gmail.com>wrote:
>
>> I'm changing the LOG.debug statement to the following --
>>
>>     LOG.info("Created FileSink Plan for clause: " + dest + "dest_path: "
>>               + dest_path + " row schema: "
>>               + inputRR.toString() + ". HiveConf.ConfVars.COMPRESSRESULT="
>> + conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT));
>>
>> Saurabh.
>>
>>
>> On Fri, Aug 14, 2009 at 12:39 PM, Zheng Shao <zs...@gmail.com> wrote:
>>
>>> Great. We are one step closer to the root cause.
>>>
>>> Can you print out a log line here as well? This is the place that we
>>> fill in the compression option.
>>>
>>> SemanticAnalyzer.java:2711:
>>>    Operator output = putOpInsertMap(
>>>      OperatorFactory.getAndMakeChild(
>>>        new fileSinkDesc(queryTmpdir, table_desc,
>>>
>>> conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT), currentTableId),
>>>        fsRS, input), inputRR);
>>>
>>>
>>> Zheng
>>>
>>> On Thu, Aug 13, 2009 at 11:34 PM, Saurabh Nanda<sa...@gmail.com>
>>> wrote:
>>> > The query is being split into two map/reduce jobs. The first job
>>> consists of
>>> > 16 map tasks (no reduce job). The relevant log output is given below:
>>> >
>>> >
>>> > 2009-08-14 11:29:38,245 INFO
>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file:
>>> FS
>>> >
>>> hdfs://master-hadoop:8020/tmp/hive-ct-admin/1957063362/_tmp.10002/_tmp.attempt_200908131050_0218_m_000000_0
>>> > 2009-08-14 11:29:38,246 INFO
>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression
>>> configuration
>>> > is:true
>>> >
>>> > 2009-08-14 11:29:38,347 INFO org.apache.hadoop.io.compress.CodecPool:
>>> Got
>>> > brand-new compressor
>>> > 2009-08-14 11:29:38,358 INFO
>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 6 FS
>>> initialized
>>> > 2009-08-14 11:29:38,358 INFO
>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 6
>>> FS
>>> >
>>> >
>>> > The second job consists of 16 map tasks & 3 reduce tasks. None of the
>>> map
>>> > tasks contain any log output from FileSinkOperator. The reduce tasks
>>> contain
>>> > the following relevant log output:
>>> >
>>> >
>>> > 2009-08-14 11:38:13,553 INFO
>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 3
>>> FS
>>> > 2009-08-14 11:38:13,553 INFO
>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 3 FS
>>> > 2009-08-14 11:38:13,604 INFO
>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file:
>>> FS
>>> >
>>> hdfs://master-hadoop/tmp/hive-ct-admin/2045778473/_tmp.10000/_tmp.attempt_200908131050_0219_r_000000_0
>>> >
>>> > 2009-08-14 11:38:13,605 INFO
>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression
>>> configuration
>>> > is:false
>>> > 2009-08-14 11:38:43,128 INFO
>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 3 FS
>>> initialized
>>> >
>>> > 2009-08-14 11:38:43,128 INFO
>>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 3
>>> FS
>>> >
>>> > You can see that compression is "on" for the first map/reduce job,
>>> but
>>> > "off" for the second one. Did I forget to set any configuration
>>> parameter?
>>> >
>>> > Saurabh.
>>> > --
>>> > http://nandz.blogspot.com
>>> > http://foodieforlife.blogspot.com
>>> >
>>>
>>>
>>>
>>> --
>>> Yours,
>>> Zheng
>>>
>>
>>
>>
>> --
>> http://nandz.blogspot.com
>> http://foodieforlife.blogspot.com
>>
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
I changed the log statement, rebuilt Hive, and re-ran the insert query. I
didn't find this log entry anywhere. Where exactly should I be looking for
this log entry?

Saurabh.

On Fri, Aug 14, 2009 at 1:04 PM, Saurabh Nanda <sa...@gmail.com>wrote:

> I'm changing the LOG.debug statement to the following --
>
>     LOG.info("Created FileSink Plan for clause: " + dest + "dest_path: "
>               + dest_path + " row schema: "
>               + inputRR.toString() + ". HiveConf.ConfVars.COMPRESSRESULT="
> + conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT));
>
> Saurabh.
>
>
> On Fri, Aug 14, 2009 at 12:39 PM, Zheng Shao <zs...@gmail.com> wrote:
>
>> Great. We are one step closer to the root cause.
>>
>> Can you print out a log line here as well? This is the place where we
>> fill in the compression option.
>>
>> SemanticAnalyzer.java:2711:
>>    Operator output = putOpInsertMap(
>>      OperatorFactory.getAndMakeChild(
>>        new fileSinkDesc(queryTmpdir, table_desc,
>>
>> conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT), currentTableId),
>>        fsRS, input), inputRR);
>>
>>
>> Zheng
>>
>> On Thu, Aug 13, 2009 at 11:34 PM, Saurabh Nanda<sa...@gmail.com>
>> wrote:
>> > The query is being split into two map/reduce jobs. The first job
>> consists of
>> > 16 map tasks (no reduce tasks). The relevant log output is given below:
>> >
>> >
>> > 2009-08-14 11:29:38,245 INFO
>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file:
>> FS
>> >
>> hdfs://master-hadoop:8020/tmp/hive-ct-admin/1957063362/_tmp.10002/_tmp.attempt_200908131050_0218_m_000000_0
>> > 2009-08-14 11:29:38,246 INFO
>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression
>> configuration
>> > is:true
>> >
>> > 2009-08-14 11:29:38,347 INFO org.apache.hadoop.io.compress.CodecPool:
>> Got
>> > brand-new compressor
>> > 2009-08-14 11:29:38,358 INFO
>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 6 FS
>> initialized
>> > 2009-08-14 11:29:38,358 INFO
>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 6
>> FS
>> >
>> >
>> > The second job consists of 16 map tasks & 3 reduce tasks. None of the
>> map
>> > tasks contain any log output from FileSinkOperator. The reduce tasks
>> contain
>> > the following relevant log output:
>> >
>> >
>> > 2009-08-14 11:38:13,553 INFO
>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 3 FS
>> > 2009-08-14 11:38:13,553 INFO
>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 3 FS
>> > 2009-08-14 11:38:13,604 INFO
>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file:
>> FS
>> >
>> hdfs://master-hadoop/tmp/hive-ct-admin/2045778473/_tmp.10000/_tmp.attempt_200908131050_0219_r_000000_0
>> >
>> > 2009-08-14 11:38:13,605 INFO
>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression
>> configuration
>> > is:false
>> > 2009-08-14 11:38:43,128 INFO
>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 3 FS
>> initialized
>> >
>> > 2009-08-14 11:38:43,128 INFO
>> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 3
>> FS
>> >
>> > You can see that compression is "on" for the first map/reduce job, but
>> > "off" for the second one. Did I forget to set any configuration
>> parameter?
>> >
>> > Saurabh.
>> > --
>> > http://nandz.blogspot.com
>> > http://foodieforlife.blogspot.com
>> >
>>
>>
>>
>> --
>> Yours,
>> Zheng
>>
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
I'm changing the LOG.debug statement to the following --

    LOG.info("Created FileSink Plan for clause: " + dest + "dest_path: "
              + dest_path + " row schema: "
              + inputRR.toString() + ". HiveConf.ConfVars.COMPRESSRESULT=" +
conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT));

Saurabh.

On Fri, Aug 14, 2009 at 12:39 PM, Zheng Shao <zs...@gmail.com> wrote:

> Great. We are one step closer to the root cause.
>
> Can you print out a log line here as well? This is the place where we
> fill in the compression option.
>
> SemanticAnalyzer.java:2711:
>    Operator output = putOpInsertMap(
>      OperatorFactory.getAndMakeChild(
>        new fileSinkDesc(queryTmpdir, table_desc,
>
> conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT), currentTableId),
>        fsRS, input), inputRR);
>
>
> Zheng
>
> On Thu, Aug 13, 2009 at 11:34 PM, Saurabh Nanda<sa...@gmail.com>
> wrote:
> > The query is being split into two map/reduce jobs. The first job consists
> of
> > 16 map tasks (no reduce tasks). The relevant log output is given below:
> >
> >
> > 2009-08-14 11:29:38,245 INFO
> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS
> >
> hdfs://master-hadoop:8020/tmp/hive-ct-admin/1957063362/_tmp.10002/_tmp.attempt_200908131050_0218_m_000000_0
> > 2009-08-14 11:29:38,246 INFO
> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression
> configuration
> > is:true
> >
> > 2009-08-14 11:29:38,347 INFO org.apache.hadoop.io.compress.CodecPool: Got
> > brand-new compressor
> > 2009-08-14 11:29:38,358 INFO
> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 6 FS
> initialized
> > 2009-08-14 11:29:38,358 INFO
> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 6 FS
> >
> >
> > The second job consists of 16 map tasks & 3 reduce tasks. None of the map
> > tasks contain any log output from FileSinkOperator. The reduce tasks
> contain
> > the following relevant log output:
> >
> >
> > 2009-08-14 11:38:13,553 INFO
> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 3 FS
> > 2009-08-14 11:38:13,553 INFO
> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 3 FS
> > 2009-08-14 11:38:13,604 INFO
> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS
> >
> hdfs://master-hadoop/tmp/hive-ct-admin/2045778473/_tmp.10000/_tmp.attempt_200908131050_0219_r_000000_0
> >
> > 2009-08-14 11:38:13,605 INFO
> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression
> configuration
> > is:false
> > 2009-08-14 11:38:43,128 INFO
> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 3 FS
> initialized
> >
> > 2009-08-14 11:38:43,128 INFO
> > org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 3 FS
> >
> > You can see that compression is "on" for the first map/reduce job, but
> > "off" for the second one. Did I forget to set any configuration
> parameter?
> >
> > Saurabh.
> > --
> > http://nandz.blogspot.com
> > http://foodieforlife.blogspot.com
> >
>
>
>
> --
> Yours,
> Zheng
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Output compression not working on hive-trunk (r802989)

Posted by Zheng Shao <zs...@gmail.com>.
Great. We are one step closer to the root cause.

Can you print out a log line here as well? This is the place where we
fill in the compression option.

SemanticAnalyzer.java:2711:
    Operator output = putOpInsertMap(
      OperatorFactory.getAndMakeChild(
        new fileSinkDesc(queryTmpdir, table_desc,

conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT), currentTableId),
        fsRS, input), inputRR);


Zheng

On Thu, Aug 13, 2009 at 11:34 PM, Saurabh Nanda<sa...@gmail.com> wrote:
> The query is being split into two map/reduce jobs. The first job consists of
> 16 map tasks (no reduce tasks). The relevant log output is given below:
>
>
> 2009-08-14 11:29:38,245 INFO
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS
> hdfs://master-hadoop:8020/tmp/hive-ct-admin/1957063362/_tmp.10002/_tmp.attempt_200908131050_0218_m_000000_0
> 2009-08-14 11:29:38,246 INFO
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression configuration
> is:true
>
> 2009-08-14 11:29:38,347 INFO org.apache.hadoop.io.compress.CodecPool: Got
> brand-new compressor
> 2009-08-14 11:29:38,358 INFO
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 6 FS initialized
> 2009-08-14 11:29:38,358 INFO
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 6 FS
>
>
> The second job consists of 16 map tasks & 3 reduce tasks. None of the map
> tasks contain any log output from FileSinkOperator. The reduce tasks contain
> the following relevant log output:
>
>
> 2009-08-14 11:38:13,553 INFO
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 3 FS
> 2009-08-14 11:38:13,553 INFO
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 3 FS
> 2009-08-14 11:38:13,604 INFO
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS
> hdfs://master-hadoop/tmp/hive-ct-admin/2045778473/_tmp.10000/_tmp.attempt_200908131050_0219_r_000000_0
>
> 2009-08-14 11:38:13,605 INFO
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression configuration
> is:false
> 2009-08-14 11:38:43,128 INFO
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 3 FS initialized
>
> 2009-08-14 11:38:43,128 INFO
> org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 3 FS
>
> You can see that compression is "on" for the first map/reduce job, but
> "off" for the second one. Did I forget to set any configuration parameter?
>
> Saurabh.
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
Yours,
Zheng

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
The query is being split into two map/reduce jobs. The first job
consists of 16 map tasks (no reduce tasks). The relevant log output is
given below:

2009-08-14 11:29:38,245 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file:
FS hdfs://master-hadoop:8020/tmp/hive-ct-admin/1957063362/_tmp.10002/_tmp.attempt_200908131050_0218_m_000000_0
2009-08-14 11:29:38,246 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression
configuration is:true
2009-08-14 11:29:38,347 INFO org.apache.hadoop.io.compress.CodecPool:
Got brand-new compressor
2009-08-14 11:29:38,358 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 6 FS
initialized
2009-08-14 11:29:38,358 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 6
FS

The second job consists of 16 map tasks & 3 reduce tasks. None of the
map tasks contain any log output from FileSinkOperator. The reduce
tasks contain the following relevant log output:

2009-08-14 11:38:13,553 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 3
FS
2009-08-14 11:38:13,553 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 3
FS
2009-08-14 11:38:13,604 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file:
FS hdfs://master-hadoop/tmp/hive-ct-admin/2045778473/_tmp.10000/_tmp.attempt_200908131050_0219_r_000000_0
2009-08-14 11:38:13,605 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Compression
configuration is:false
2009-08-14 11:38:43,128 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 3 FS
initialized
2009-08-14 11:38:43,128 INFO
org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 3
FS

You can see that compression is "on" for the first map/reduce job,
but "off" for the second one. Did I forget to set any configuration
parameter?

Saurabh.
-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
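
For context on the two jobs traced above: Hive keeps two separate
compression switches, hive.exec.compress.intermediate for the temporary
files written between chained map/reduce jobs and hive.exec.compress.output
for the files that land in the destination table or partition. Below is a
minimal sketch of reading both flags; the getBoolVar calls mirror the
snippets quoted elsewhere in this thread, while the class wrapper and the
ConfVars.COMPRESSINTERMEDIATE name are assumptions made for illustration.

import org.apache.hadoop.hive.conf.HiveConf;

public class CompressionFlags {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf(CompressionFlags.class);
    // Temporary files written between the stages of a multi-job query:
    boolean intermediate =
        conf.getBoolVar(HiveConf.ConfVars.COMPRESSINTERMEDIATE);
    // Final table/partition output, the flag at issue in this thread:
    boolean output = conf.getBoolVar(HiveConf.ConfVars.COMPRESSRESULT);
    System.out.println("hive.exec.compress.intermediate=" + intermediate);
    System.out.println("hive.exec.compress.output=" + output);
  }
}

Given the settings posted earlier (intermediate=false, output=true), at
least one of the two file sinks traced above is evidently picking up the
wrong flag.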

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
Sorry, found it. It's in the task logs for the reduce jobs.

Saurabh.

On Fri, Aug 14, 2009 at 10:58 AM, Saurabh Nanda <sa...@gmail.com>wrote:

> I can't find the previous log entry anywhere -- LOG.info("Writing to temp
> file: FS " + outPath); -- where should I be looking? Should I configure
> log4j differently for LOG.info to show up?
>
> Saurabh.
>
>
> On Fri, Aug 14, 2009 at 10:54 AM, Saurabh Nanda <sa...@gmail.com>wrote:
>
>> Files in table raw_compressed start with this header:
>> SEQ|"org.apache.hadoop.io.BytesWritable|org.apache.hadoop.io.Text||'org.apache.hadoop.io.compress.GzipCodec
>>
>> Files in table raw start with this header:
>> SEQ|"org.apache.hadoop.io.BytesWritable|org.apache.hadoop.io.Text
>>
>> File size for raw_compressed: 250 MB
>> File size for raw: 2150 MB
>>
>> After "boolean isCompressed = conf.getCompressed();" should I put
>> "LOG.info("Compression config is:" + isCompressed);" ?
>>
>> Saurabh.
>>
>>
>> On Fri, Aug 14, 2009 at 9:51 AM, Zheng Shao <zs...@gmail.com> wrote:
>>
>>> What is the average file size in table raw?
>>>
>>> Can you put a log line in FileSinkOperator.java:107? That will tell
>>> us whether compression is turned on or not.
>>>
>>> Zheng
>>>
>>> On Thu, Aug 13, 2009 at 9:06 PM, Saurabh Nanda<sa...@gmail.com>
>>> wrote:
>>> >
>>> >> hive.exec.compress.output=true is the correct option. Can you post the
>>> >> "insert" command that you run which produced non-compressed results?
>>> >> Is the output in TextFileFormat or SequenceFileFormat?
>>> >
>>> > Here's the query. raw_compressed is a SequenceFile table with raw
>>> lines. raw
>>> > is a SequenceFile table with separate columns for each data field.
>>> >
>>> > from raw_compressed
>>> >     insert overwrite table raw partition (dt='2009-04-02')
>>> >     select transform(line) using 'parse_logs.rb' as ip_address, aid,
>>> uid,
>>> > ts, method, uri, response, referer, user_agent, cookies, ptime
>>> >
>>> > Saurabh.
>>> > --
>>> > http://nandz.blogspot.com
>>> > http://foodieforlife.blogspot.com
>>> >
>>>
>>>
>>>
>>> --
>>> Yours,
>>> Zheng
>>>
>>
>>
>>
>> --
>> http://nandz.blogspot.com
>> http://foodieforlife.blogspot.com
>>
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
I can't find the previous log entry anywhere -- LOG.info("Writing to temp
file: FS " + outPath); -- where should I be looking? Should I configure
log4j differently for LOG.info to show up?

Saurabh.

On Fri, Aug 14, 2009 at 10:54 AM, Saurabh Nanda <sa...@gmail.com>wrote:

> Files in table raw_compressed start with this header:
> SEQ|"org.apache.hadoop.io.BytesWritable|org.apache.hadoop.io.Text||'org.apache.hadoop.io.compress.GzipCodec
>
> Files in table raw start with this header:
> SEQ|"org.apache.hadoop.io.BytesWritable|org.apache.hadoop.io.Text
>
> File size for raw_compressed: 250 MB
> File size for raw: 2150 MB
>
> After "boolean isCompressed = conf.getCompressed();" should I put
> "LOG.info("Compression config is:" + isCompressed);" ?
>
> Saurabh.
>
>
> On Fri, Aug 14, 2009 at 9:51 AM, Zheng Shao <zs...@gmail.com> wrote:
>
>> What is the average file size in table raw?
>>
>> Can you put a log line in FileSinkOperator.java:107? That will tell
>> us whether compression is turned on or not.
>>
>> Zheng
>>
>> On Thu, Aug 13, 2009 at 9:06 PM, Saurabh Nanda<sa...@gmail.com>
>> wrote:
>> >
>> >> hive.exec.compress.output=true is the correct option. Can you post the
>> >> "insert" command that you run which produced non-compressed results?
>> >> Is the output in TextFileFormat or SequenceFileFormat?
>> >
>> > Here's the query. raw_compressed is a SequenceFile table with raw lines.
>> raw
>> > is a SequenceFile table with separate columns for each data field.
>> >
>> > from raw_compressed
>> >     insert overwrite table raw partition (dt='2009-04-02')
>> >     select transform(line) using 'parse_logs.rb' as ip_address, aid,
>> uid,
>> > ts, method, uri, response, referer, user_agent, cookies, ptime
>> >
>> > Saurabh.
>> > --
>> > http://nandz.blogspot.com
>> > http://foodieforlife.blogspot.com
>> >
>>
>>
>>
>> --
>> Yours,
>> Zheng
>>
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
Files in table raw_compressed start with this header:
SEQ|"org.apache.hadoop.io.BytesWritable|org.apache.hadoop.io.Text||'org.apache.hadoop.io.compress.GzipCodec

Files in table raw start with this header:
SEQ|"org.apache.hadoop.io.BytesWritable|org.apache.hadoop.io.Text

File size for raw_compressed: 250 MB
File size for raw: 2150 MB

After "boolean isCompressed = conf.getCompressed();" should I put
"LOG.info("Compression config is:" + isCompressed);" ?

Saurabh.

On Fri, Aug 14, 2009 at 9:51 AM, Zheng Shao <zs...@gmail.com> wrote:

> What is the average file size in table raw?
>
> Can you put a log line in FileSinkOperator.java:107? That will tell
> us whether compression is turned on or not.
>
> Zheng
>
> On Thu, Aug 13, 2009 at 9:06 PM, Saurabh Nanda<sa...@gmail.com>
> wrote:
> >
> >> hive.exec.compress.output=true is the correct option. Can you post the
> >> "insert" command that you run which produced non-compressed results?
> >> Is the output in TextFileFormat or SequenceFileFormat?
> >
> > Here's the query. raw_compressed is a SequenceFile table with raw lines.
> raw
> > is a SequenceFile table with separate columns for each data field.
> >
> > from raw_compressed
> >     insert overwrite table raw partition (dt='2009-04-02')
> >     select transform(line) using 'parse_logs.rb' as ip_address, aid, uid,
> > ts, method, uri, response, referer, user_agent, cookies, ptime
> >
> > Saurabh.
> > --
> > http://nandz.blogspot.com
> > http://foodieforlife.blogspot.com
> >
>
>
>
> --
> Yours,
> Zheng
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
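
The header comparison above can also be scripted. Here is a minimal sketch
using Hadoop's SequenceFile.Reader API (the path argument is a placeholder
for any file under the table's HDFS directory):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class SeqFileCodecCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path file = new Path(args[0]);
    FileSystem fs = file.getFileSystem(conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
    try {
      // These fields correspond to the raw header bytes quoted above.
      System.out.println("key class:        " + reader.getKeyClassName());
      System.out.println("value class:      " + reader.getValueClassName());
      System.out.println("compressed:       " + reader.isCompressed());
      System.out.println("block compressed: " + reader.isBlockCompressed());
      if (reader.isCompressed()) {
        System.out.println("codec: "
            + reader.getCompressionCodec().getClass().getName());
      }
    } finally {
      reader.close();
    }
  }
}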

Re: Output compression not working on hive-trunk (r802989)

Posted by Zheng Shao <zs...@gmail.com>.
What is the average file size in table raw?

Can you put a log line in FileSinkOperator.java:107? That will tell
us whether compression is turned on or not.

Zheng

On Thu, Aug 13, 2009 at 9:06 PM, Saurabh Nanda<sa...@gmail.com> wrote:
>
>> hive.exec.compress.output=true is the correct option. Can you post the
>> "insert" command that you run which produced non-compressed results?
>> Is the output in TextFileFormat or SequenceFileFormat?
>
> Here's the query. raw_compressed is a SequenceFile table with raw lines. raw
> is a SequenceFile table with separate columns for each data field.
>
> from raw_compressed
>     insert overwrite table raw partition (dt='2009-04-02')
>     select transform(line) using 'parse_logs.rb' as ip_address, aid, uid,
> ts, method, uri, response, referer, user_agent, cookies, ptime
>
> Saurabh.
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
Yours,
Zheng
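
The first question, the average file size in the table, can be answered
mechanically from the FileSystem API. A rough sketch (the directory
argument is a placeholder for the table's or partition's warehouse path):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AvgFileSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path dir = new Path(args[0]);
    FileSystem fs = dir.getFileSystem(conf);
    long total = 0;
    int count = 0;
    for (FileStatus f : fs.listStatus(dir)) {
      if (!f.isDir()) {  // skip subdirectories such as partitions
        total += f.getLen();
        count++;
      }
    }
    System.out.println(count + " files, average "
        + (count == 0 ? 0 : total / count) + " bytes");
  }
}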

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
> hive.exec.compress.output=true is the correct option. Can you post the
> "insert" command that you run which produced non-compressed results?
> Is the output in TextFileFormat or SequenceFileFormat?


Here's the query. raw_compressed is a SequenceFile table with raw lines. raw
is a SequenceFile table with separate columns for each data field.

from raw_compressed
    insert overwrite table raw partition (dt='2009-04-02')
    select transform(line) using 'parse_logs.rb' as ip_address, aid, uid,
ts, method, uri, response, referer, user_agent, cookies, ptime

Saurabh.
-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Re: Output compression not working on hive-trunk (r802989)

Posted by Zheng Shao <zs...@gmail.com>.
Hi Saurabh,

hive.exec.compress.output=true is the correct option. Can you post the
"insert" command that you run which produced non-compressed results?
Is the output in TextFileFormat or SequenceFileFormat?


Zheng

On Wed, Aug 12, 2009 at 10:52 PM, Saurabh Nanda<sa...@gmail.com> wrote:
> I've even tried setting "mapred.output.compress=true" in hadoop-site.xml and
> restarting the cluster, but in vain.
>
> How do I get compression to work in Hive-trunk? Does it have something to do
> with the Hive query as well? Here's what I'm trying:
>
> from raw_compressed
>     insert overwrite table raw partition (dt='2009-04-02')
>     select transform(line) using 'parse_logs.rb' as ip_address, aid, uid,
> ts, method, uri, response, referer, user_agent, cookies, ptime
>
> Saurabh.
>
> On Thu, Aug 13, 2009 at 9:44 AM, Saurabh Nanda <sa...@gmail.com>
> wrote:
>>
>> I migrated from Hive-0.30 to Hive-trunk (r802989 compiled against Hadoop
>> 0.18.3), copied over metastore_db & the conf directory. Output compression
>> used to work with my earlier Hive installation, but it seems to have stopped
>> working now. Are the configuration parameters different from Hive-0.3?
>>
>> "set -v" on Hive-trunk throws up the following relevant configuration
>> parameters:
>>
>> mapred.output.compress=false
>> hive.exec.compress.intermediate=false
>> hive.exec.compress.output=true
>> mapred.output.compression.type=BLOCK
>> mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
>>
>> mapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec
>> io.seqfile.compress.blocksize=1000000
>> io.seqfile.lazydecompress=true
>> mapred.compress.map.output=false
>>
>> io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec
>>
>> What am I missing?
>>
>> Saurabh.
>> --
>> http://nandz.blogspot.com
>> http://foodieforlife.blogspot.com
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
Yours,
Zheng
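
As the FileSinkOperator log lines elsewhere in this thread show, Hive
writes the output files of an INSERT itself, so for a SequenceFile table it
is hive.exec.compress.output, not the job-wide mapred.output.compress, that
decides whether the writer gets a codec. A rough sketch of the two writer
variants, using Hadoop's SequenceFile API directly (the paths and the
sample record are invented for illustration; the BytesWritable/Text pairing
and the GzipCodec match the file headers quoted in this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class WriterSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    GzipCodec codec =
        (GzipCodec) ReflectionUtils.newInstance(GzipCodec.class, conf);

    // Compressed variant: header names the codec, as in raw_compressed.
    SequenceFile.Writer compressed = SequenceFile.createWriter(fs, conf,
        new Path("/tmp/compressed.seq"), BytesWritable.class, Text.class,
        SequenceFile.CompressionType.BLOCK, codec);
    compressed.append(new BytesWritable(), new Text("example row"));
    compressed.close();

    // Uncompressed variant: plain header, as in table raw.
    SequenceFile.Writer plain = SequenceFile.createWriter(fs, conf,
        new Path("/tmp/plain.seq"), BytesWritable.class, Text.class);
    plain.append(new BytesWritable(), new Text("example row"));
    plain.close();
  }
}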

Re: Output compression not working on hive-trunk (r802989)

Posted by Saurabh Nanda <sa...@gmail.com>.
I've even tried setting "mapred.output.compress=true" in hadoop-site.xml and
restarting the cluster, but in vain.

How do I get compression to work in Hive-trunk? Does it have something to do
with the Hive query as well? Here's what I'm trying:

from raw_compressed
    insert overwrite table raw partition (dt='2009-04-02')
    select transform(line) using 'parse_logs.rb' as ip_address, aid, uid,
ts, method, uri, response, referer, user_agent, cookies, ptime

Saurabh.

On Thu, Aug 13, 2009 at 9:44 AM, Saurabh Nanda <sa...@gmail.com>wrote:

> I migrated from Hive-0.30 to Hive-trunk (r802989 compiled against Hadoop
> 0.18.3), copied over metastore_db & the conf directory. Output compression
> used to work with my earlier Hive installation, but it seems to have stopped
> working now. Are the configuration parameters different from Hive-0.3?
>
> "set -v" on Hive-trunk throws up the following relevant configuration
> parameters:
>
> mapred.output.compress=false
> hive.exec.compress.intermediate=false
> hive.exec.compress.output=true
> mapred.output.compression.type=BLOCK
> mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
>
> mapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec
> io.seqfile.compress.blocksize=1000000
> io.seqfile.lazydecompress=true
> mapred.compress.map.output=false
>
> io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec
>
> What am I missing?
>
> Saurabh.
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>



-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com