You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hive.apache.org by Stephen Sprague <sp...@gmail.com> on 2016/08/31 03:28:50 UTC

hive.root.logger influencing query plan?? so it's not so

Hi guys,
I've banged my head on this one all day and i need to surrender.  I have a
query that hangs (never returns). However, when i turn on logging to DEBUG
level it works.  I'm stumped.   I include here the query, the different
query plans (with the only thing different being the log level) and a
traceback on the query that hangs.

* the query

#!/bin/bash

sql="
explain SELECT *
FROM   raw_logs.fact_www_access_datekey_by_hour
WHERE  date_key = 20160828      -- partition key
and lower(http_string) = 'foo'  -- always evaluates to false
limit 1                         -- if i remove limit 1 it works.  I expect
an empty result set.
;
"

#hive -hiveconf hive.root.logger=ERROR,console -e "$sql" >stdout.bad
2>stderr.bad  #hangs
hive -hiveconf hive.root.logger=DEBUG,console -e "$sql" >stdout.good
2>stderr.good #works



* explain plan for the one that hangs (ERROR,console)

STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: 1
      Processor Tree:
        TableScan
          alias: fact_www_access_datekey_by_hour
          Statistics: Num rows: 144355924 Data size: 158755813572 Basic
stats: COMPLETE Column stats: NONE
          Filter Operator
            predicate: (lower(http_string) = 'foo') (type: boolean)
            Statistics: Num rows: 72177962 Data size: 79377906786 Basic
stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: payload_year (type: int), payload_month (type:
int), payload_day (type: int), payload_time (type: string),
payload_gmt_offset (type: string), perf_time_micros (type: int),
http_string (type: string), http_type (type: string), http_version (type:
string), http_rc (type: int), http_size (type: int), referrer (type:
string), user_agent (type: string), bot_flag (type: string), ip_list (type:
array<string>), scalar_tags (type: map<string,string>), url_components
(type: array<string>), user_id (type: string), lt_session_id (type:
string), session_id (type: string), log_name (type: string), log_lineno
(type: int), log_line (type: string), web_server (type: string),
location_info (type: map<string,string>), dp_main_url_type (type: string),
dp_inventory_type (type: string), dp_main_url (type:
map<string,array<string>>), dp_tag_url (type: map<string,array<string>>),
dp_referrer_url (type: map<string,array<string>>), hour (type: int),
20160828 (type: int)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
_col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15,
_col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23, _col24,
_col25, _col26, _col27, _col28, _col29, _col30, _col31
              Statistics: Num rows: 72177962 Data size: 79377906786 Basic
stats: COMPLETE Column stats: NONE
              Limit
                Number of rows: 1
                Statistics: Num rows: 1 Data size: 1099 Basic stats:
COMPLETE Column stats: NONE
                ListSink

* query plan for the one that works (DEBUG,console)

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: fact_www_access_datekey_by_hour
            Statistics: Num rows: 144355924 Data size: 158755813572 Basic
stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: (lower(http_string) = 'foo') (type: boolean)
              Statistics: Num rows: 72177962 Data size: 79377906786 Basic
stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: payload_year (type: int), payload_month (type:
int), payload_day (type: int), payload_time (type: string),
payload_gmt_offset (type: string), perf_time_micros (type: int),
http_string (type: string), http_type (type: string), http_version (type:
string), http_rc (type: int), http_size (type: int), referrer (type:
string), user_agent (type: string), bot_flag (type: string), ip_list (type:
array<string>), scalar_tags (type: map<string,string>), url_components
(type: array<string>), user_id (type: string), lt_session_id (type:
string), session_id (type: string), log_name (type: string), log_lineno
(type: int), log_line (type: string), web_server (type: string),
location_info (type: map<string,string>), dp_main_url_type (type: string),
dp_inventory_type (type: string), dp_main_url (type:
map<string,array<string>>), dp_tag_url (type: map<string,array<string>>),
dp_referrer_url (type: map<string,array<string>>), hour (type: int),
20160828 (type: int)
                outputColumnNames: _col0, _col1, _col2, _col3, _col4,
_col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14,
_col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22, _col23,
_col24, _col25, _col26, _col27, _col28, _col29, _col30, _col31
                Statistics: Num rows: 72177962 Data size: 79377906786 Basic
stats: COMPLETE Column stats: NONE
                Limit
                  Number of rows: 1
                  Statistics: Num rows: 1 Data size: 1099 Basic stats:
COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: true
                    Statistics: Num rows: 1 Data size: 1099 Basic stats:
COMPLETE Column stats: NONE
                    table:
                        input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: 1
      Processor Tree:
        ListSink




* traceback when i press ctl-c on the client

^CFailed with exception java.io.IOException:java.io.IOException: Filesystem
closed
16/08/30 20:16:09 [62cf87a8-2b39-47d2-b234-53631b60d84e main]: ERROR
CliDriver: - Failed with exception java.io.IOException:java.io.IOException:
Filesystem closed
java.io.IOException: java.io.IOException: Filesystem closed
        at
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:517)
        at
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:424)
        at
org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:144)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1885)
        at
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252)
        at
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
        at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
        at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335)
        at
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:742)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:794)
        at
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:833)
        at
org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:897)
        at
org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:700)
        at java.io.DataInputStream.readInt(DataInputStream.java:387)
        at
org.apache.hadoop.io.SequenceFile$Reader.readBlock(SequenceFile.java:2101)
        at
org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2508)
        at
org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
        at
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:484)
        ... 16 more


on the one that hangs the job is never submitted to YARN.  On the one that
succeeds the job is submitted.  Ultimately though why does changing the log
level influence the query plan?

any insights greatly appreciated.

Thanks,
Stephen.

Re: hive.root.logger influencing query plan?? so it's not so

Posted by Gopal Vijayaraghavan <go...@apache.org>.

> I have a query that hangs (never returns). However, when i turn on
>logging to DEBUG level it works.  I'm stumped.

Slowing down things does make a lot of stuff work - logging does something
more than slow things down, it actually introduces a synchronization point
(global lock) for each log.<level> call that makes it to the final output.

Running with the extra locked sections might be avoiding a race condition
- that worries me.

But the plans are different - very different.

>  Stage: Stage-0
>    Fetch Operator
>      limit: 1
>      Processor Tree:
>        ListSink

Crucial bit - the debug logging produces a 2 stage query. No reason why
logging should change the plan there.

> any insights greatly appreciated.

set hive.fetch.task.conversion=none;

I'm guessing that the whole data partition is < 1Gb?

Cheers,
Gopal