You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Fabian Alenius (JIRA)" <ji...@apache.org> on 2012/08/13 14:52:38 UTC

[jira] [Updated] (HIVE-3374) TABLESAMPLE does not ignore _SUCCESS and _logs

     [ https://issues.apache.org/jira/browse/HIVE-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fabian Alenius updated HIVE-3374:
---------------------------------

    Description: 
When running a query on an external bucketed table, TABLESAMPLE does not ignore _logs and _SUCCESS. This means that if you run the select on bucket 1 or 2, the query will fail.



hive> select count(*) from table TABLESAMPLE(BUCKET 1 OUT OF 100 on username) where var = value;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://namenode/table/var/_SUCCESS
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:194)
        at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:180)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:387)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:353)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:387)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
        at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:891)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:818)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:452)
        at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Job Submission failed with exception 'org.apache.hadoop.mapred.InvalidInputException(Input path does not exist: hdfs://namenode/table/var/_SUCCESS)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask


  was:
When running a query on an external bucketed table, TABLESAMPLE does not ignore _logs and _SUCCESS. This means that if you run the select bucket 1 or 2, the query will fail.



hive> select count(*) from table TABLESAMPLE(BUCKET 1 OUT OF 100 on username) where var = value;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://namenode/table/var/_SUCCESS
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:194)
        at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:180)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:387)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:353)
        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:387)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
        at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:891)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:818)
        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:452)
        at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Job Submission failed with exception 'org.apache.hadoop.mapred.InvalidInputException(Input path does not exist: hdfs://namenode/table/var/_SUCCESS)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask


    
> TABLESAMPLE does not ignore _SUCCESS and _logs
> ----------------------------------------------
>
>                 Key: HIVE-3374
>                 URL: https://issues.apache.org/jira/browse/HIVE-3374
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Fabian Alenius
>
> When running a query on an external bucketed table, TABLESAMPLE does not ignore _logs and _SUCCESS. This means that if you run the select on bucket 1 or 2, the query will fail.
> hive> select count(*) from table TABLESAMPLE(BUCKET 1 OUT OF 100 on username) where var = value;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://namenode/table/var/_SUCCESS
>         at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:194)
>         at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:180)
>         at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:387)
>         at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:353)
>         at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:387)
>         at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)
>         at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
>         at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:891)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:818)
>         at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:452)
>         at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
>         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
>         at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>         at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Job Submission failed with exception 'org.apache.hadoop.mapred.InvalidInputException(Input path does not exist: hdfs://namenode/table/var/_SUCCESS)'
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira