Posted to dev@hive.apache.org by "Rodrigo Schmidt (JIRA)" <ji...@apache.org> on 2009/02/06 11:07:59 UTC

[jira] Created: (HIVE-277) Files created by redundant tasks that have been killed are taken into consideration

Files created by redundant tasks that have been killed are taken into consideration
-----------------------------------------------------------------------------------

                 Key: HIVE-277
                 URL: https://issues.apache.org/jira/browse/HIVE-277
             Project: Hadoop Hive
          Issue Type: Bug
         Environment: Hive with Hadoop 0.19.0 on a Linux cluster
            Reporter: Rodrigo Schmidt
            Priority: Minor


Hadoop starts redundant (speculative) tasks, mappers by default, when particular tasks are taking too long to execute. When one of the redundant tasks finishes, the others are killed. Killed tasks may still generate output files (usually empty), and Hive currently considers them part of the job output.
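
As a workaround on the job side, speculative execution can be switched off so that no redundant attempts are started at all. A minimal sketch using the standard Hadoop 0.19 property names; the job setup itself is illustrative only and not taken from Hive:

import org.apache.hadoop.mapred.JobConf;

public class DisableSpeculation {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Turn off redundant (speculative) map and reduce attempts for this job,
        // so no files from killed attempts are left in the output directory.
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
    }
}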

In my case, I'm profiling one of the mappers in an INSERT OVERWRITE TABLE ... SELECT (map-only) query, and the extra time added by the profiler makes Hadoop start a second mapper for the same part of the input. When one of these redundant mappers finishes, the other is killed, and /tmp/hive-xxxx/xxxxxxxxx.10000.insclause-0/ ends up with the following files:

_tmp.attempt_XX....XX_XXXX_m_000000_0
attempt_XX....XX_XXXX_m_000000_0
attempt_XX....XX_XXXX_m_000000_1
attempt_XX....XX_XXXX_m_000000_2
...
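
The contents of that directory can also be inspected programmatically; a small sketch against the Hadoop FileSystem API (the path below is a placeholder, not the real scratch path):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListScratchDir {
    public static void main(String[] args) throws Exception {
        // Placeholder path; substitute the actual Hive scratch directory of the job.
        Path dir = new Path("/tmp/hive-user/job.10000.insclause-0");
        FileSystem fs = dir.getFileSystem(new Configuration());
        for (FileStatus stat : fs.listStatus(dir)) {
            // The leftover _tmp.* file from the killed attempt shows up as a zero-length entry here.
            System.out.println(stat.getPath().getName() + "\t" + stat.getLen() + " bytes");
        }
    }
}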

The first file is empty, but Hive considers it part of the generated output and tries to load it into the destination table, which produces the following error message:

Loading data to table output_table partition {p=p1}
Failed with exception Cannot load text files into a table stored as SequenceFile.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

I'm not sure if the files generated by killed tasks will always be empty. If not, this bug might render the data inconsistent.
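
One possible direction for a fix (just a sketch, not the actual MoveTask code): have the load step skip any file left behind by a killed or uncommitted attempt before moving the output into the table. The "_tmp." prefix check below simply mirrors the names seen in the directory listing above and is an assumption on my part, not necessarily the rule Hive should end up using:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FilterKilledAttemptFiles {
    // Return only the files that should be loaded into the destination table.
    public static List<Path> filesToLoad(FileSystem fs, Path dir) throws IOException {
        List<Path> result = new ArrayList<Path>();
        for (FileStatus stat : fs.listStatus(dir)) {
            String name = stat.getPath().getName();
            if (name.startsWith("_tmp.")) {
                continue; // leftover from a killed/uncommitted attempt
            }
            result.add(stat.getPath());
        }
        return result;
    }
}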





[jira] Resolved: (HIVE-277) Files created by redundant tasks that have been killed are taken into consideration

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma resolved HIVE-277.
------------------------------------

    Resolution: Duplicate

