Posted to dev@hive.apache.org by "S. Alex Smith (JIRA)" <ji...@apache.org> on 2009/01/26 21:13:59 UTC

[jira] Created: (HIVE-251) Failures in Transform don't stop the job

Failures in Transform don't stop the job
----------------------------------------

                 Key: HIVE-251
                 URL: https://issues.apache.org/jira/browse/HIVE-251
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
            Reporter: S. Alex Smith


If the program executed via a SELECT TRANSFORM() USING 'foo' exits with a non-zero exit status, Hive proceeds as if nothing bad happened.  The main way that the user knows something bad has happened is if the user checks the logs (probably because he got no output).  This is doubly bad if the program only fails part of the time (say, on certain inputs) since the job will still produce output and thus the problem will likely go undetected.
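To make the failure mode concrete, here is a minimal Python sketch (not Hive code) of a transform-style script that fails only on certain inputs. The child script, the sample rows, and the helper names `run_transform` and `run_transform_checked` are hypothetical and exist only for illustration. The point is that partial output on stdout says nothing about success, so the caller has to inspect the exit status.

```python
import subprocess
import sys

# Hypothetical transform script: doubles each numeric row and dies with an
# uncaught ValueError on the first non-numeric row.
CHILD_SCRIPT = """
import sys
for line in sys.stdin:
    print(int(line) * 2)
"""


def run_transform(rows):
    """Run the child script over rows; return (stdout, exit_code)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        input=rows,
        capture_output=True,
        text=True,
    )
    return result.stdout, result.returncode


def run_transform_checked(rows):
    """Same as run_transform, but treat a non-zero exit status as fatal."""
    output, exit_code = run_transform(rows)
    if exit_code != 0:
        raise RuntimeError(f"transform script failed with code {exit_code}")
    return output
```

With a bad row in the middle of the input, `run_transform` still returns the rows emitted before the crash, which is why a caller that ignores the exit code would not notice the problem.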

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-251) Failures in Transform don't stop the job

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-251:
-------------------------------

    Attachment: patch-251_1.txt

Uploading a new patch, as there was nondeterminism in an existing test. Please review it again.



[jira] Updated: (HIVE-251) Failures in Transform don't stop the job

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-251:
-------------------------------

    Attachment: patch-251.txt

Fixed the issue.

The problem was that Operator.close was catching and ignoring HiveExceptions.

Also, in FileSinkOperator we throw HiveExceptions that need to be ignored, e.g. while copying the transient _tmp files. These are thrown from copyUtils in Hadoop.

Further, the input20.q test was actually failing, but no failure was reported because this bug masked it. We cannot support Unix pipes yet, so as a workaround I have moved them into a Unix script.
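The swallowed-exception pattern described above can be sketched in a few lines of Python. This is an illustration only, not the Hive source: `OperatorError` stands in for HiveException, and `ScriptOperatorSketch`, `run_task`, and the `swallow_errors` flag are hypothetical names. It shows how a close method that catches and only logs an error lets the surrounding task report success.

```python
import logging

log = logging.getLogger("operator_sketch")


class OperatorError(Exception):
    """Stand-in for the HiveException mentioned above."""


class ScriptOperatorSketch:
    """Toy operator whose close step fails when the script exited non-zero."""

    def __init__(self, exit_code, swallow_errors):
        self.exit_code = exit_code
        self.swallow_errors = swallow_errors

    def _close_op(self):
        if self.exit_code != 0:
            raise OperatorError(f"Script failed with code {self.exit_code}")

    def close(self):
        try:
            self._close_op()
        except OperatorError as exc:
            log.error("%s", exc)
            if not self.swallow_errors:
                raise  # propagate so the task is marked as failed


def run_task(operator):
    """Return the task status a framework might report after close()."""
    try:
        operator.close()
    except OperatorError:
        return "failed"
    return "done"
```

With `swallow_errors=True` the error is logged but `run_task` still returns "done"; with `swallow_errors=False` the same failure surfaces as "failed".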




[jira] Commented: (HIVE-251) Failures in Transform don't stop the job

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681922#action_12681922 ] 

Namit Jain commented on HIVE-251:
---------------------------------

+1

Looks good. All IOExceptions in FileSinkOperator are being ignored, but that is the same as the current behavior.



[jira] Assigned: (HIVE-251) Failures in Transform don't stop the job

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo reassigned HIVE-251:
----------------------------------

    Assignee: Ashish Thusoo



[jira] Updated: (HIVE-251) Failures in Transform don't stop the job

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-251:
-------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Committed. Also made the changes suggested by Namit.



[jira] Commented: (HIVE-251) Failures in Transform don't stop the job

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669847#action_12669847 ] 

Raghotham Murthy commented on HIVE-251:
---------------------------------------

More information about this error, based on Venky's tests.

For the case where the job succeeds even though the script fails, the following logs appear:
--
2009-02-02 17:49:26,251 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: StreamThread ErrorProcessor done
*2009-02-02 17:49:26,251 ERROR org.apache.hadoop.hive.ql.exec.ScriptOperator: Script failed with code 1*
2009-02-02 17:49:26,251 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: StreamThread OutputProcessor done
2009-02-02 17:49:26,251 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: DESERIALIZE_ERRORS:0
2009-02-02 17:49:26,252 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: SERIALIZE_ERRORS:0
2009-02-02 17:49:26,296 INFO org.apache.hadoop.mapred.TaskRunner: Task 'task_200901301729_7391_m_000000_0' done.
--

For the case where the job fails:
--
2009-02-02 17:59:03,398 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: StreamThread ErrorProcessor done
*2009-02-02 17:59:03,398 ERROR org.apache.hadoop.hive.ql.exec.ScriptOperator: Error in writing to script: Broken pipe*
2009-02-02 17:59:03,398 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: StreamThread OutputProcessor done
2009-02-02 17:59:03,399 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2009-02-02 17:59:03,399 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: SERIALIZE_ERRORS:0
2009-02-02 17:59:03,399 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: DESERIALIZE_ERRORS:0
2009-02-02 17:59:03,429 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.RuntimeException: java.io.IOException: Broken pipe
--

Looks like exceptions thrown in ScriptOperator.close are being ignored.



[jira] Commented: (HIVE-251) Failures in Transform don't stop the job

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682866#action_12682866 ] 

Namit Jain commented on HIVE-251:
---------------------------------

+1

looks good. 

Before committing, can you remove the extra commented-out line in the reduce script and add a comment explaining what it is doing?



[jira] Updated: (HIVE-251) Failures in Transform don't stop the job

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-251:
-------------------------------

    Priority: Blocker  (was: Major)

Marking this as a blocker for 0.3 as there is no suitable workaround.



[jira] Commented: (HIVE-251) Failures in Transform don't stop the job

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681905#action_12681905 ] 

Joydeep Sen Sarma commented on HIVE-251:
----------------------------------------

Why are we ignoring the exception in FileSinkOperator?



[jira] Commented: (HIVE-251) Failures in Transform don't stop the job

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681910#action_12681910 ] 

Ashish Thusoo commented on HIVE-251:
------------------------------------

copyUtils in Hadoop throws an IOException if the file being copied disappears after the list status is done on the directory. We hit that with tmp_ files.
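The race described here (a file is listed, then vanishes before it is copied) can be reproduced with a small Python sketch. It only illustrates the race and one possible way to tolerate it; it is not the committed change and not Hadoop's copy code. The helper `copy_listed` and the `is_transient` predicate are hypothetical, and which names count as transient is left to the caller.

```python
import os
import shutil


def copy_listed(src_dir, dst_dir, names, is_transient):
    """Copy previously listed names; skip transient files that vanished."""
    copied = []
    for name in names:
        try:
            shutil.copy(os.path.join(src_dir, name), dst_dir)
        except FileNotFoundError:
            if not is_transient(name):
                raise  # a real output file is missing: do not hide it
            continue  # a transient file disappeared after the listing
        copied.append(name)
    return copied
```

The design choice in this sketch is to tolerate a missing source only for names the caller marks as transient, so that the disappearance of a genuine output file still surfaces as an error.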



[jira] Commented: (HIVE-251) Failures in Transform don't stop the job

Posted by "Venky Iyer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680708#action_12680708 ] 

Venky Iyer commented on HIVE-251:
---------------------------------

I believe this has to do with cases where the (streaming) script reads in all of its input from stdin and _then_ fails. Scripts that fail before reading in all their input are correctly marked as having failed. 
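The distinction drawn here can be sketched in Python, assuming a parent process that streams rows to a child over a pipe. This is not Hive code: `feed_script` and the two one-line child scripts are hypothetical. A child that consumes all of its input and then exits non-zero gives the writer no error at all, so only the exit status reveals the failure; a child that exits before reading will usually cause a broken pipe on the writer side.

```python
import subprocess
import sys


def feed_script(script_source, rows):
    """Write rows to a child script's stdin; return (exit_code, write_error)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", script_source],
        stdin=subprocess.PIPE,
        stdout=subprocess.DEVNULL,
    )
    write_error = None
    try:
        proc.stdin.write(rows)
        proc.stdin.flush()
    except BrokenPipeError as exc:
        # The child went away while we were still writing.
        write_error = exc
    try:
        proc.stdin.close()
    except BrokenPipeError:
        pass  # nothing left to deliver; the exit status is returned below
    return proc.wait(), write_error


ROWS = b"row\n" * 200_000  # larger than a typical pipe buffer

# Reads everything, then fails: the writer sees no error at all.
LATE_FAILURE = "import sys; sys.stdin.read(); sys.exit(1)"
# Fails before reading: the writer usually hits a broken pipe.
EARLY_FAILURE = "import sys; sys.exit(1)"
```

Calling `feed_script(LATE_FAILURE, ROWS)` returns a non-zero exit code with no write error, which matches the case where the job was wrongly treated as successful.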



[jira] Updated: (HIVE-251) Failures in Transform don't stop the job

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-251:
-------------------------------

        Fix Version/s: 0.3.0
    Affects Version/s: 0.2.0
               Status: Patch Available  (was: Open)

Submitting patch.
