Posted to dev@hive.apache.org by "Siying Dong (JIRA)" <ji...@apache.org> on 2010/07/13 23:57:53 UTC

[jira] Created: (HIVE-1462) Reporting progress in FileSinkOperator works in multiple directory case

Reporting progress in FileSinkOperator works in multiple directory case
-----------------------------------------------------------------------

                 Key: HIVE-1462
                 URL: https://issues.apache.org/jira/browse/HIVE-1462
             Project: Hadoop Hive
          Issue Type: Bug
            Reporter: Siying Dong
            Assignee: Siying Dong


HIVE-1403 fixes the timeout that occurs when closing too many files, but it doesn't cover the case where the files are under different directories. With dynamic partitioning that is usually the case, so we still get timeouts.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1462) Reporting progress in FileSinkOperator works in multiple directory case

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-1462:
------------------------------

    Attachment: HIVE-1462.2.patch

Remove the progress reporting after each directory.
Rerunning the test suites.



[jira] Resolved: (HIVE-1462) Reporting progress in FileSinkOperator works in multiple directory case

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang resolved HIVE-1462.
------------------------------

    Fix Version/s: 0.7.0
       Resolution: Fixed

Committed. Thanks Siying!



[jira] Commented: (HIVE-1462) Reporting progress in FileSinkOperator works in multiple directory case

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888086#action_12888086 ] 

Ning Zhang commented on HIVE-1462:
----------------------------------

Looks good in general. One nitpick:
 - Lines 633 and 666: updateProgress() is called after commit() and closeWriters(), but it has already been called inside these two functions. Should we remove these two lines for simplicity, and for consistency with the abortWriters() case at line 644?
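
The redundancy described in this comment can be illustrated with a small, self-contained sketch. Every name below (ProgressCounter, SinkSketch, the writer count) is a hypothetical stand-in and not the actual FileSinkOperator code. The sketch only assumes, as the comment states, that closeWriters() and commit() already call updateProgress() internally; the order of the two calls is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the task's progress reporter; it only counts calls.
class ProgressCounter {
    int calls = 0;

    void progress() {
        calls++;
    }
}

// Hypothetical sketch; this is not the real FileSinkOperator.
class SinkSketch {
    private final ProgressCounter reporter;
    private final List<String> writers = new ArrayList<String>();

    SinkSketch(ProgressCounter reporter, int numWriters) {
        this.reporter = reporter;
        for (int i = 0; i < numWriters; i++) {
            writers.add("writer-" + i);
        }
    }

    void updateProgress() {
        reporter.progress();
    }

    // Assumption taken from the review comment: progress is reported inside this method.
    void closeWriters() {
        for (String w : writers) {
            // A real implementation would close the writer named by w here.
            updateProgress();
        }
    }

    // Assumption taken from the review comment: progress is reported inside this method.
    void commit() {
        for (String w : writers) {
            // Stand-in for the per-file commit work of the writer named by w.
            updateProgress();
        }
    }

    // Shape the comment questions: an extra report after each call.
    void closeWithTrailingCalls() {
        closeWriters();
        updateProgress();
        commit();
        updateProgress();
    }

    // Shape the comment proposes: rely on the reports made inside the two methods.
    void closeWithoutTrailingCalls() {
        closeWriters();
        commit();
    }

    public static void main(String[] args) {
        ProgressCounter with = new ProgressCounter();
        new SinkSketch(with, 3).closeWithTrailingCalls();
        ProgressCounter without = new ProgressCounter();
        new SinkSketch(without, 3).closeWithoutTrailingCalls();
        System.out.println("with trailing calls: " + with.calls);
        System.out.println("without trailing calls: " + without.calls);
    }
}
```

With three writers, the first variant reports progress 8 times and the second 6 times. Under the stated assumption the two extra calls add nothing, which is the simplification the comment asks for.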



[jira] Updated: (HIVE-1462) Reporting progress in FileSinkOperator works in multiple directory case

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-1462:
------------------------------

    Attachment: HIVE-1462.1.patch

Make the progress-reporting function a method of FileSinkOperator rather than only of the FSFile class.
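
A rough sketch of the structure this change describes, assuming a per-directory helper nested inside the operator. OperatorSketch, DirFiles, Progress, CountingProgress, and the directory and file names are all hypothetical and chosen for illustration; the real classes in the patch and the actual reporter interface are not reproduced here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal reporter interface, standing in for the task's reporter.
interface Progress {
    void progress();
}

// Demo reporter that only counts how often it was called.
class CountingProgress implements Progress {
    int calls = 0;

    public void progress() {
        calls++;
    }
}

// Hypothetical sketch of the structure; not the real FileSinkOperator.
class OperatorSketch {
    private final Progress reporter; // may be null when no reporter is available
    private final Map<String, DirFiles> dirs = new LinkedHashMap<String, DirFiles>();
    int filesClosed = 0;

    OperatorSketch(Progress reporter) {
        this.reporter = reporter;
    }

    // Operator-level method: callable from the operator and from every per-directory helper.
    boolean updateProgress() {
        if (reporter == null) {
            return false;
        }
        reporter.progress();
        return true;
    }

    // Per-directory helper (an inner class, so it can reach the operator's method).
    class DirFiles {
        final List<String> files = new ArrayList<String>();

        void closeAll() {
            for (String f : files) {
                filesClosed++;     // stand-in for closing the writer of file f
                updateProgress();  // report after each file, whatever its directory
            }
        }
    }

    void addFile(String dir, String file) {
        DirFiles d = dirs.get(dir);
        if (d == null) {
            d = new DirFiles();
            dirs.put(dir, d);
        }
        d.files.add(file);
    }

    void closeOp() {
        for (DirFiles d : dirs.values()) {
            d.closeAll();
        }
    }

    public static void main(String[] args) {
        CountingProgress reporter = new CountingProgress();
        OperatorSketch op = new OperatorSketch(reporter);
        op.addFile("ds=2010-07-12", "part-00000");
        op.addFile("ds=2010-07-12", "part-00001");
        op.addFile("ds=2010-07-13", "part-00000");
        op.closeOp();
        System.out.println(op.filesClosed + " files closed, " + reporter.calls + " progress reports");
    }
}
```

Because updateProgress() lives on the outer class in this sketch, the same reporting path is used for files in every directory, and the operator itself could call it between directories if needed.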
