Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2010/02/24 19:41:29 UTC

[jira] Created: (HIVE-1193) ensure sorting properties for a table

ensure sorting properties for a table
-------------------------------------

                 Key: HIVE-1193
                 URL: https://issues.apache.org/jira/browse/HIVE-1193
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Query Processor
            Reporter: Namit Jain
             Fix For: 0.6.0


If a table is sorted and data is being inserted into it, we currently don't make sure that the data is sorted. Having that guarantee might be useful for some downstream operations.
This cannot be made the default because of backward compatibility, but an option can be added for it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HIVE-1193) ensure sorting properties for a table

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain reassigned HIVE-1193:
--------------------------------

    Assignee: Namit Jain



[jira] Updated: (HIVE-1193) ensure sorting properties for a table

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1193:
-----------------------------

    Attachment: hive.1193.1.patch



[jira] Commented: (HIVE-1193) ensure sorting properties for a table

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838628#action_12838628 ] 

He Yongqiang commented on HIVE-1193:
------------------------------------

Looks good. Will test.



[jira] Commented: (HIVE-1193) ensure sorting properties for a table

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839004#action_12839004 ] 

Namit Jain commented on HIVE-1193:
----------------------------------

There are 2 different jiras: one for ensuring the bucketing properties and one for ensuring the sorted properties.

Currently, even though tables are declared sorted and bucketed at table creation, those properties are not enforced.
It is up to the user to make sure the data is bucketed/sorted appropriately while loading.
Since the properties are not enforced, the optimizer cannot take advantage of them because it doesn't know whether the data is actually sorted.

An earlier jira took advantage of the fact that the data is sorted when processing group by.
This is controlled by configurable parameters.

Going forward, we want to use them for joining, specifically for sort merge joins.

@Edward, currently we are not doing skipping based on sorting properties.

Currently, we create an additional map-reduce job for bucketing/sorting.
Even if there is a cluster by, and the data is already bucketed/sorted by the correct key, we don't use that. There
will be another map-reduce job. This can be optimized in the future.

Merging of map-only jobs is disabled, but the same should be done for map-reduce jobs as well. I will file a follow-up
jira on that.
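
To illustrate why guaranteed sortedness matters for the sort merge joins mentioned above, here is a minimal sketch in Python. It is not Hive's implementation; the row shape and the function name `sort_merge_join` are made up for illustration. It joins two inputs in a single forward pass and is only correct if both inputs really are sorted by the join key.

```python
from typing import Iterator, List, Tuple

Row = Tuple[int, str]  # (join key, payload); a hypothetical row shape


def sort_merge_join(left: List[Row], right: List[Row]) -> Iterator[Tuple[int, str, str]]:
    """Inner-join two inputs that are ALREADY sorted by key, in one forward pass."""
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1  # left key is behind, advance the left side
        elif lk > rk:
            j += 1  # right key is behind, advance the right side
        else:
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    yield (lk, lv, rv)
            i, j = i_end, j_end
```

If the left input were declared sorted but actually arrived out of order, this pass would silently skip matching rows, which is the reason the optimizer cannot rely on an unenforced table property.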




[jira] Commented: (HIVE-1193) ensure sorting properties for a table

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838958#action_12838958 ] 

He Yongqiang commented on HIVE-1193:
------------------------------------

@Zheng,
>>1. How do we make sure that the data is bucketed / sorted? By adding an additional map-reduce job?
Yes. 
>>2. What if the user already specified "CLUSTER BY key" in his query?
As in 1, a new job will be added to redistribute the data.
If the user specifies a cluster by column different from the table's sort and bucket properties, we should perhaps let the query fail. But right now that cluster by is actually ignored.
>>3. Do we disable merging of small files when we do this?
Yes. We should disable it whenever enforceBucketing or enforceSorting is enabled.
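
As a rough picture of what the additional job in answer 1 has to achieve, the following Python sketch redistributes rows into a fixed number of buckets by key and sorts each bucket. It is only an illustration under stated assumptions: the function name is hypothetical, and plain modulo on an integer key stands in for whatever hash function the real system uses.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (bucketing/sort key, payload); a hypothetical row shape


def enforce_bucketing_and_sorting(rows: List[Row], num_buckets: int) -> List[List[Row]]:
    """Redistribute rows into num_buckets by key, then sort each bucket by key."""
    buckets: List[List[Row]] = [[] for _ in range(num_buckets)]
    for row in rows:
        # Redistribution step: every row with the same key lands in the same bucket.
        # Modulo is an assumption standing in for the real hash function.
        buckets[row[0] % num_buckets].append(row)
    # Sorting step: each bucket (one output file per bucket) is ordered by key.
    return [sorted(bucket, key=lambda r: r[0]) for bucket in buckets]
```

This also hints at why merging of small files is disabled in answer 3: concatenating two bucket files would produce a file that is no longer a single bucket and no longer sorted.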




[jira] Commented: (HIVE-1193) ensure sorting properties for a table

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838914#action_12838914 ] 

Edward Capriolo commented on HIVE-1193:
---------------------------------------

Also, how can the optimizer take advantage of this? If we know the data is sorted, we could do some aggressive pruning (if we know offsets) and short-circuiting for some where conditions.
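
A small Python sketch of the kind of pruning suggested here, assuming the keys are held in a sorted in-memory list. This is purely illustrative: the function name is made up, and per the reply elsewhere in this thread Hive was not skipping data based on sorting properties at the time.

```python
from bisect import bisect_left, bisect_right
from typing import List


def range_scan_sorted(keys: List[int], low: int, high: int) -> List[int]:
    """Return all keys with low <= key <= high from a sorted list.

    Binary search finds both ends of the range, so keys outside
    the range are never examined one by one.
    """
    start = bisect_left(keys, low)    # first position with key >= low
    end = bisect_right(keys, high)    # first position with key > high
    return keys[start:end]
```

The same idea only works when the sort order is guaranteed; on unsorted data the binary search would return an arbitrary slice.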



[jira] Commented: (HIVE-1193) ensure sorting properties for a table

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838737#action_12838737 ] 

Zheng Shao commented on HIVE-1193:
----------------------------------

Can we have some more description on the JIRA?
The patch contains 2 properties: enforceBucketing and enforceSorting. But I don't see them described in the JIRA.

1. How do we make sure that the data is bucketed / sorted? By adding an additional map-reduce job?
2. What if the user already specified "CLUSTER BY key" in his query?
3. Do we disable merging of small files when we do this?




[jira] Updated: (HIVE-1193) ensure sorting properties for a table

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1193:
-----------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (HIVE-1193) Ensure sorting properties for a table

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-1193:
---------------------------------

    Release Note:   (was: HIVE-1193. ensure sorting properties for a table.)
         Summary: Ensure sorting properties for a table  (was: ensure sorting properties for a table)



[jira] Updated: (HIVE-1193) ensure sorting properties for a table

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-1193:
-------------------------------

      Resolution: Fixed
    Release Note: HIVE-1193. ensure sorting properties for a table.
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Committed! Thanks Namit!
