Posted to dev@hive.apache.org by "John Sichi (JIRA)" <ji...@apache.org> on 2010/04/28 22:17:49 UTC

[jira] Created: (HIVE-1328) make mapred.input.dir.recursive work for select *

make mapred.input.dir.recursive work for select *
-------------------------------------------------

                 Key: HIVE-1328
                 URL: https://issues.apache.org/jira/browse/HIVE-1328
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
    Affects Versions: 0.6.0
            Reporter: John Sichi
            Assignee: John Sichi
             Fix For: 0.6.0


For the script below, we would like the behavior from MAPREDUCE-1501 to apply so that the select * returns two rows instead of none.

create table fact_daily(x int)
partitioned by (ds string);

create table fact_tz(x int)
partitioned by (ds string, hr string, gmtoffset string);

alter table fact_tz 
add partition (ds='2010-01-03', hr='1', gmtoffset='-8');
insert overwrite table fact_tz
partition (ds='2010-01-03', hr='1', gmtoffset='-8')
select key+11 from src where key=484;

alter table fact_tz 
add partition (ds='2010-01-03', hr='2', gmtoffset='-7');
insert overwrite table fact_tz
partition (ds='2010-01-03', hr='2', gmtoffset='-7')
select key+12 from src where key=484;

alter table fact_daily
set tblproperties('EXTERNAL'='TRUE');

alter table fact_daily
add partition (ds='2010-01-03')
location '/user/hive/warehouse/fact_tz/ds=2010-01-03';

set mapred.input.dir.recursive=true;
select * from fact_daily where ds='2010-01-03';


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1328) make mapred.input.dir.recursive work for select *

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1328:
-----------------------------

    Attachment: HIVE-1328.1.patch

Still testing this one. It won't be possible to submit an automated test until we're running against a version of Hadoop that includes MAPREDUCE-1501, so I'll open a separate deferred issue for that.




[jira] Commented: (HIVE-1328) make mapred.input.dir.recursive work for select *

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862077#action_12862077 ] 

John Sichi commented on HIVE-1328:
----------------------------------

Hi Ed,

This is not a new feature; it is an inconsistency in an existing feature when a particular Hadoop parameter is enabled. It should not matter whether you use select * or a more complex select; you should get the same results.

In general, prioritization is driven by a number of factors such as the overall project roadmap, quality, and the use cases which the developer wants or needs to make work (this one happens to be important for Facebook, which is why I'm working on it at the moment); if the ones you mention are high priority for you, please submit patches for them so we can get them resolved.

Regardless of that, thanks for all the bug reports that you have submitted--they're very valuable in themselves, and we want to get them all fixed too.




[jira] Updated: (HIVE-1328) make mapred.input.dir.recursive work for select *

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1328:
-----------------------------

    Status: Patch Available  (was: Open)

Review notes:

* Refactored recursive walk function from BucketizedHiveInputFormat to FileUtils
* Opened HIVE-1336 for test case.
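
As a rough illustration of the recursive walk mentioned in the notes above, the sketch below contrasts a one-level directory listing with a recursive one over the layout created by the script in the issue description. It is not the code from the patch: it assumes a local temp directory and java.nio instead of Hadoop's FileSystem API, and the names RecursiveWalkSketch, listFlat, listRecursive, buildSampleLayout, and the data file name 000000_0 are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical illustration only: local files via java.nio,
// not Hadoop's FileSystem API and not Hive's actual FileUtils code.
final class RecursiveWalkSketch {

    // One-level listing: only regular files directly under dir.
    static List<Path> listFlat(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (Stream<Path> children = Files.list(dir)) {
            children.filter(Files::isRegularFile).forEach(files::add);
        }
        return files;
    }

    // Recursive listing: descends into every subdirectory and collects files.
    static List<Path> listRecursive(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (Stream<Path> children = Files.list(dir)) {
            for (Path child : (Iterable<Path>) children::iterator) {
                if (Files.isDirectory(child)) {
                    files.addAll(listRecursive(child));
                } else {
                    files.add(child);
                }
            }
        }
        return files;
    }

    // Mimics fact_tz/ds=2010-01-03 with two nested partitions,
    // assuming one data file per partition (file name is a placeholder).
    static Path buildSampleLayout() throws IOException {
        Path ds = Files.createTempDirectory("fact_tz").resolve("ds=2010-01-03");
        for (String leaf : new String[] {"hr=1/gmtoffset=-8", "hr=2/gmtoffset=-7"}) {
            Path partition = Files.createDirectories(ds.resolve(leaf));
            Files.createFile(partition.resolve("000000_0"));
        }
        return ds;
    }

    public static void main(String[] args) throws IOException {
        Path ds = buildSampleLayout();
        System.out.println("flat: " + listFlat(ds).size());           // flat: 0
        System.out.println("recursive: " + listRecursive(ds).size()); // recursive: 2
    }
}
```

In this sketch the one-level listing finds no files under ds=2010-01-03 because the data sits two directories deeper, which mirrors the "none" versus "two rows" symptom in the description.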





[jira] Commented: (HIVE-1328) make mapred.input.dir.recursive work for select *

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864101#action_12864101 ] 

Namit Jain commented on HIVE-1328:
----------------------------------

+1

looks good



[jira] Commented: (HIVE-1328) make mapred.input.dir.recursive work for select *

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862074#action_12862074 ] 

Edward Capriolo commented on HIVE-1328:
---------------------------------------

Can we look at HIVE-1318 and maybe HIVE-1303 first? External partitions already seem to have bugs; can we get them working properly before more features are added?



[jira] Commented: (HIVE-1328) make mapred.input.dir.recursive work for select *

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862217#action_12862217 ] 

Edward Capriolo commented on HIVE-1328:
---------------------------------------

I find external partitions to be pretty badly broken now. I am circling around one or two other bugs in them that I am about to report. Users (including myself) are frustrated because, rather than working with data, they have to work around bugs like HIVE-1318. I understand everyone has their own priorities. Call it what you will (inconsistency/feature), we are adding to the capability of external tables while current features do not even work well.

In particular HIVE-1318 is brutal. When working with my data I can make no assumptions when querying. I have to do all types of shell scripting to ensure that partitions exist before I query them, adding extra where clauses to carefully select ranges of partitions. 

If you are using external partitions at Facebook, I wonder how you work around HIVE-1318, and I am also curious whether you experience HIVE-1303 or whether this is just something in my environment. The handful of users I have constantly have issues. Does everyone there just 'suck it up'?



[jira] Updated: (HIVE-1328) make mapred.input.dir.recursive work for select *

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1328:
-----------------------------

          Status: Resolved  (was: Patch Available)
    Hadoop Flags: [Reviewed]
      Resolution: Fixed

Committed. Thanks John



[jira] Commented: (HIVE-1328) make mapred.input.dir.recursive work for select *

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862295#action_12862295 ] 

Namit Jain commented on HIVE-1328:
----------------------------------

I haven't heard of anyone running into https://issues.apache.org/jira/browse/HIVE-1303 at Facebook.


