Posted to dev@hive.apache.org by "Navis (JIRA)" <ji...@apache.org> on 2012/07/05 10:12:33 UTC

[jira] [Created] (HIVE-3227) Implement data loading from user provided string directly for test

Navis created HIVE-3227:
---------------------------

             Summary: Implement data loading from user provided string directly for test
                 Key: HIVE-3227
                 URL: https://issues.apache.org/jira/browse/HIVE-3227
             Project: Hive
          Issue Type: Improvement
          Components: Query Processor, Testing Infrastructure
    Affects Versions: 0.10.0
            Reporter: Navis
            Assignee: Navis
            Priority: Trivial


{code}
load data instream 'key value\nkey2 value2' into table test;
{code}

This would make tests easier to write and could also reduce test time. For example,
{code}
-- ppr_pushdown.q
create table ppr_test (key string) partitioned by (ds string);
alter table ppr_test add partition (ds = '1234');
insert overwrite table ppr_test partition(ds = '1234') select * from (select '1234' from src limit 1 union all select 'abcd' from src limit 1) s;
{code}
The last query runs as 4 MR jobs, but it could be replaced by
{code}
create table ppr_test (key string) partitioned by (ds string) ROW FORMAT delimited fields terminated by ' ';
alter table ppr_test add partition (ds = '1234');
load data local instream '1234\nabcd' overwrite into table ppr_test partition(ds = '1234');
{code}
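
To make the intended semantics concrete, here is a minimal Python sketch of how such an inline literal might be turned into rows. It is not Hive code and not the patch: the function names (decode_escapes, instream_rows) are hypothetical, and it assumes that only the \n and \t escapes are decoded, that rows are separated by newlines, and that fields are split on the table's field delimiter (a single space in the example above).

```python
from typing import List


def decode_escapes(literal: str) -> str:
    """Replace the two-character sequences \\n and \\t with a real newline and tab.

    Assumption: no other escape sequences (including escaped backslashes) are handled.
    """
    return literal.replace("\\n", "\n").replace("\\t", "\t")


def instream_rows(literal: str, field_delim: str = " ") -> List[List[str]]:
    """Split the decoded literal into rows on newlines and into fields on field_delim.

    Empty lines are skipped. field_delim stands in for the table's
    'fields terminated by' setting (hypothetical parameter name).
    """
    text = decode_escapes(literal)
    return [line.split(field_delim) for line in text.split("\n") if line]


if __name__ == "__main__":
    # The raw string keeps the backslash and 'n' as two characters,
    # as they would appear inside the quoted literal in a .q file.
    print(instream_rows(r"key value\nkey2 value2"))
```

Under these assumptions the literal '1234\nabcd' yields two single-column rows, which matches the two rows produced by the insert ... union all query above.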

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-3227) Implement data loading from user provided string directly for test

Posted by "Navis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424695#comment-13424695 ] 

Navis commented on HIVE-3227:
-----------------------------

You're right. I hadn't thought about that.
                


        

[jira] [Updated] (HIVE-3227) Implement data loading from user provided string directly for test

Posted by "Navis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-3227:
------------------------

    Status: Patch Available  (was: Open)

https://reviews.facebook.net/D3993
                


        

[jira] [Commented] (HIVE-3227) Implement data loading from user provided string directly for test

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424380#comment-13424380 ] 

Edward Capriolo commented on HIVE-3227:
---------------------------------------

I think we have to be careful here. What does it do for partitioned tables? What does it do for bucketed tables? What does it do for tables that are ordered by something? We know that this is only supposed to be used for light testing, but users will likely use it for everything, and then it either needs to work or fail with an error.
                


        

[jira] [Commented] (HIVE-3227) Implement data loading from user provided string directly for test

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408086#comment-13408086 ] 

Edward Capriolo commented on HIVE-3227:
---------------------------------------

@Navis This is a good idea, but we have to be very careful about features we add to the language. We also have to cover cases such as overwritten files.

Your idea though prompted me to write:
https://issues.apache.org/jira/browse/HIVE-3238

I think user-space is a better answer for this problem. We can still consider adding this issue but I think 3238 is a little safer.

You should hang out on the Hive IRC channel so we can discuss this more. 3238 is a bit more verbose and will not speed up unit testing the way you mentioned, but I like the approach better.
                


        

[jira] [Commented] (HIVE-3227) Implement data loading from user provided string directly for test

Posted by "Navis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409159#comment-13409159 ] 

Navis commented on HIVE-3227:
-----------------------------

@Edward Capriolo 
As you said, HIVE-3238 (cool!) seems to be a much better and safer way of doing the same thing.

Nonetheless, I insist there should be an alternative way to populate a table/partition simply and quickly, (only) for test use. The current full test time (4h?) seems too long and keeps increasing.
                
