Posted to issues@tajo.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2014/07/09 10:14:04 UTC

[jira] [Commented] (TAJO-925) Child ExecutionBlock of JOIN node has different number of shuffle keys.

    [ https://issues.apache.org/jira/browse/TAJO-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055978#comment-14055978 ] 

ASF GitHub Bot commented on TAJO-925:
-------------------------------------

GitHub user babokim opened a pull request:

    https://github.com/apache/tajo/pull/61

    TAJO-925: Child ExecutionBlock of JOIN node has different number of shuffle keys.

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/babokim/tajo TAJO-925

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/61.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #61
    
----
commit c54d9f5d0e2891c5ea7ad6161f1409ee2718083d
Author: 김형준 <ba...@babokim-macbook-pro.local>
Date:   2014-07-09T07:56:27Z

    TAJO-925: Child ExecutionBlock of JOIN node has different number of shuffle keys.

commit 15317af05e24e9346e2035188dfac1476d5f1d20
Author: 김형준 <ba...@babokim-macbook-pro.local>
Date:   2014-07-09T07:57:11Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

----


> Child ExecutionBlock of JOIN node has different number of shuffle keys.
> -----------------------------------------------------------------------
>
>                 Key: TAJO-925
>                 URL: https://issues.apache.org/jira/browse/TAJO-925
>             Project: Tajo
>          Issue Type: Bug
>            Reporter: Hyoungjun Kim
>            Assignee: Hyoungjun Kim
>            Priority: Minor
>
> If both sides of a join node are SUBQUERY nodes rather than SCAN nodes, the two sides can end up with different numbers of shuffle keys.
> In that case the JOIN query returns a wrong result. I tested with the test code below.
> {code}
> @Test
> public void testJoinWithDifferentShuffleKey() throws Exception {
>   KeyValueSet tableOptions = new KeyValueSet();
>   tableOptions.put(StorageConstants.CSVFILE_DELIMITER, StorageConstants.DEFAULT_FIELD_DELIMITER);
>   tableOptions.put(StorageConstants.CSVFILE_NULL, "\\\\N");
>   Schema schema = new Schema();
>   schema.addColumn("id", Type.INT4);
>   schema.addColumn("name", Type.TEXT);
>   // Build a little over 2 MB of "id|name" rows for the test table.
>   List<String> data = new ArrayList<String>();
>   int bytes = 0;
>   for (int i = 0; i < 1000000; i++) {
>     String row = i + "|" + i + "name012345678901234567890123456789012345678901234567890";
>     bytes += row.getBytes().length;
>     data.add(row);
>     if (bytes > 2 * 1024 * 1024) {
>       break;
>     }
>   }
>   TajoTestingCluster.createTable("large_table", schema, tableOptions, data.toArray(new String[]{}));
>   // Save the current join partition volume, then set it to 1 for this test.
>   int originConfValue = conf.getIntVar(ConfVars.DIST_QUERY_JOIN_PARTITION_VOLUME);
>   testingCluster.setAllTajoDaemonConfValue(ConfVars.DIST_QUERY_JOIN_PARTITION_VOLUME.varname, "1");
>   ResultSet res = executeString(
>      "select count(b.id) " +
>          "from (select id, count(*) as cnt from large_table group by id) a " +
>          "left outer join (select id, count(*) as cnt from large_table where id < 200 group by id) b " +
>          "on a.id = b.id"
>   );
>   try {
>     String expected =
>         "?count\n" +
>             "-------------------------------\n" +
>             "200\n";
>     assertEquals(expected, resultSetToString(res));
>   } finally {
>     testingCluster.setAllTajoDaemonConfValue(ConfVars.DIST_QUERY_JOIN_PARTITION_VOLUME.varname, "" + originConfValue);
>     cleanupQuery(res);
>     executeString("DROP TABLE large_table PURGE").close();
>   }
> }
> {code}
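
To illustrate why a mismatch in shuffle keys can corrupt a join, here is a minimal sketch that does not use any Tajo API. It assumes a hypothetical hash partitioner (`partitionOf`) that routes a row by hashing the values of its shuffle-key columns; the class name, method names, and row layout are all invented for this example. When one side is shuffled on `id` alone and the other on `id` plus a second column, rows with the same `id` may be sent to different partitions, so a per-partition join never sees them together.

```java
import java.util.Arrays;

public class ShuffleKeyDemo {

    // Hypothetical hash partitioner (an assumption, not Tajo's implementation):
    // picks a partition from the hash of the row's shuffle-key column values.
    static int partitionOf(Object[] row, int[] keyIndexes, int numPartitions) {
        Object[] keys = new Object[keyIndexes.length];
        for (int i = 0; i < keyIndexes.length; i++) {
            keys[i] = row[keyIndexes[i]];
        }
        return Math.floorMod(Arrays.hashCode(keys), numPartitions);
    }

    // Counts ids whose left-side and right-side rows land in different partitions.
    // Both sides use the illustrative row layout (id, cnt).
    static int countMisrouted(int rows, int[] leftKeys, int[] rightKeys, int numPartitions) {
        int misrouted = 0;
        for (int id = 0; id < rows; id++) {
            Object[] left = {id, 1L};
            Object[] right = {id, 1L};
            int leftPartition = partitionOf(left, leftKeys, numPartitions);
            int rightPartition = partitionOf(right, rightKeys, numPartitions);
            if (leftPartition != rightPartition) {
                misrouted++;
            }
        }
        return misrouted;
    }

    public static void main(String[] args) {
        // Both sides shuffled on the same single key (column 0).
        System.out.println("same keys:      "
            + countMisrouted(200, new int[]{0}, new int[]{0}, 4));
        // Right side shuffled on two keys (columns 0 and 1) instead of one.
        System.out.println("different keys: "
            + countMisrouted(200, new int[]{0}, new int[]{0, 1}, 4));
    }
}
```

With identical shuffle keys on both sides, no row is misrouted. With differing key sets, matching rows can be split across partitions, which is consistent with the wrong join result described in this issue.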



--
This message was sent by Atlassian JIRA
(v6.2#6252)