Posted to issues@tajo.apache.org by "Hyunsik Choi (JIRA)" <ji...@apache.org> on 2014/07/11 07:48:05 UTC
[jira] [Resolved] (TAJO-925) Child ExecutionBlock of JOIN node has
different number of shuffle keys.
[ https://issues.apache.org/jira/browse/TAJO-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyunsik Choi resolved TAJO-925.
-------------------------------
Resolution: Fixed
Fix Version/s: 0.9.0
committed.
> Child ExecutionBlock of JOIN node has different number of shuffle keys.
> -----------------------------------------------------------------------
>
> Key: TAJO-925
> URL: https://issues.apache.org/jira/browse/TAJO-925
> Project: Tajo
> Issue Type: Bug
> Reporter: Hyoungjun Kim
> Assignee: Hyoungjun Kim
> Priority: Minor
> Fix For: 0.9.0
>
>
> If both sides of a join node are SUBQUERY nodes rather than SCAN nodes, each child has a different number of shuffle keys.
> In that case, the JOIN query returns a wrong result. I tested with the test code below.
> {code}
> @Test
> public void testJoinWithDifferentShuffleKey() throws Exception {
>   KeyValueSet tableOptions = new KeyValueSet();
>   tableOptions.put(StorageConstants.CSVFILE_DELIMITER, StorageConstants.DEFAULT_FIELD_DELIMITER);
>   tableOptions.put(StorageConstants.CSVFILE_NULL, "\\\\N");
>
>   Schema schema = new Schema();
>   schema.addColumn("id", Type.INT4);
>   schema.addColumn("name", Type.TEXT);
>
>   // Generate rows until the data exceeds 2 MB (or 1,000,000 rows are reached).
>   List<String> data = new ArrayList<String>();
>   int bytes = 0;
>   for (int i = 0; i < 1000000; i++) {
>     String row = i + "|" + i + "name012345678901234567890123456789012345678901234567890";
>     bytes += row.getBytes().length;
>     data.add(row);
>     if (bytes > 2 * 1024 * 1024) {
>       break;
>     }
>   }
>   TajoTestingCluster.createTable("large_table", schema, tableOptions, data.toArray(new String[]{}));
>
>   // Set the join partition volume to 1, remembering the original value so it can be restored.
>   int originConfValue = conf.getIntVar(ConfVars.DIST_QUERY_JOIN_PARTITION_VOLUME);
>   testingCluster.setAllTajoDaemonConfValue(ConfVars.DIST_QUERY_JOIN_PARTITION_VOLUME.varname, "1");
>
>   // Both sides of the join are subqueries; only ids 0..199 exist on side b.
>   ResultSet res = executeString(
>       "select count(b.id) " +
>       "from (select id, count(*) as cnt from large_table group by id) a " +
>       "left outer join (select id, count(*) as cnt from large_table where id < 200 group by id) b " +
>       "on a.id = b.id"
>   );
>
>   try {
>     String expected =
>         "?count\n" +
>         "-------------------------------\n" +
>         "200\n";
>     assertEquals(expected, resultSetToString(res));
>   } finally {
>     // Restore the configuration and drop the test table.
>     testingCluster.setAllTajoDaemonConfValue(ConfVars.DIST_QUERY_JOIN_PARTITION_VOLUME.varname, "" + originConfValue);
>     cleanupQuery(res);
>     executeString("DROP TABLE large_table PURGE").close();
>   }
> }
> {code}
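
For readers unfamiliar with why mismatched shuffle keys can produce a wrong join result, here is a minimal standalone sketch. It does not use Tajo's code or API: the class `ShuffleKeyMismatchSketch`, the `partition` helper, and the choice of (id) versus (id, cnt) as key sets are all hypothetical, picked only to illustrate the general effect. If the two inputs of a join are hash-partitioned on different key sets, rows with the same join key can land in different partitions and never meet.

```java
import java.util.Arrays;

public class ShuffleKeyMismatchSketch {

    // Hypothetical hash partitioner: picks a partition from the shuffle key values.
    // This is an illustration only, not Tajo's partitioning function.
    static int partition(int numPartitions, int... keys) {
        return Math.floorMod(Arrays.hashCode(keys), numPartitions);
    }

    // Counts the ids in [0, rows) whose left-side and right-side rows
    // are sent to the same partition and can therefore be joined.
    static int coLocated(int rows, int numPartitions, boolean sameKeys) {
        int matches = 0;
        for (int id = 0; id < rows; id++) {
            int cnt = 1; // assumed: each id occurs once, so count(*) per group is 1
            int left = partition(numPartitions, id);
            // Assumed mismatch: the right side shuffles on (id, cnt) instead of (id).
            int right = sameKeys
                    ? partition(numPartitions, id)
                    : partition(numPartitions, id, cnt);
            if (left == right) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Same shuffle keys on both sides: every id is co-located.
        System.out.println(coLocated(200, 4, true));   // prints 200
        // Different shuffle keys: with this hash and 4 partitions, no id is co-located.
        System.out.println(coLocated(200, 4, false));  // prints 0
    }
}
```

With identical key sets every matching pair is co-located; with differing key sets the per-partition join silently misses pairs, which is consistent with the wrong count reported above.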