You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "Alicia Ying Shu (JIRA)" <ji...@apache.org> on 2015/11/11 08:54:10 UTC

[jira] [Comment Edited] (PHOENIX-2315) Group by fields of primary key columns together with non-primary key columns can give wrong result

    [ https://issues.apache.org/jira/browse/PHOENIX-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997698#comment-14997698 ] 

Alicia Ying Shu edited comment on PHOENIX-2315 at 11/11/15 7:54 AM:
--------------------------------------------------------------------

[~jamestaylor] Here is the unit test in GroupByIT.java that reproduces the issue.
{code}
    @Test
    public void testExpressionInGroupBy() throws Exception {
    	Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES);
    	Connection conn = DriverManager.getConnection(getUrl(), props);
    	String ddl = " create table tgb_counter(tgb_id integer NOT NULL,utc_date_epoch integer NOT NULL,tgb_name varchar(40),ack_success_count integer" +
    			",ack_success_one_ack_count integer, CONSTRAINT pk_tgb_counter PRIMARY KEY(tgb_id, utc_date_epoch))";
    	String query = "SELECT tgb_id, tgb_name, (utc_date_epoch/10)*10 AS utc_epoch_hour,SUM(ack_success_count + ack_success_one_ack_count) AS ack_tx_sum" +
    			" FROM tgb_counter GROUP BY tgb_id, tgb_name, utc_epoch_hour";

    	createTestTable(getUrl(), ddl);
    	try {
    		String dml = "UPSERT INTO tgb_counter VALUES(?,?,?,?,?)";
    		PreparedStatement stmt = conn.prepareStatement(dml);
    		stmt.setInt(1, 1);
    		stmt.setInt(2, 1000);
    		stmt.setString(3, "aaa");
    		stmt.setInt(4, 1);
    		stmt.setInt(5, 1);
    		stmt.execute();
    		stmt.setInt(1, 2);
    		stmt.setInt(2, 2000);
    		stmt.setString(3, "bbb");
    		stmt.setInt(4, 2);
    		stmt.setInt(5, 2);
    		stmt.execute();
    		conn.commit();

    		ResultSet rs = conn.createStatement().executeQuery(query);
    		assertTrue(rs.next());
    		assertEquals(1,rs.getInt(1));
    		assertEquals("aaa",rs.getString(2));
    		assertEquals(1000,rs.getInt(3));
    		assertEquals(2,rs.getInt(4));
    		assertTrue(rs.next());
    		assertEquals(2,rs.getInt(1));
    		assertEquals("bbb",rs.getString(2));
    		assertEquals(2000,rs.getInt(3));
    		assertEquals(4,rs.getInt(4));
    		assertFalse(rs.next());
    		rs.close();
    	} finally {
    		conn.createStatement().execute("drop table tgb_counter");
    		conn.close();
    	}
    }
{code}
The error we got: 
java.lang.AssertionError: expected:<aaa> but was:<null>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:834)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:144)
	at org.apache.phoenix.end2end.GroupByIT.testExpressionInGroupBy(GroupByIT.java:681)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runners.Suite.runChild(Suite.java:128)
	at org.junit.runners.Suite.runChild(Suite.java:27)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

==========================

What I found is that in GroupByCompiler.java, inside compiler() method Collections.sort(groupBys, new Comparator<Pair<Integer,Expression>>() is called. The group by fields are re-arranged and primary keys are put in the beginning. In my example, original "GROUP BY tgb_id, tgb_name, utc_epoch_hour" becomes "GROUP BY tgb_id, utc_epoch_hour, tgb_name". This introduced the wrong result NULL for tgb_name when it should be 'aaa'. Why the sorting of the group by fields is needed? Removing the sort will get us the correct results. 



was (Author: aliciashu):
[~jamestaylor] Here is the unit test in GroupByIT.java that reproduces the issue.
    @Test
    public void testExpressionInGroupBy() throws Exception {
    	Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES);
    	Connection conn = DriverManager.getConnection(getUrl(), props);
    	String ddl = " create table tgb_counter(tgb_id integer NOT NULL,utc_date_epoch integer NOT NULL,tgb_name varchar(40),ack_success_count integer" +
    			",ack_success_one_ack_count integer, CONSTRAINT pk_tgb_counter PRIMARY KEY(tgb_id, utc_date_epoch))";
    	String query = "SELECT tgb_id, tgb_name, (utc_date_epoch/10)*10 AS utc_epoch_hour,SUM(ack_success_count + ack_success_one_ack_count) AS ack_tx_sum" +
    			" FROM tgb_counter GROUP BY tgb_id, tgb_name, utc_epoch_hour";

    	createTestTable(getUrl(), ddl);
    	try {
    		String dml = "UPSERT INTO tgb_counter VALUES(?,?,?,?,?)";
    		PreparedStatement stmt = conn.prepareStatement(dml);
    		stmt.setInt(1, 1);
    		stmt.setInt(2, 1000);
    		stmt.setString(3, "aaa");
    		stmt.setInt(4, 1);
    		stmt.setInt(5, 1);
    		stmt.execute();
    		stmt.setInt(1, 2);
    		stmt.setInt(2, 2000);
    		stmt.setString(3, "bbb");
    		stmt.setInt(4, 2);
    		stmt.setInt(5, 2);
    		stmt.execute();
    		conn.commit();

    		ResultSet rs = conn.createStatement().executeQuery(query);
    		assertTrue(rs.next());
    		assertEquals(1,rs.getInt(1));
    		assertEquals("aaa",rs.getString(2));
    		assertEquals(1000,rs.getInt(3));
    		assertEquals(2,rs.getInt(4));
    		assertTrue(rs.next());
    		assertEquals(2,rs.getInt(1));
    		assertEquals("bbb",rs.getString(2));
    		assertEquals(2000,rs.getInt(3));
    		assertEquals(4,rs.getInt(4));
    		assertFalse(rs.next());
    		rs.close();
    	} finally {
    		conn.createStatement().execute("drop table tgb_counter");
    		conn.close();
    	}
    }

The error we got: 
java.lang.AssertionError: expected:<aaa> but was:<null>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:834)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:144)
	at org.apache.phoenix.end2end.GroupByIT.testExpressionInGroupBy(GroupByIT.java:681)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runners.Suite.runChild(Suite.java:128)
	at org.junit.runners.Suite.runChild(Suite.java:27)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

==========================

What I found is that in GroupByCompiler.java, inside compiler() method Collections.sort(groupBys, new Comparator<Pair<Integer,Expression>>() is called. The group by fields are re-arranged and primary keys are put in the beginning. In my example, original "GROUP BY tgb_id, tgb_name, utc_epoch_hour" becomes "GROUP BY tgb_id, utc_epoch_hour, tgb_name". This introduced the wrong result NULL for tgb_name when it should be 'aaa'. Why the sorting of the group by fields is needed? Removing the sort will get us the correct results. 


> Group by fields of primary key columns together with non-primary key columns can give wrong result
> --------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2315
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2315
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Alicia Ying Shu
>            Assignee: Alicia Ying Shu
>
> create table tgb_counter(tgb_id integer NOT NULL,utc_date_epoch integer NOT NULL,tgb_name varchar(40),ack_success_count integer,ack_success_one_ack_count integer, CONSTRAINT pk_tgb_counter PRIMARY KEY(tgb_id, utc_date_epoch)); 
> SELECT tgb_id, tgb_name, (utc_date_epoch/10)*10 AS utc_epoch_hour,SUM(ack_success_count + ack_success_one_ack_count) AS ack_tx_sum FROM tgb_counter GROUP BY tgb_id, tgb_name, utc_epoch_hour;
> tgb_name returns NULL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)