Posted to issues@hive.apache.org by "Eugene Koifman (JIRA)" <ji...@apache.org> on 2016/05/19 23:29:12 UTC

[jira] [Comment Edited] (HIVE-13354) Add ability to specify Compaction options per table and per request

    [ https://issues.apache.org/jira/browse/HIVE-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292324#comment-15292324 ] 

Eugene Koifman edited comment on HIVE-13354 at 5/19/16 11:29 PM:
-----------------------------------------------------------------

{quote}
// intentionally set this high so that ttp1 will not trigger major compaction later on
conf.setFloatVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD, 0.8f);
{quote}
Could this be moved to where it's used? It's confusing at its current location.
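For example, a minimal sketch of the suggested relocation (the surrounding test steps are assumed from context, not copied from the patch):
{code}
// Set the threshold right where it matters, immediately before the steps that
// could otherwise trigger major compaction on ttp1.
conf.setFloatVar(HiveConf.ConfVars.HIVE_COMPACTOR_DELTA_PCT_THRESHOLD, 0.8f);
// ... insert into ttp1 and run the Initiator; with a delta pct below 0.8,
// ttp1 should not be picked for major compaction ...
{code}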

{quote}
runWorker(conf);  // compact ttp2
runWorker(conf);  // compact ttp1
runCleaner(conf);
rsp = txnHandler.showCompact(new ShowCompactRequest());
Assert.assertEquals(2, rsp.getCompacts().size());
Assert.assertEquals("ttp2", rsp.getCompacts().get(0).getTablename());
Assert.assertEquals("ready for cleaning", rsp.getCompacts().get(0).getState());
Assert.assertEquals("ttp1", rsp.getCompacts().get(1).getTablename());
Assert.assertEquals("ready for cleaning", rsp.getCompacts().get(1).getState());
{quote}
The "ready for cleaning" seems suspicious after successful runCleaner()...  Also, perhaps TxnStrore.CLEANING_RESPONSE would be better

{quote}
// ttp1 has 0.8 for DELTA_PCT_THRESHOLD (from hive conf), whereas ttp2 has 0.5 (from tblproperties)
// so only ttp2 will trigger major compaction for the newly inserted row (actual pct: 0.66)
{quote}
This seems wrong. ttp2 had 5 rows, which were major-compacted into a base; now 2 more rows are added, and 2/5 = 40%, below ttp2's 0.5 threshold.
Perhaps compaction is triggered because in this case ORC headers make up 99% of the file size.
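To make the arithmetic explicit, a hedged sketch of the trigger check being questioned (the variable names are illustrative, not the Initiator's actual fields):
{code}
// By row count the ratio is below ttp2's 0.5 threshold, so the trigger must
// come from file sizes, where ORC header overhead dominates tiny files.
long baseRows = 5, newRows = 2;
boolean triggerByRows = (newRows / (float) baseRows) > 0.5f;  // 0.4 > 0.5 -> false
{code}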

bq. 949	    Assert.assertEquals("ready for cleaning", rsp.getCompacts().get(2).getState());
I would've expected this state to be TxnStore.SUCCEEDED_RESPONSE after runCleaner().  Why isn't it?

bq. 973	    Assert.assertTrue(job.get("hive.compactor.table.props").contains("orc.compress.size4:8192"));
Why "size4"?

{quote}
void compact(String dbname, String tableName, String partitionName, CompactionType type,
             Map<String, String> tblproperties) throws TException;
{quote}
This is a public API change, so we should probably keep and deprecate the method with the old signature.
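A minimal sketch of keeping backward compatibility, assuming this is the metastore client interface (the javadoc tag and placement are illustrative):
{code}
public interface IMetaStoreClient {
  /** @deprecated use {@link #compact(String, String, String, CompactionType, Map)} */
  @Deprecated
  void compact(String dbname, String tableName, String partitionName, CompactionType type)
      throws TException;

  // New overload from this patch, carrying per-request tblproperties.
  void compact(String dbname, String tableName, String partitionName, CompactionType type,
               Map<String, String> tblproperties) throws TException;
}
{code}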

{quote}
348 pStmt = dbConn.prepareStatement("insert into COMPLETED_COMPACTIONS(CC_ID, CC_DATABASE, CC_TABLE, CC_PARTITION, CC_STATE, CC_TYPE, CC_TBLPROPERTIES, CC_WORKER_ID, CC_START, CC_END, CC_RUN_AS, CC_HIGHEST_TXN_ID, CC_META_INFO, CC_HADOOP_JOB_ID) VALUES(?,?,?,?,?, ?,?,?,?,?, ?,?,?)");
{quote}
A new column (CC_TBLPROPERTIES) is added here, but the number of "?" placeholders is unchanged: the column list now has 14 entries while VALUES still has 13 placeholders. How does this work?
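A minimal sketch of the fix, adding the missing placeholder (grouping of the "?" kept as in the patch):
{code}
// 14 columns need 14 "?" placeholders; the original statement has only 13.
pStmt = dbConn.prepareStatement("insert into COMPLETED_COMPACTIONS(CC_ID, CC_DATABASE, " +
    "CC_TABLE, CC_PARTITION, CC_STATE, CC_TYPE, CC_TBLPROPERTIES, CC_WORKER_ID, CC_START, " +
    "CC_END, CC_RUN_AS, CC_HIGHEST_TXN_ID, CC_META_INFO, CC_HADOOP_JOB_ID) " +
    "VALUES(?,?,?,?,?, ?,?,?,?,?, ?,?,?,?)");
{code}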

{quote}
714	        rs = stmt.executeQuery("select cc_id, cc_database, cc_table, cc_partition, cc_state, " +
715	            "cc_tblproperties from COMPLETED_COMPACTIONS order by cc_database, cc_table, " +
716	            "cc_partition, cc_id desc");
{quote}
Why do you need to know cc_tblproperties in order to delete the entry from history?
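If the purge only needs to decide which rows to delete, a minimal sketch without the extra column (assuming the loop keys on database/table/partition and trims by cc_id):
{code}
// cc_tblproperties is not needed to pick entries for deletion from history.
rs = stmt.executeQuery("select cc_id, cc_database, cc_table, cc_partition, cc_state " +
    "from COMPLETED_COMPACTIONS order by cc_database, cc_table, cc_partition, cc_id desc");
{code}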

etc

Clearly no tests were run on this patch. In fact, the SQL statement errors would cause all these methods to fail, which would explain why your new tests end up seeing unexpected statuses for various compaction operations.


> Add ability to specify Compaction options per table and per request
> -------------------------------------------------------------------
>
>                 Key: HIVE-13354
>                 URL: https://issues.apache.org/jira/browse/HIVE-13354
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Eugene Koifman
>            Assignee: Wei Zheng
>              Labels: TODOC2.1
>         Attachments: HIVE-13354.1.patch, HIVE-13354.1.withoutSchemaChange.patch
>
>
> Currently there are a few options that determine when automatic compaction is triggered. They are specified once for the warehouse.
> This doesn't make sense - some tables may be more important and need to be compacted more often.
> We should allow specifying these on a per-table basis.
> Also, compaction is an MR job launched from within the metastore. There is currently no way to control job parameters (memory, for example) except to specify them in hive-site.xml for the metastore, which makes them site-wide.
> We should add a way to specify these per table (perhaps even per compaction, if launched via ALTER TABLE).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)